International Journal of Computational Vision and Robotics (55 papers in press)
Image enhancement based on skin-colour segmentation and smoothness
by Haitao Sang, Bo Chen, Shifeng Chen, Li Yan
Abstract: Image restoration tasks such as denoising, super-resolution and deblurring have a wide range of applications and have become a research hotspot in both academia and industry. This paper proposes a novel image enhancement algorithm based on skin-texture preservation. A mask is obtained by Gaussian fitting, and repeated box blurs are applied for skin feathering. The denoised, smoothed image is fused with the original image through the mask, preserving the hair details of the original image and enhancing the edge details of the contour, thereby providing more effective information for edge-feature extraction. Compared with other image smoothing algorithms, the proposed algorithm smooths the skin edge contour more effectively and achieves better detection results. Experimental results show that the proposed algorithm has strong adaptive capacity and a significant effect on most images: it moderately smooths the edges of detail-rich areas while leaving no traces of artificial processing. The proposed image enhancement algorithm therefore has wide practical applicability.
Keywords: image enhancement; image restoration; image generation and synthesis; texture preserving smoother; skin-colour model.
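As background for the skin-colour model named in the keywords, a minimal sketch of skin segmentation in YCbCr space; this is a common heuristic with fixed Cb/Cr thresholds, not the authors' algorithm, and the threshold values here are illustrative:

```python
import numpy as np

def skin_mask(rgb):
    """Binary skin mask from fixed Cb/Cr thresholds (a common heuristic)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # RGB -> CbCr (ITU-R BT.601, full range); luma Y is not needed here
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Widely used skin cluster bounds in the CbCr plane
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)
```

The resulting mask would then be feathered (e.g., by repeated box blurs) before fusing the smoothed and original images.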
Supervised learning software model for the diagnosis of diabetic retinopathy
by M. Padmapriya, S. Pasupathy
Abstract: Diabetic retinopathy (DR) is the leading cause of eye disease and vision loss among people with diabetes. Because DR damages the retinal blood vessels, retinal blood vessel segmentation plays a crucial role in its diagnosis. Vision loss or blindness can be prevented if the diagnosis happens during the early stages; early diagnosis and initial investigation can lower the risk of vision loss by 50%. This article exploits a supervised classification approach to detect blood vessels using features such as grey level and invariant moments. Image pre-processing and blood vessel segmentation are the two essential steps used in this study, together with the proposed classification framework based on neural network models. Two publicly available retinal image datasets, DRIVE and STARE, are used to assess the proposed framework, which attains an average retinal blood vessel segmentation accuracy of 93.94% on the DRIVE dataset and 95.00% on the STARE dataset.
Keywords: diabetic retinopathy; fundus imaging; grey level features; invariant moments; vessel segmentation.
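The invariant-moment features mentioned above are typically built from normalised central image moments; a small sketch of the first Hu invariant moment as an illustration (the paper's exact feature set is not reproduced here):

```python
import numpy as np

def hu_phi1(img):
    """First Hu invariant moment phi1 = eta20 + eta02 (translation/scale invariant)."""
    img = img.astype(np.float64)
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00   # centroid
    mu20 = ((x - xc) ** 2 * img).sum()                      # central moments
    mu02 = ((y - yc) ** 2 * img).sum()
    # Normalised central moments: eta_pq = mu_pq / m00^((p+q)/2 + 1)
    return (mu20 + mu02) / m00 ** 2
```

Because the moment is computed about the centroid, translating the vessel pattern within the image leaves the feature unchanged.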
Facial expression recognition based on convolutional block attention module and multi-feature fusion
by Man Jiang, Shoulin Yin
Abstract: This paper focuses on facial expression recognition. A novel convolutional block attention module and multi-feature fusion method is proposed for the task. A local feature clustering loss function is introduced to reduce the difference between images of the same class and enlarge the difference between images of different classes during training. The convolutional block attention module is adopted to better express facial expressions in local areas rich in expression. Experimental results on the RAF and CK+ datasets show that the proposed method recognises different expressions effectively compared with other state-of-the-art methods.
Keywords: facial expression recognition; convolutional block attention module; CBAM; multi-feature fusion; local feature clustering; LFC.
Obstacle detection technique to solve poor texture appearance of the obstacle by categorising image's region using cues from expansion of feature points for small UAV
by Muhammad Faiz Ramli, Syariful Syafiq Shamsudin
Abstract: Achieving a reliable obstacle detection system for a small unmanned aerial vehicle (UAV) is very challenging due to its size and weight constraints. Prior works tend to employ a vision sensor as the main detection sensor, resulting in high dependency on texture appearance and no distance-sensing capability, while most wide-spectrum range sensors are heavy and expensive. The first contribution of this work is a different sensor-integration technique that increases detection reliability. The second is a method that creates a trusted avoidance path by categorising the environment into two regions: the obstacle region and the free region. Cues from the expansion of feature points are used to extract depth information from the environment and classify the regions in the image frame. The results show that the proposed system is able to handle multiple obstacles and create a safe path regardless of the texture and size of the obstacle.
Keywords: obstacle detection; feature points; region classification; safe avoidance path; vision-based-sensor; range-based-sensor; speeded up robust features; SURF; convex hull; depth perception.
Bilateral filter-oriented multi-scale CNN fusion model for single image dehazing
by Jiangjiang Li, Jianjun Zhu, Huili Chen
Abstract: This paper proposes a bilateral filter-oriented multi-scale CNN fusion model for single image dehazing. A multi-scale CNN model with low-frequency and high-frequency dehazing sub-networks is designed. First, the hazy image is decomposed by a bilateral filter to obtain its low- and high-frequency components. Second, the mapping between the high/low-frequency components and the high/low-frequency transmittance is learned by the designed network model. Third, the high- and low-frequency transmittances obtained from the model are fused into the scene transmittance map corresponding to the original hazy image. Finally, according to the atmospheric scattering model, the hazy image is restored to a clear, haze-free image; a hazy image dataset is used to train and test the model. The experimental results show that the proposed method achieves a better dehazing effect in both subjective and objective evaluation.
Keywords: single image dehazing; bilateral filter; multi-scale CNN fusion; map relationship.
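The final restoration step described above inverts the standard atmospheric scattering model I = J*t + A*(1 - t), where I is the hazy image, J the scene radiance, t the transmittance and A the atmospheric light. A minimal sketch of that inversion (the network that estimates t is the paper's contribution and is not reproduced here):

```python
import numpy as np

def recover_scene(hazy, t, A, t0=0.1):
    """Invert I = J*t + A*(1 - t) to recover scene radiance J.
    hazy: HxWx3 image, t: HxW transmittance map, A: atmospheric light."""
    t = np.maximum(t, t0)                     # clamp to avoid division blow-up
    return (hazy - A) / t[..., None] + A
```

The clamp `t0` is the usual safeguard for near-zero transmittance in dense haze regions.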
A modified Coye algorithm for retinal vessel segmentation
by Sakambhari Mahapatra, Uma Ranjan Jena, Sonali Dash, S. Agrawal
Abstract: According to scientific studies, the eyes are among the best predictors of numerous disorders, including glaucoma, diabetic retinopathy, hypertension and stroke. An ophthalmologist can learn about these problems by examining the segmented retinal blood vessel network. The goal of this study is to provide ophthalmologists with reliably segmented retinal blood vessels to help them pinpoint the issue. This work puts forward an automated method of vessel extraction that incorporates curvelet-based enhancement into the Coye algorithm. Further, segmentation performance is fine-tuned by embodying a pair of complementary gamma functions (PCGF) for contrast improvement. The suggested approach is evaluated on the DRIVE and STARE databases and shows outstanding results compared to state-of-the-art algorithms.
Keywords: curvelet transform; Coye algorithm; gamma transform; pair of complementary gamma function; PCGF; vessel segmentation.
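The PCGF builds on the basic gamma transform s = r^gamma. The pairing below is an illustrative sketch of the idea of blending a gamma curve with its complement; it is an assumption for exposition, not the authors' exact formulation:

```python
import numpy as np

def gamma_pair(img, g=2.0, w=0.5):
    """Blend a gamma curve (darkens) with its complementary curve (brightens).
    Illustrative only; the exact PCGF formulation is the paper's."""
    img = img.astype(np.float64) / 255.0      # normalise to [0, 1]
    dark = img ** g                           # gamma > 1 compresses the bright end
    bright = 1.0 - (1.0 - img) ** g           # complementary curve lifts shadows
    return w * dark + (1.0 - w) * bright
```

Varying the weight `w` trades off shadow lifting against highlight compression for contrast improvement.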
DDVM: dual decision voting mechanism for brain tumour identification with LBP2Q-SVM type classifier
by Mansi Lather, Parvinder Singh
Abstract: Brain tumour classification plays a significant role in medical science, as diagnosing a brain tumour at an early stage of development can improve the patient's recovery after treatment. In this paper, effective methods for brain tumour presence and type classification are proposed. The pre-processing phase of the proposed model handles low-contrast medical images through contrast enhancement and noise filtering. In the first phase, a dual decision voting mechanism (DDVM) over convolutional neural network (CNN) and bidirectional long short-term memory (Bi-LSTM) classification models is proposed to detect the tumour, with final identification done by score maximisation. In the second phase, to identify the tumour type as high-grade or low-grade glioma, a novel algorithm named LBP2Q-featured support vector machine classification is designed. The results of both phases demonstrate that the proposed scheme outperforms existing techniques in terms of various performance metrics.
Keywords: biomedical image processing; brain tumour detection; classification model; machine learning; medical image analysis.
Automated inspection of spur gears using machine vision approach
by Ketaki Joshi, Bhushan Patil
Abstract: The paper presents a machine-vision-based system for automated inspection of standard spur gears. Image processing algorithms are used to measure important gear dimensions such as the radii of the addendum, dedendum and pitch circles, module, number of teeth, pressure angle, tooth thickness, circular pitch, radial runout and tooth alignment error. Deviations from theoretical values according to gear standards are computed and an acceptance/rejection decision is made. The performance of the machine vision inspection system is evaluated in terms of its accuracy and precision: accuracy is based on the deviation of machine vision measurements from those obtained using traditional metrology instruments and gear standards, while precision is measured using a partial gauge R&R study. The results obtained for gear images taken by different operators using different imaging devices are repeatable, reproducible and in good agreement with the true values, indicating that the machine vision approach is both accurate and precise.
Keywords: machine vision; inspection; spur gear; image processing; accuracy; precision; gauge R&R.
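The theoretical values against which measured deviations are computed follow standard spur-gear relations (for a standard gear, addendum = m and dedendum = 1.25 m, where m is the module); a small helper sketch:

```python
import math

def spur_gear_dims(module, teeth):
    """Theoretical dimensions of a standard spur gear.
    Assumes addendum = m and dedendum = 1.25 m (standard full-depth teeth)."""
    pitch_d = module * teeth                       # pitch diameter d = m * z
    return {
        "pitch_radius": pitch_d / 2,
        "addendum_radius": pitch_d / 2 + module,
        "dedendum_radius": pitch_d / 2 - 1.25 * module,
        "circular_pitch": math.pi * module,        # p = pi * m
    }
```

An inspection system compares each measured dimension against these values and accepts or rejects the gear based on the allowed tolerance.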
The recognition of 3-phase power quality events using optimal feature selection and random forest classifier
by Laxmipriya Samal, Hemanta Kumar Palo, Badri Narayan Sahu
Abstract: This article proposes a novel feature vector that combines the K-means Apriori feature selection (KAFS) algorithm with statistical techniques to classify three-phase power quality disturbance (PQD) events. While the K-means algorithm clusters the raw signals, the Apriori algorithm extracts the desired discriminative features of the chosen PQD events. These discriminative features are then used to compute nine statistical parameters. The reliability of the novel feature vector is measured by classifying the three-phase PQD events against the same statistical parameters obtained from the raw PQD samples. Finally, the ability of the short-time Fourier transform (STFT) as a time-frequency tool is evaluated using the KAFS algorithm for the same task. The random forest (RF) classifier is chosen to validate the efficacy of the proposed feature vectors. Our results reveal that the optimised feature vectors obtained using KAFS indeed enhance the recognition accuracy.
Keywords: power quality; feature selection; classification; recognition accuracy; random forest algorithm.
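The abstract does not enumerate the nine statistical parameters; the sketch below computes descriptors that are typical for PQD signals, purely as an illustration of the statistical-feature step:

```python
import numpy as np

def stat_features(x):
    """Common statistical descriptors of a 1-D signal segment.
    The paper's nine parameters are unspecified; these are typical choices."""
    x = np.asarray(x, dtype=np.float64)
    mu, sigma = x.mean(), x.std()
    rms = np.sqrt(np.mean(x ** 2))
    return {
        "mean": mu,
        "std": sigma,
        "rms": rms,
        "skewness": np.mean((x - mu) ** 3) / sigma ** 3,
        "kurtosis": np.mean((x - mu) ** 4) / sigma ** 4,
        "crest_factor": np.max(np.abs(x)) / rms,   # peak-to-RMS ratio
    }
```

For a pure 50 Hz sine, for instance, the mean is zero, the RMS is 1/sqrt(2) of the amplitude and the crest factor is sqrt(2); disturbances such as sags, swells and transients shift these values.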
IoT-based real time clinical healthcare system for aging and underprivileged areas
by Muhammad Shakir, Shahid Karim, Muhammad Imran Saeed, Halar Mustafa, Shahzor Memon, Syed Abbad Kazmi
Abstract: In past decades, numerous lives have frequently been put in danger because patients were not treated promptly and properly. Furthermore, vital parameters cannot be estimated as accurately at home as in hospital. In underprivileged areas of Pakistan, caretakers often lack the training and the right devices to monitor a patient's condition. The proposed device is helpful to guardians because it is all-in-one, portable, small and easy to operate. Specialists and caretakers can view real-time digital readings on an Android application, where they are converted to analogue waveforms for stability; the readings are also sent to a Google Firebase cloud database that can be accessed worldwide. The framework likewise produces an alert warning when a reading goes beyond the normal range. Our framework is valuable for monitoring the health of each individual by simply attaching the device and recording its readings.
Keywords: internet of things; monitoring system; mobile application; sensors; cloud database.
Application of digit and speech recognition in food delivery robot
by Low Chun Yin, Sarah Atifah Saruchi, Ong Hong Tze, Chew Ying Xin, Chong Han Wei, Jonathan Lam Lit Seng
Abstract: In COVID-19 quarantine centres, physical human interaction is limited to prevent the spread of the virus, and food delivery robots have been replacing humans to perform the task. However, there is a limit to the tasks a single robot can handle. This paper designs an efficient and intelligent food delivery robot that acts as a messenger: it recognises speech from patients so that humans in the background can act on their requests without any physical interaction. The workload on the microcontroller is greatly reduced when a task like face recognition is replaced with digit recognition, as patients are tagged with numbers. The design of the robot is also modular and scalable to bigger centres, introducing the capability to expand when necessary. The future of robotic delivery relies on the efficiency and scalability of multiple systems.
Keywords: speech recognition; digit recognition; image processing; computer vision; robotics.
A systematic study of intelligent autism spectrum disorder detector
by Indu Jamwal, Deepti Malhotra, Mehak Mengi
Abstract: Autism spectrum disorder (ASD) is a complex developmental condition, particularly related to the nervous system, that affects people's communication, social behaviour and underlying social knowledge. The problem of autism is not confined to a particular age group; it has been rising rapidly across all age groups. Earlier prediction of this developmental disorder would greatly help sustain the subject's physical and mental well-being. With advances in technology, early detection of certain neurological disorders has now become a reality, and machine-learning methods are mostly used for the analysis of ASD. This paper presents a systematic review of existing AI models for ASD detection based on screening methods, eye movements and MRI data. Based on the limitations of existing studies, the authors propose ASD_esfMRI, a model for earlier detection of autism that can be implemented in future using eye gaze data and MRI data collectively.
Keywords: autism spectrum disorder; ASD; machine learning; magnetic resonance imaging; MRI; structural MRI; functional MRI; neurological; detection; prediction.
An improved sclera recognition using kernel entropy component analysis method
by B.S. Harish, M.S. Maheshan, C.K. Roopa, R. Kasthuri Rangan
Abstract: Among the various biometric traits of the human body, the sclera is considered prominent because of its unique characteristics. In this paper, we propose an improved sclera recognition method using kernel entropy component analysis (KECA). The main objective is to integrate kernel-based methods with entropy to choose the best principal components; the resulting top principal components are then given a symbolic interval-valued representation. To evaluate the efficiency of the proposed representation, we conducted extensive experimentation using various classifiers. The proposed method achieves an accuracy improvement of over 5.09% with a 50:50 split and over 10.69% with a 60:40 split. The obtained results show that the proposed method is effective and feasible for sclera recognition.
Keywords: sclera; recognition; kernel entropy; symbolic representation.
Local directional double ternary coding pattern for facial expression recognition
by Chebah Ouafa, Laskri Mohamed Tayeb
Abstract: This paper presents a novel texture descriptor, the local directional double ternary coding pattern (LDDTCP), which combines the directional information of LDP with the ternary description of LTP to represent facial expressions. The proposed LDDTCP operator encodes image texture by computing edge and line response values using the eight-directional Frei-Chen masks. To achieve robustness, the eight Frei-Chen masks are partitioned into two groups according to their directions. After calculating the average of each group, we assign three discrimination levels to each pixel based on the edge response values of the first group and the line response values of the second group, obtaining the LDDTCP-1 and LDDTCP-2 codes, respectively. The final feature descriptor vector, LDDTCP, is formed by concatenating the LDDTCP-1 and LDDTCP-2 histograms. Experimental results on the CK and JAFFE databases show that the LDDTCP descriptor achieves superior recognition performance compared to some existing local descriptor methods.
Keywords: facial expression recognition; human face; appearance descriptor; geometry descriptor; local binary pattern; LBP; local directional pattern; LDP; local ternary pattern; LTP; support vector machine; SVM.
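The LTP building block named in the keywords assigns one of three levels to each neighbour relative to the centre pixel, using a tolerance band of width tau; a minimal sketch (the full LDDTCP operator with Frei-Chen responses is not reproduced):

```python
import numpy as np

def ltp_codes(neighbours, centre, tau=5):
    """Local ternary pattern: +1 / 0 / -1 per neighbour, given tolerance tau."""
    diff = np.asarray(neighbours, dtype=int) - int(centre)
    # Above the band -> +1, below -> -1, within the band -> 0
    return np.where(diff > tau, 1, np.where(diff < -tau, -1, 0))
```

The ternary code is usually split into a positive and a negative binary pattern before histogramming, which is the same double-histogram idea behind concatenating LDDTCP-1 and LDDTCP-2.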
Versatile formation patterns for cooperative target tracking using ground vehicles
by Lili Ma
Abstract: In this paper, we investigate the cooperative target tracking problem using a group of autonomous mobile robots. By introducing a tracking control component to existing pursuit-based formation schemes, it is possible to achieve simultaneous tracking and formation in versatile concentric formations. Balanced circular formations can now be achieved with a prescribed formation radius. Elliptical formations with a variety of orientations and shapes can be achieved by applying a transformation matrix. To address the practical issue of obstacle avoidance, a repellent vector field technique is used, which prevents agents from approaching obstacles. Tracking, formation, and avoidance are combined to provide a more comprehensive solution for cooperative target tracking. The models considered include both single-integrator and double-integrator robots. MATLAB simulations are used to demonstrate the effectiveness of the proposed schemes.
Keywords: cooperative target tracking; balanced circular formation; prescribed formation radius; elliptical formation; obstacle avoidance.
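A balanced circular formation with a prescribed radius, mapped to an ellipse of arbitrary orientation by a transformation matrix as described above, can be sketched as follows (the parameter names are illustrative, not the paper's notation):

```python
import numpy as np

def formation_points(n, radius, a=1.0, b=1.0, theta=0.0, centre=(0.0, 0.0)):
    """n agents evenly spaced on a circle of the prescribed radius, then
    stretched by (a, b) and rotated by theta into an ellipse about a target."""
    phi = 2 * np.pi * np.arange(n) / n                         # balanced spacing
    circle = radius * np.stack([np.cos(phi), np.sin(phi)])     # 2 x n points
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    T = R @ np.diag([a, b])                                    # scale, then rotate
    return (T @ circle).T + np.asarray(centre)
```

With a = b = 1 this reduces to the balanced circular formation; other (a, b, theta) give elliptical formations of varying shape and orientation around the tracked target.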
Generalised homomorphic and root filtering in 2D-nonseparable discrete linear canonical transform domains in the image enhancement applications
by Shobha Sharma, Tarun Varma
Abstract: In this paper, the generalised homomorphic filtering (HF) and root filtering (RF) techniques are extended to 2D-nonseparable discrete linear canonical transform (2D-NsDLCT) domains for low-light image enhancement applications. The objective is to improve the visual appearance of images for the benefit of further processing. In the proposed methodology, the input image is first transformed into 2D-NsDLCT domains, HF or RF is applied to it, and the filtered image is inverse-transformed to the spatial domain. The advantage of the proposed technique rests on the fact that 2D-NsDLCT domains provide many free parameters that can be varied to improve the visual quality of the given images. We have compared the simulation results of the proposed methods with special cases of the 2D-NsDLCT and with state-of-the-art methods. The computed quality metrics reveal that the output images of the proposed methods have better quality than those of the competing techniques.
Keywords: NsLCT; homomorphic filtering; root filtering; image enhancement.
Real-time sign language recognition and speech conversion using VGG16
by Dona Mary Cherian, Jincy J. Fernandez
Abstract: Sign language is used for non-verbal communication by the deaf and mute community, and consists of hand gestures or signs representing the language. Hand gesture recognition makes human-computer interaction (HCI) more convenient and flexible for society; it is therefore important to classify each character correctly, without error. At present, online interpreters are available for translating sign language or gestures into the corresponding common language and vice versa, but they require an expert or intermediary who can translate both ways. Sensors mounted on hand gloves are also used for tracking hand articulation. Communication between the deaf and mute community and everyone else has thus become difficult and costly. This paper describes the classification of sign language hand gestures into their corresponding alphabet in text form using deep neural networks. After classification, the text is converted to speech, which helps visually challenged people understand the signs. The method classifies real-time images captured using a desktop camera. The accuracy of the model obtained using a convolutional neural network was 97%.
Keywords: American sign language; ASL; convolutional neural network; CNN; visual geometry group 16; VGG16.
Adaptive kernel-based active contour
by Gunjan Naik, Shubhangi Kelkar, Bhushan Garware, Aditya Abhayankar
Abstract: The geodesic active contour model (GACM) is a standard deterministic method for the segmentation of complex organ structures based on edge maps. For MRI images, the GACM performs poorly due to noise and weak edges, which may result from a short scanning period, low-Tesla scanner machines and other environmental conditions. GACM performance also suffers because its edge detector kernels are fixed and rely only on intensity values. To improve this, we propose a method involving adaptive kernels and phase-based edge detection called 'phase congruency'. Phase congruency conventionally uses log-Gabor kernels to calculate edges; instead, we propose ICA kernels, which exhibit similar anisotropic properties to log-Gabor kernels and are also adaptive. This adaptive-kernel-based phase congruency provides a robust edge map for use in GACM. Experimentation shows that, compared with state-of-the-art edge detection techniques, adaptive kernels enhance both weak and strong edges and improve overall performance.
Keywords: active contour model; image segmentation; phase congruency; edge detection; geodesic active contour model; GACM.
A comparative study between convolution neural networks and multi-layer perceptron networks for hand-written digits recognition
by Aaron Rasheed Rababaah
Abstract: This paper presents an investigation that compares deep learning (DL) and traditional artificial neural networks (ANNs) in the application of hand-written digits recognition (HDR). In our study, convolution neural networks (CNNs) represent the DL models and the multi-layer perceptron (MLP) represents the ANN models. The two models were implemented in the MATLAB development environment and tested using a publicly available image database consisting of over 20,000 samples of all ten hand-written digits, each 24 x 24 pixels. The experimental results showed that the CNN model was superior to the MLP model, with average classification accuracies of 95.14% and 89.74%, respectively. Furthermore, the CNN model was observed to have better performance stability and better execution efficiency, as the MLP model requires human intervention to handcraft and pre-process the features of the digit patterns.
Keywords: hand-written digit; pattern recognition; multi-layer perceptron; MLP; deep learning; convolution neural networks; CNNs; comparative study.
Paddy variety identification from field crop images using deep learning techniques
by Naveen N. Malvade, Rajesh Yakkundimath, Girish B. Saunshi, Mahantesh C. Elemmi
Abstract: On-field identification of paddy varieties provides actionable information to farmers and policymakers in many aspects of crop handling and management practices. In this paper, three pre-trained transfer learning models, namely ResNet-50, EfficientNet-B7 and CapsNet, are presented to effectively classify field crop images of 15 different paddy varieties captured during the booting plant growth stage. Experiments using the CapsNet model with an image dataset comprising 60,000 labelled images show significant performance, with a testing accuracy of 92.96% and a validation accuracy of 95%. The ResNet-50 and EfficientNet-B7 models yielded average validation accuracies of 85% and 90%, respectively. The CapsNet model achieved both higher accuracy and better computational efficiency than the other deep learning classification models considered on the held-out paddy field crop image dataset.
Keywords: paddy variety identification; field crop image classification; deep convolutional neural networks; DCNN; transfer learning; CapsNet; ResNet-50; EfficientNet-B7.
Camouflaged object segmentation using saliency maps - a comparative study
by Sachi Choudhary, Rashmi Sharma
Abstract: Camouflage is the most common approach employed by armed forces to conceal something from the enemy's gaze on the battlefield or elsewhere. This article covers the literature on several strategies for finding concealed objects that share features with the surrounding environment in terms of colour, texture, orientation and intensity levels. The concern of this research is the use of a saliency map to locate the camouflaged object in a scene; the proposed methodology generates a saliency map based on region contrast. A further application of detecting the hidden object is to evaluate the blending ability of a camouflage pattern. Computations have therefore been performed to locate the hidden object within the surrounding environment and to measure the effectiveness of a camouflaged texture. A comparative study compares the performance of saliency maps based on centre-surround contrast, global contrast and the proposed region contrast, focusing on the camouflaged object only. The performance of these approaches is evaluated using precision, recall and F-measure values.
Keywords: camouflage object detection; saliency map; camouflage texture evaluation; military camouflage.
An experimental evaluation of feature detectors and descriptors for visual SLAM
by Taihú Pire, Hernán Gonzalez, Emiliano Santano, Lucas Terissi, Javier Civera
Abstract: Visual SLAM has, in general, a high computational footprint, while its potential applications such as augmented reality (AR), virtual reality (VR) and robotics have hard real-time constraints and limited computational resources. Reducing the cost of visual SLAM systems is hence essential to equip small robots and AR/VR devices with such technology. Feature extraction, description and matching are at the core of feature-based SLAM systems, with a direct impact on their performance. This work presents a thorough experimental analysis of feature detectors, descriptors and matchers for visual SLAM, focusing on their cost and their effect on estimation accuracy. We also run our visual SLAM system on an embedded platform (Odroid-XU4) and show the effect of such limited hardware on the accuracy and cost of the system. Finally, to facilitate future research, our evaluation pipeline is made publicly available.
Keywords: visual SLAM; local image feature; descriptor extractor; keypoint detector; performance evaluation.
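Matching in such pipelines is often done by brute-force Hamming distance over binary descriptors (e.g., ORB/BRIEF) with a ratio test; a minimal sketch of that matcher, not the paper's evaluation code:

```python
import numpy as np

def match_binary(desc1, desc2, ratio=0.8):
    """Brute-force Hamming matching of binary descriptors with a ratio test.
    desc1, desc2: (n, n_bytes) uint8 arrays; needs >= 2 candidates in desc2."""
    xor = np.bitwise_xor(desc1[:, None, :], desc2[None, :, :])
    dist = np.unpackbits(xor, axis=2).sum(axis=2)   # popcount per pair
    matches = []
    for i, row in enumerate(dist):
        best, second = np.partition(row, 1)[:2]     # two smallest distances
        if best < ratio * second:                   # Lowe-style ratio test
            matches.append((i, int(np.argmin(row))))
    return matches
```

Because Hamming distance reduces to XOR plus popcount, binary descriptors are much cheaper to match than floating-point ones, which is one of the cost trade-offs such an evaluation measures.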
Path planning of mobile manipulator for navigation and object clean-up
by Aaditya Asit Saraiya, Sangram Keshari Das, B.K. Raut, V. Kalaichelvi
Abstract: Industry and warehouses have been paying considerable attention to mobile-manipulator-based path planning problems. This paper focuses on multi-target object clean-up operations using a vision sensor, which has ample industrial applications. In this work, a vision-based path planning approach using the A* algorithm has been implemented to avoid obstacles and reach the goal location by the shortest path. The algorithm classifies objects in the workspace as handleable or non-handleable from real-time measurements. For multi-object clean-up operations, a priority is set depending on the scenario and a weighted cost function approach is proposed. A series of simulation experiments tests the effectiveness of the proposed algorithm, and the entire workflow of the mobile-manipulation-based path planner is demonstrated in various scenarios. This problem has a lot of relevance in the real world.
Keywords: vision-based navigation; mobile manipulation-based path planner; object detection; A* path planning algorithm; OpenCV; ROS framework.
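The A* planner named above can be sketched on a 4-connected occupancy grid with a Manhattan heuristic (a generic sketch, not the paper's implementation):

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected occupancy grid (1 = obstacle). Returns a shortest
    path as a list of (row, col) cells, or None if the goal is unreachable."""
    rows, cols = len(grid), len(grid[0])
    open_set = [(0, start)]           # (f = g + h, cell), min-heap by f
    g = {start: 0}
    parent = {}
    while open_set:
        _, cur = heapq.heappop(open_set)
        if cur == goal:               # reconstruct path by walking parents back
            path = [cur]
            while cur in parent:
                cur = parent[cur]
                path.append(cur)
            return path[::-1]
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g[cur] + 1
                if ng < g.get((nr, nc), float("inf")):
                    g[(nr, nc)] = ng
                    parent[(nr, nc)] = cur
                    h = abs(nr - goal[0]) + abs(nc - goal[1])  # Manhattan heuristic
                    heapq.heappush(open_set, (ng + h, (nr, nc)))
    return None
```

The Manhattan heuristic is admissible on a 4-connected grid, so the returned path is guaranteed to be shortest; the occupancy grid here would be populated from the vision sensor's obstacle detections.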
Research on the online parameter identification method of train driving dynamic model
by Dandan Liu, Xiangxian Chen, Zhonghao Guo, Jiaxi Yuan, Shoulin Yin
Abstract: The automatic train operation (ATO) system is an important driving control system for train operation, adjusting traction or braking force in real time according to different operating environments. As an important part of the ATO system, the train dynamic model determines how accurately the train tracks the target speed. Based on force analysis of actual train operation, single-particle dynamic models of train operation were established. Given the high efficiency of the single-particle model in online identification, it is applied to actual parameter identification. First, a second-order single-particle model is established, and three identification methods are compared and analysed on two sets of data; the auxiliary-model recursive least squares method with variable forgetting factor (AM-VFF-RLS) shows good performance. On this basis, a third-order single-particle model is established. Analysis of the identification results shows that this model improves identification accuracy while maintaining efficiency.
Keywords: train dynamic model; online identification; AM-VFF-RLS; ATO system.
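The core of AM-VFF-RLS is the recursive least squares update with a forgetting factor; a single-step sketch (the auxiliary model and the variable-forgetting-factor logic are the paper's additions and are omitted here):

```python
import numpy as np

def rls_step(theta, P, phi, y, lam=0.98):
    """One recursive least squares update with forgetting factor lam.
    theta: parameter estimate, P: covariance, phi: regressor, y: measurement."""
    phi = phi.reshape(-1, 1)
    K = P @ phi / (lam + phi.T @ P @ phi)            # gain vector
    theta = theta + (K * (y - phi.T @ theta)).ravel()  # correct by prediction error
    P = (P - K @ phi.T @ P) / lam                    # discount old information
    return theta, P
```

A forgetting factor lam < 1 down-weights old samples so the estimator can track slowly varying train parameters online.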
An optimised local feature compression using statistical and structural approach for face recognition
by A. Divya, K.B. Raja, K.R. Venugopal
Abstract: Face recognition is an extensively studied area among the many recognition tasks in pattern recognition. Face images captured in unrestricted environments generally contain discrepancies in pose, illumination and expression (PIE). To improve robustness to PIE variations, an optimised local feature compression (OLFC) is proposed using a matching algorithm and classifier. The pixel values of the images are structured as low picture element values (LPEV) and high picture element values (HPEV). The discrete wavelet transform and statistical methods are applied to the LPEV and HPEV, respectively, to obtain substantial data and statistical features with reduced feature dimensions. Experiments performed on six popular face databases (ORL, YALE, JAFFE, EYB, Faces-94 and FERET) illustrate excellent performance, with high recognition accuracies of 95.5%, 99.33%, 100%, 99.69%, 99.86% and 96.39%, respectively, and reduced error and computation time compared with existing methods.
Keywords: face recognition; discrete wavelet transform; DWT; Euclidean distance; artificial neural networks; ANNs.
Energy-aware automatic video annotation tool for autonomous vehicle
by N.S. Manikandan, K. Ganesan
Abstract: In a self-driving car, real-time video obtained from the camera sensors is analysed using various scene understanding algorithmic modules (object detection, object classification, lane detection and object tracking). In this paper, we propose an annotation tool that uses deep learning techniques for each of the four modules mentioned above; the best ones are chosen based on suitable metrics. Our tool is 83% accurate when compared with a human annotator. We considered a video with 530 frames at a resolution of 1,035 x 1,800 pixels. Our proposed tool consumed 43 minutes of computation with 36.73 g of CO2 emission on a CPU-based system, and 2.58 minutes with 7.75 g of CO2 emission on a GPU-based system, to process all four modules. By contrast, one human annotator needed nearly 3,060 minutes on a normal computer, with 2.56 kg of CO2 emission, to narrate the same scene.
Keywords: automatic annotation; deep learning; object classification; object detection; lane detection; object tracking.
Image retrieval by using texture and shape correlated hand crafted features
by Suresh Kumar Kanaparthi, U.S.N. Raju
Abstract: Content-based image retrieval (CBIR) has become one of the trending areas of research in computer vision. In this paper, consonance of hue, saturation and intensity is exploited by applying inter-channel voting between them. A diagonally symmetric pattern (DSP) is computed from the intensity component of the image. The grey level co-occurrence matrix (GLCM) is applied to the DSP to extract texture features. Histogram of oriented gradients (HOG) features are used to extract the shape information. All three features are concatenated. To evaluate the efficiency of our method, five performance measures are calculated, i.e., average precision rate (APR), average recall rate (ARR), F-measure, average normalised modified retrieval rank (ANMRR) and total minimum retrieval epoch (TMRE). The Corel-1K, Corel-5K, Corel-10K, VisTex, STex and colour Brodatz datasets are used. The experimental results show an improvement in 100% of cases for the Corel-1K dataset, 80% of cases for Corel-5K and 80% of cases for each of the three texture datasets.
Keywords: content-based image retrieval; CBIR; interchannel voting; texture; hand crafted features; shape.
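The GLCM texture step the abstract mentions can be sketched as follows. This minimal version uses a single horizontal pixel offset and two classic Haralick-style features (contrast and energy); the paper's actual offsets, quantisation levels, and the DSP input are not reproduced here.

```python
import numpy as np

def glcm(img, levels=4):
    """Grey level co-occurrence matrix for the (0, 1) offset
    (each pixel paired with its right-hand neighbour)."""
    m = np.zeros((levels, levels), dtype=float)
    for a, b in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        m[a, b] += 1
    m /= m.sum()                       # normalise to joint probabilities
    return m

def glcm_features(m):
    """Contrast and energy, two classic Haralick-style texture features."""
    i, j = np.indices(m.shape)
    contrast = np.sum(m * (i - j) ** 2)
    energy = np.sum(m ** 2)
    return contrast, energy

# Toy quantised image with four grey levels.
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [2, 2, 3, 3],
                [2, 2, 3, 3]])
m = glcm(img)
contrast, energy = glcm_features(m)
```

In a CBIR pipeline, these texture values would be concatenated with the HOG shape features before distance-based retrieval.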
An automated system to detect crop diseases using deep learning
by Purushottam Sharma, Manoj Kumar, Richa Sharma, Shashi Bhushan, Sunil Gupta
Abstract: Food is one of the necessities for a human being to survive. Moreover, since the population is increasing with each passing day, growing sufficient crops to feed such a vast population becomes essential. The country's economy is based on agricultural production as well. However, there is a significant threat to agricultural crop production in today's times, and hence the analysis of crop diseases becomes essential. Thus, the automatic identification and analysis of plant diseases is highly desired in agriculture. The main objective of this research is to develop an optimised approach for system automation to detect crop diseases. We propose an approach for building an automated system that primarily detects diseases using leaf images and offers other features, such as recommending a remedy for the detected disease. We created a model using a convolutional neural network algorithm and used the transfer learning approach with the Inception v3 and ResNet 50 models. Further, we collected data on remedies for the diseased classes and added that feature to our system.
Keywords: convolutional neural network; CNN; leaf image; transfer learning; crop disease; Inception v3; ResNet 50.
An improved multi-criteria-based feature selection approach for detection of coronary artery disease in machine learning paradigm
by Bikesh Kumar Singh, Sonali Dutta, Poonam Chand, Khilesh Kumar, Sumit Kumar Banchhor
Abstract: This paper presents an accurate approach for the detection of coronary artery disease (CAD) using an improved multi-criteria feature selection (IMCFS) approach in a machine learning (ML)-based paradigm. This study uses the Z-Alizadeh Sani dataset of CAD, consisting of 303 patients with 56 different attributes. The proposed IMCFS-based approach uses seven different traditional feature selection techniques. For classification, the support vector machine is used with four different kernel functions and is evaluated using three cross-validation protocols. Lastly, performance is evaluated using five measures. The proposed IMCFS-based approach using the 30 most relevant features outperforms all other traditional feature selection techniques and achieved the highest classification accuracy, sensitivity, specificity, area under the receiver operating characteristic curve, and Matthews correlation coefficient of 91.9%, 95.7%, 82.1%, 88.9% and 79.7%, respectively. The proposed IMCFS-based approach is an entirely reliable, automated, and highly accurate ML tool for detecting CAD.
Keywords: coronary artery disease; CAD; multi-criteria feature selection; machine learning; classification; support vector machine; SVM; kernel functions; cross-validation; accurate; automated; reliable.
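A multi-criteria feature selector of the kind the abstract describes can be sketched by rank aggregation: score every feature under several criteria, rank it under each, and keep the features with the best summed rank. The two toy criteria below (absolute label correlation and variance) stand in for the paper's seven traditional selectors, which are not reproduced here.

```python
import numpy as np

def rank_aggregate_select(X, y, k):
    """Rank features under each criterion (0 = best), sum the ranks
    across criteria, and return the indices of the top-k features."""
    yc = y - y.mean()
    crit = []
    # Criterion 1: |Pearson correlation| with the label.
    num = np.abs((X - X.mean(0)).T @ yc)
    crit.append(num / (X.std(0) * yc.std() * len(y) + 1e-12))
    # Criterion 2: feature variance (a crude relevance proxy).
    crit.append(X.var(0))
    ranks = sum(np.argsort(np.argsort(-c)) for c in crit)
    return np.argsort(ranks)[:k]

# Feature 0 tracks the label perfectly; feature 1 is constant noise.
X = np.array([[0., 5.], [1., 5.], [2., 5.], [3., 5.]])
y = np.array([0., 1., 2., 3.])
selected = rank_aggregate_select(X, y, k=1)
```

The selected subset would then feed the SVM with whichever kernel and cross-validation protocol is being evaluated.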
A combination of 'feature mapping' and 'block' approaches to reduce the matching area of stereoscopic algorithms
by Djaber Rouabhia, Nour Eddine Djedi
Abstract: In this paper, we propose a new approach to restrict the matching field of stereoscopic algorithms. Computing the disparity map implies using the whole image for a wide range of stereoscopic methods, leading to extra computation time and visual artefacts in the results. Based on this observation, we derived an approach that significantly reduces the evaluation time of stereoscopic algorithms and avoids noise appearing in the result. The proposed approach introduces a strong association between silhouette edges and stereoscopic algorithms by using only the geometric information present in the images to restrict the matching area. The proposed method limits the matching zone to the exact geometry of the analysed object, therefore avoiding extra time and undesirable noise. We did not use hard-coded algorithms or expensive equipment, and we obtained acceptable results in terms of time and accuracy.
Keywords: 3D reconstruction; stereo-vision; multi-view stereo; MVS; disparity map; feature mapping; block.
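For context, the block-based disparity computation that such restriction methods accelerate can be sketched as a plain sum-of-absolute-differences (SAD) matcher. This brute-force version scans the whole row; the paper's contribution is precisely to shrink the area this loop visits, which is not reproduced here.

```python
import numpy as np

def sad_disparity(left, right, block=3, max_disp=4):
    """Per-pixel disparity by minimising the sum of absolute differences
    (SAD) over a square block, scanning the right image leftwards."""
    h, w = left.shape
    r = block // 2
    disp = np.zeros((h, w), dtype=int)
    for y in range(r, h - r):
        for x in range(r, w - r):
            patch = left[y - r:y + r + 1, x - r:x + r + 1]
            best, best_d = np.inf, 0
            for d in range(min(max_disp, x - r) + 1):
                cand = right[y - r:y + r + 1, x - d - r:x - d + r + 1]
                cost = np.abs(patch.astype(int) - cand.astype(int)).sum()
                if cost < best:
                    best, best_d = cost, d
            disp[y, x] = best_d
    return disp

# A textured row pattern shifted by 2 px: true disparity is 2.
right = np.tile(np.arange(10), (7, 1))
left = np.roll(right, 2, axis=1)
disp = sad_disparity(left, right)
```

Restricting the `(y, x)` loop to the silhouette of the analysed object removes both the wasted cost evaluations and the noisy matches in the background.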
Non-intersecting curved paths using Bezier curves in the 2D-Euclidean plane for multiple autonomous robots
by Utsa Roy, Krishnendu Saha, Chintan Kr Mandal
Abstract: This paper proposes an algorithm for generating non-intersecting continuous paths for multiple robots, each having a unique source and destination, with convex polygonal obstacles in the 2D-Euclidean plane. It generates paths using Dijkstra's algorithm on the edges of the visibility graph of the map. Although Dijkstra's algorithm is used, the generated paths between the source-destination pairs will not be explicitly 'short paths', as an edge of the visibility graph cannot be used in multiple paths. The discrete paths are converted into continuous paths using Bezier curves along the corners of the edges. The algorithm generates paths sequentially, one after another, prioritised by the Euclidean distance between each robot's source and destination.
Keywords: visibility graph; Dijkstra's algorithm; convex hull; Bezier curve.
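The corner-smoothing step the abstract describes can be sketched with a quadratic Bezier curve: enter the curve shortly before a polyline corner, leave it shortly after, and use the corner itself as the control point. The inset distance and sample count are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def quad_bezier(p0, p1, p2, n=20):
    """Sample a quadratic Bezier B(t) = (1-t)^2 p0 + 2t(1-t) p1 + t^2 p2."""
    t = np.linspace(0, 1, n)[:, None]
    return (1 - t) ** 2 * p0 + 2 * t * (1 - t) * p1 + t ** 2 * p2

def round_corner(a, corner, b, inset=0.5):
    """Replace a sharp polyline corner by a Bezier arc: the arc starts
    `inset` before the corner and ends `inset` after it, with the corner
    acting as the control point."""
    a, corner, b = map(np.asarray, (a, corner, b))
    p0 = corner + inset * (a - corner) / np.linalg.norm(a - corner)
    p2 = corner + inset * (b - corner) / np.linalg.norm(b - corner)
    return quad_bezier(p0, corner, p2)

# A right-angle corner at (1, 0) between segments from (0, 0) and to (1, 1).
curve = round_corner((0, 0), (1, 0), (1, 1), inset=0.5)
```

Applying this at every interior vertex of a Dijkstra path over the visibility graph yields a continuous, differentiable trajectory for each robot.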
Energy-aware vehicle/pedestrian detection and close movement alert at nighttime in dense slow traffic on Indian urban roads using a depth camera
by N.S. Manikandan, K. Ganesan
Abstract: In recent times, infrared cameras, thermal cameras, RADAR, LIDAR and depth cameras have been widely used in nighttime vehicle/pedestrian detection systems. In the present research, we propose a novel way of detecting vehicles/pedestrians in dense, slow traffic conditions at nighttime. To train and build the necessary artificial intelligence model, daytime depth images are used. For training, the necessary datasets are created using a customised method. The trained model was used to detect vehicles/pedestrians during nighttime. In addition, a tracking algorithm was used to follow each vehicle/pedestrian, predict its close movement and direction, and provide the necessary warnings to the vehicle's driver. The proposed model was tested against a variety of object detection and tracking techniques involving embedded GPUs and depth cameras. The best-suited algorithms were identified based on metrics such as accuracy, execution time and carbon emission. Our proposed method detected and tracked vehicles/pedestrians accurately within a 17-metre range.
Keywords: green computing; artificial intelligence; deep learning; object detection; object tracking.
Developed a late fusion of multi facial components for facial recognition with a voting method and global weights
by Nguyen Van Danh, Vo Hoang Trong, Pham The Bao
Abstract: With the development of deep learning, many solutions have achieved outstanding performance in solving facial recognition problems. Nevertheless, many challenges still stand, such as occluded faces or illumination changes. This paper proposes a late fusion of many weighted weak classifiers to form a strong classifier for facial recognition. We train convolutional neural network models as weak classifiers on specific facial components. We build a strong classifier by fusing those weak classifiers late, with corresponding weights calculated locally or globally. A voting method is applied to determine the identity of the face. We experimented on five databases: ORL, CyberSoft, Georgia Tech, Essex Grimace and Essex Faces96. The performance of our method on those databases varied between 99% and 100%. Our proposed method can be used efficiently when a facial image contains only a few facial components. Also, our proposed global weights worked well on many facial databases.
Keywords: facial recognition; facial components; multi-CNNs; late fusion; voting method.
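The weighted late-fusion vote the abstract describes reduces to a weighted sum of the weak classifiers' class-probability vectors followed by an arg-max. The per-component classifiers and the weights below are hypothetical; the paper's local/global weight computation is not reproduced.

```python
import numpy as np

def fuse_predictions(probs, weights):
    """Late fusion: weight each weak classifier's class-probability
    vector, sum them, and vote for the arg-max class."""
    probs = np.asarray(probs, dtype=float)     # (n_classifiers, n_classes)
    weights = np.asarray(weights, dtype=float)
    fused = (weights[:, None] * probs).sum(axis=0)
    return int(np.argmax(fused)), fused

# Hypothetical two-identity scores from three facial-component CNNs.
probs = [[0.6, 0.4],   # eyes-region classifier
         [0.2, 0.8],   # nose-region classifier
         [0.5, 0.5]]   # mouth-region classifier
label, fused = fuse_predictions(probs, [0.5, 0.3, 0.2])
```

Because each weak classifier sees only one facial component, the fused vote can still identify a face when most components are occluded.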
RGB-depth map formation from cili-padi plant imaging using stereo vision camera
by Wira Hidayat Bin Mohd Saad, Muhammad Haziq Bin Abd Razak, Muhammad Noorazlan Shah Zainudin, Syafeeza Binti Ahmad Radzi, Muhd. Shah Jehan Bin Abd. Razak
Abstract: Stereo vision is one of the advancements in computer vision and pattern recognition applications, using a dual camera to mimic human vision. This study focused on the selection of parameters for RGB-depth (RGB-d) map image formation, specifically from stereo images captured of the cili-padi (bird's-eye chilli) plant. The process starts with calibrating the camera using a checkerboard image to obtain the camera's intrinsic and extrinsic parameters. The stereo images were rectified to facilitate the disparity computation between the left and right images. Then, a point cloud plot is acquired by using a triangulation function on the image disparity with the camera parameter values. RGB-d images are computed by normalising the depth information of each point into a greyscale value or any other suitable colourmap. Comparing the different disparity map transformation algorithms used to produce the RGB-d image shows that the SGM function provides the best RGB-d image formation.
Keywords: cili-padi plant; depth map formation; RGB-depth map; stereo camera vision.
An investigation into automated age estimation using sclera images: a novel modality
by Sumanta Das, Ishita De Ghosh, Abir Chattopadhyay
Abstract: Automated age estimation attracts attention due to its potential application in fields like customer relationship management, surveillance and security. Ageing has a significant effect on the human eye, particularly in the sclera region, but age estimation from sclera images is a less explored topic. This work presents a comprehensive investigation of automated human age estimation from sclera images. We employ lightweight deep learning models to identify changes in sclera colour and texture. Extensive experiments are conducted for three related tasks: estimation of the exact age of a subject, categorical classification of subjects into different age groups, and binary classification of adult and minor subjects. Results demonstrate good performance of the proposed models against state-of-the-art methods. We obtained a mean absolute error of 0.05 for the first task, an accuracy of 0.92 for the second task, and an accuracy of 0.89 for the third task.
Keywords: human age estimation; age-group classification; adult-minor binary classification; sclera images; deep learning; MASDUM; SBVPI.
ResNet-based surface normal estimator with multilevel fusion approach with adaptive median filter region growth algorithm for road scene segmentation
by Yachao Zhang, Yuxia Yuan
Abstract: As an integral part of information processing, road information has important application value in map drawing, post-disaster rescue and military applications. In this paper, a convolutional neural network is used to fuse lidar point cloud and image data to achieve road segmentation in traffic scenes. We first use an adaptive median filter region growth algorithm to preprocess the input image. A semantic segmentation convolutional neural network with the encoding and decoding structure of ResNet is used as the basic network to cross and fuse the point cloud surface normal features and RGB image features at different levels. After fusion, the data is restored in the decoder. Finally, the detection result is obtained by an activation function. The KITTI dataset is used for evaluation. Experimental results show that the proposed fusion scheme has the best segmentation performance. Compared with other road detection methods, the proposed method achieves better overall performance. In terms of AP, the proposed method exceeds 95% for the UM and UMM scenes.
Keywords: road segmentation; adaptive median filter region growth; data fusion; point cloud surface normal feature; encoding and decoding structure.
Texture-based approach to classification meningioma using pathology images
by Yasmeen O. Sayaheen
Abstract: Manual analysis and judgement suffer from two limitations: first, studying histological slides through manual human effort incurs a large time overhead, and human specialists are not always available; secondly, much work remains to be done to outline diagnostic standards for all tumour components. Computer-aided diagnosis (CAD) is developing quickly owing to the availability of up-to-date computing procedures, fresh imaging tools and patient data for disease diagnosis. Computer-assisted decision making can help histopathologists by providing additional objective diagnostic and analytic parameters. Recently, tumours have become one of the diseases that most affect human health. The brain is a central system for the human body that controls, organises and arranges regular daily tasks. This paper discusses the meningioma tumour, which is one of the most common brain tumours. Colour-based segmentation and morphological operations are used to enhance the appearance of cells. Texture-based features (FOS, GLCM, GLRS, GLDS and NGTDM) are used to enhance the CAD feature extraction process, and two classifiers are used to improve decision making (SVM and KNN).
Keywords: meningiomas tumour; texture feature; first order statistics; FOS; grey-level co-occurrence matrix; GLCM; grey-level run length statistic; GLRS; GLDS; neighbourhood grey-tone difference matrix; NGTDM; classification; support vector machine; SVM; k-nearest neighbours; KNN.
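The first-order statistics (FOS) features in the list above come straight from the grey-level histogram. A minimal sketch, with the grey-level count as an illustrative assumption:

```python
import numpy as np

def first_order_stats(img, levels=8):
    """FOS texture features from the grey-level histogram:
    mean, variance, skewness and entropy."""
    hist = np.bincount(img.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()              # grey-level probabilities
    g = np.arange(levels)
    mean = (g * p).sum()
    var = ((g - mean) ** 2 * p).sum()
    skew = ((g - mean) ** 3 * p).sum() / (var ** 1.5 + 1e-12)
    entropy = -(p[p > 0] * np.log2(p[p > 0])).sum()
    return mean, var, skew, entropy

# Toy two-level patch: half the pixels at level 0, half at level 1.
img = np.array([[0, 1], [0, 1]])
mean, var, skew, entropy = first_order_stats(img, levels=2)
```

The GLCM, GLRS, GLDS and NGTDM features are second-order and higher, capturing spatial relationships the histogram discards; all would be concatenated before the SVM/KNN stage.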
Identification of personality traits from handwritten text documents using multi-label classification models
by Salankara Mukherjee, Ishita De Ghosh
Abstract: Handwriting is widely investigated to mark emotional states and personality. However, the majority of the studies are based on graphology, and do not utilise personality factor models. We use the well-known five-factor model, which says that people possess five basic traits, together known as the big-five. Hence the problem of personality prediction from handwriting is essentially a multi-label problem. In addition to that, the predicted values should be non-binary decimal numbers, since the model says people possess the traits in various degrees. Multi-label classifiers have not yet been explored for personality assessment using handwriting features. The current work aims to bridge this gap. Multi-label classifiers are trained on trait scores obtained by the big-five inventory as well as handwriting features. A number of classifiers, including classifier chain, binary relevance and label power-set, are employed in the work. Best accuracies of 95.9% with non-binary label values and 97.9% with binary label values are achieved.
Keywords: multi-label classification; personality assessment; big-five traits; handwriting features; non-binary label values.
Comparison of convolutional neural networks architectures for mango leaf classification
by B. Jayanthi, Lakshmi Sutha Kumar
Abstract: Plant diseases are a threat to the food supply as they reduce the yield and the quality of fruits and grains. Hence, early identification and classification of plant diseases is essential. This paper aims to classify mango plant leaves into healthy and diseased using convolutional neural networks (CNNs). A performance comparison of the CNN architectures AlexNet, VGG-16 and ResNet-50 for mango plant disease classification is provided. These models are trained using the Mendeley dataset, and validation accuracies are found and compared with and without the use of transfer learning. AlexNet (25 layers, 6.2 million parameters) produces a testing accuracy of 94.54% and consumes less training time. ResNet-50 (117 layers, 23 million parameters) and VGG-16 (16 layers, 138 million parameters) give testing accuracies of 98.56% and 98.26% respectively. Therefore, based on the accuracies achieved and the complexity, this paper recommends AlexNet, followed by ResNet-50 and VGG-16, for plant leaf disease classification.
Keywords: convolution neural networks; neural network; image classification; precision agriculture.
Edge feature enhanced convolutional neural networks for face recognition using IoT devices
by Ankur, Mohit Kumar Rohilla, Rahul Gupta
Abstract: The COVID-19 pandemic has turned the world upside down, with almost everything coming to a halt. In the current period, where we are slowly returning to normal lives, organisations have become more concerned about safety and health. In the post-COVID period, biometric systems based on fingerprints can be dangerous; moreover, taking the real-time attendance of employees and students joining in online mode is a challenge. Real-time face recognition is a challenging task in terms of accuracy and reliability, especially when deep convolutional neural networks (DCNNs) are used for face recognition. DCNNs are data-hungry, and in real-life scenarios the amount of data per subject or class is minimal, while the number of subjects/classes can be huge. Hence the need arises for research on image processing and data augmentation for face recognition, as there are many scenarios where the number of classes (subjects) is vast.
Keywords: face recognition; edge enhancement; face edge processing; deep convolutional neural network; DCNN; data augmentation; image processing.
Unsupervised image transformation for long wave infrared and visual image matching using two channel convolutional autoencoder network
by Kavitha Kuppala, Sandhya Banda, S. Sagar Imambi
Abstract: Pixel level matching of multi-spectral images is an important precursor to a wide range of applications. An efficient feature representation which can address the inherently dissimilar characteristics of acquisition by the respective sensors is essential for finding similarity between visual and thermal image regions. The lack of sufficient benchmark datasets of corresponding visual and LWIR images hinders the training of supervised learning approaches such as CNNs. To address both the issues of nonlinear variations and the unavailability of large datasets, we propose a novel two channel non-weight-sharing convolutional autoencoder architecture, which computes similarity using encodings of the image regions. One channel is used to generate an efficient representation of the visible image patch, whereas the second channel is used to transform an infrared patch to a corresponding visual region using the encoded representation. Results are shown by computing patch similarity using representations generated from various encoder architectures, evaluated on two datasets.
Keywords: convolutional autoencoder; CAE; multi-spectral image matching; transformation network; two channel siamese architecture; structural similarity measure; SSIM; KAIST dataset; mean squared error; MSE; peak signal to noise ratio; PSNR; Earth mover’s distance; EMD.
Intelligent classification model for holy Quran recitation Maqams
by Aaron Rasheed Rababaah
Abstract: Quranic recitation is a field that has been studied for centuries by scholars from different disciplines, including tajweed scholars, musicians and historians. Maqams are a system of scales of melodic vocal patterns that have been established and practised by Quran reciters all over the world for centuries. Traditionally, Maqams are taught by an expert in Quran recitation. We propose a process model for the intelligent classification of Quran Maqams using a comparative study of neural networks, deep learning and clustering techniques. We utilised a publicly available audio dataset of labelled Maqam audio signals consisting of the eight primary Maqams: Ajam, Bayat, Hijaz, Kurd, Nahawand, Rast, Saba and Seka. The experimental work showed that all three classifiers (nearest neighbour, multi-layered perceptron and deep learning) performed well. Furthermore, it was found that deep learning with power spectrum features was the best model, with a classification accuracy of 96.55%.
Keywords: Quran Maqams; neural networks; signal processing; deep learning; convolutional neural networks; CNN; audio signal features; short-term Fourier transform; STFT; power spectrum.
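The power spectrum feature that performed best can be sketched as a framed FFT: window each frame, take the squared magnitude spectrum, and average over frames. The frame length, hop size, and Hann window are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def power_spectrum(signal, frame_len=256, hop=128):
    """Short-time power spectrum: frame the signal, window each frame,
    FFT, then average |X|^2 over frames to get one feature vector."""
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    spectra = [np.abs(np.fft.rfft(f)) ** 2 for f in frames]
    return np.mean(spectra, axis=0)

# A pure tone at 32 cycles per 256-sample frame peaks at FFT bin 32.
tone = np.sin(2 * np.pi * 32 * np.arange(1024) / 256)
ps = power_spectrum(tone)
```

Each recitation clip would be reduced to such a vector before being fed to the nearest neighbour, MLP, or deep learning classifier.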
Quantitative analysis of transfer and incremental learning for image classification
by Mohammed Ehsan Ur Rahman, Imran Shafiq Ahmad
Abstract: Incremental and transfer learning are becoming increasingly popular and important because of their advantages in data-scarcity scenarios. This work entails a quantitative analysis of the incremental learning approach along with various transfer learning methods using the task of image classification. A detailed analysis of the assumptions under which incremental learning should be applied is presented, along with the degree to which these assumptions hold in most real-world scenarios. For the experiments, MNIST and CIFAR-100 were used. The extensive coverage of incremental and transfer learning techniques on these two datasets showed that a performance improvement is achieved when these techniques are used in data-scarce situations.
Keywords: transfer learning; incremental learning; deep learning; image classification; image generation; neural networks; artificial intelligence; machine learning; MNIST; CIFAR-10; digit recognition.
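The defining property of incremental learning, updating a model one example at a time rather than retraining from scratch, can be illustrated with the simplest possible online learner. This toy perceptron is not the paper's method; it only shows the `partial_fit` update pattern the paradigm relies on.

```python
import numpy as np

class IncrementalPerceptron:
    """Minimal online learner: weights are updated per example, so new
    data arriving after deployment never forces a full retrain."""
    def __init__(self, n_features, lr=0.1):
        self.w = np.zeros(n_features + 1)   # last entry is the bias
        self.lr = lr

    def predict(self, x):
        return 1 if np.dot(self.w[:-1], x) + self.w[-1] > 0 else 0

    def partial_fit(self, x, y):
        # Classic perceptron rule: move weights only on a mistake.
        err = y - self.predict(x)
        self.w[:-1] += self.lr * err * np.asarray(x, dtype=float)
        self.w[-1] += self.lr * err

clf = IncrementalPerceptron(n_features=1)
for _ in range(3):                          # a few passes over a tiny stream
    for x, y in [((1,), 1), ((-1,), 0)]:
        clf.partial_fit(np.array(x, dtype=float), y)
```

Deep incremental learners replace this linear rule with gradient steps on new classes or tasks, which is where the catastrophic-forgetting assumptions analysed in the paper come in.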
An improvement in IoT-based smart trash management system using Raspberry Pi
by Muhammad Shakir, Shahid Karim, Shahzor Memon, Sadiq Ur Rehman, Halar Mustafa
Abstract: Our primary aim is to establish an environmentally sustainable and pollution-free community. Responsible states and their citizens make every attempt to keep their cities neat and clean. In this paper, a smart dustbin garbage collection system has been proposed and completed. To avoid garbage issues, we have developed a monitoring system built with IoT technology. The proposed work is bidirectional: firstly, it connects with the hardware, and secondly, it is supported on mobile through an Android-based application. Firebase Firestore is used to provide communication between both applications. We have improved the smart trash management system using a Raspberry Pi, which is pertinent to developed cities worldwide. The system tracks the dustbins and informs the admin of the amount of garbage collected in the bins via a smartphone application. The proposed approach lowers the total number of waste collection truck trips, thereby lowering the total waste collection budget. It ultimately contributes to society's cleanliness.
Keywords: internet of things; IoT; Raspberry Pi; ultrasonic sensor; garbage monitoring; Android.
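The core reading in such systems converts an ultrasonic echo distance into a fill level. A minimal sketch, assuming a lid-mounted sensor looking down and a hypothetical alert threshold (the paper's bin geometry and threshold are not given here):

```python
def fill_percentage(distance_cm, bin_depth_cm, sensor_offset_cm=0.0):
    """Convert an ultrasonic echo distance (sensor at the lid, looking
    down) into a garbage fill percentage, clamped to [0, 100]."""
    garbage_height = bin_depth_cm - (distance_cm - sensor_offset_cm)
    pct = 100.0 * garbage_height / bin_depth_cm
    return max(0.0, min(100.0, pct))

def should_alert(pct, threshold=80.0):
    """Notify the admin app once the bin passes the collection threshold."""
    return pct >= threshold
```

On the Raspberry Pi, this value would be pushed to Firestore so the Android application can display it and schedule a collection trip.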
A framework for breast cancer prediction and classification using deep learning
by Praveen Kumar Shukla, Aditya Ranjan Behera
Abstract: Breast cancer is a very common disease nowadays, and it is very important to identify and diagnose it at an early stage, which requires detecting and classifying the cancerous cells. Generally, to detect cancerous cells, the mammography process is more intuitive than other methods. This is a computer-aided diagnostic method that includes digital image processing for the detection of breast cancer. This article presents a method for detecting cancer-affected cells and classifying normal patients versus cancerous patients. Pre-processing operations are performed on the mammographic images after their normalisation. For the task of predicting cancer-affected cells, a breast cancer prediction model architecture has been proposed with an accuracy of 94.87%. For the classification of cancerous and normal patients, the VGG Net 19 architecture has been adopted, with an accuracy of 97.27%. The proposed framework can be implemented practically as an application in the field of breast cancer diagnosis for better results in a shorter period.
Keywords: artificial neural network; benign; breast cancer; fine needle aspirate; malignant; nuclei; rectified linear unit.
Digital image watermarking based on hybrid FRT-HD-DWT domain and flamingo search optimisation
by P.J.R. Shalem Raju, K.V.D. Kasula, Pokkuluri Kiran Sree
Abstract: Image watermarking topologies afford a promising solution for digital media copyright protection. However, it is essential to take into account the robustness of watermarking methods. Therefore, in this paper, a digital image watermarking technique based on the finite ridgelet transform (FRT), discrete wavelet transform (DWT) and Hessenberg decomposition (HD) is proposed. The FRT and HD methods are hybridised with the DWT to enhance the embedding capacity. Likewise, the embedding strength factor is optimised through a flamingo search algorithm (FSA). After embedding, the extraction process is carried out with a deep belief neural (DBN) network. The experiments are conducted on four host images (Lena, house, baboon and Barbara) in the MATLAB platform. The results are compared with different models in terms of peak signal to noise ratio (PSNR), normalisation coefficient (NC) and structural similarity index measure (SSIM) under various geometric attacks.
Keywords: image watermarking; finite ridgelet transform; FRT; flamingo search algorithm; FSA; Hessenberg decomposition; strength factor; discrete wavelet transform; DWT.
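The DWT part of such a scheme can be sketched as additive embedding in the approximation subband, scaled by the strength factor the FSA would optimise. This omits the FRT and HD stages and the DBN extractor entirely; the Haar kernel and alpha value are illustrative assumptions.

```python
import numpy as np

def haar_level1(img):
    """One-level Haar DWT: returns (LL, (LH, HL, HH)) subbands."""
    a = (img[0::2, :] + img[1::2, :]) / 2
    d = (img[0::2, :] - img[1::2, :]) / 2
    ll, lh = (a[:, 0::2] + a[:, 1::2]) / 2, (a[:, 0::2] - a[:, 1::2]) / 2
    hl, hh = (d[:, 0::2] + d[:, 1::2]) / 2, (d[:, 0::2] - d[:, 1::2]) / 2
    return ll, (lh, hl, hh)

def inverse_haar(ll, bands):
    """Exact inverse of haar_level1 (perfect reconstruction)."""
    lh, hl, hh = bands
    a = np.zeros((ll.shape[0], ll.shape[1] * 2))
    d = np.zeros_like(a)
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    out = np.zeros((a.shape[0] * 2, a.shape[1]))
    out[0::2, :], out[1::2, :] = a + d, a - d
    return out

def embed(host, mark, alpha=0.05):
    """Additive embedding in the LL subband, scaled by strength factor."""
    ll, bands = haar_level1(host)
    return inverse_haar(ll + alpha * mark, bands)

rng = np.random.default_rng(0)
host = rng.random((4, 4))
mark = np.ones((2, 2))
marked = embed(host, mark, alpha=0.1)
```

A larger alpha improves robustness to attacks at the cost of PSNR, which is exactly the trade-off the flamingo search algorithm tunes.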
Image-based deep learning automated grading of date fruit (Alhasa case study Saudi Arabia)
by Amnah Aldandan, Sajedah AlGhanim, Hawraa Alhashim, Mona A.S. Ali
Abstract: Dates are a small fruit, popular in the Middle East, and they grow in many countries. Many researchers have focused on classifying dates by type, but have not considered that many date industries sort dates by quality to determine the proper price and use. This paper classifies Tamer-stage dates automatically based on quality. This study proposes two ways to differentiate date fruit quality. The first uses a CNN (VGG-16) to extract features from the dataset, then applies an SVM classifier. The second method is based on a developed CNN. Three different sets of Tamer images were used to train these models. Another contribution is the creation of our own dataset, which was acquired using a smartphone camera under uncontrolled lighting and camera parameter circumstances, such as autofocus and camera stabilisation. In a comparison between the two methods, the CNN model achieved classification accuracies of 97% for Khalas, 95% for Ruzaiz and 90% for Shaishi.
Keywords: date fruit; classification; Rutab stages; deep learning; convolutional neural network; CNN; support vector machine; SVM; Saudi Arabia.
High-speed optical 3D measurement device for quality control of aircraft rivet
by Benjamin Bringier, Majdi Khoudeir
Abstract: Optical three-dimensional devices are widely used for quality control in industry and allow various defects or properties to be controlled. In the context of aeronautics, quality control of parts assembly is problematic due to the number of rivets to be checked and the necessary measurement accuracy. This paper presents a new device that makes it possible to measure the positioning of a rivet in less than two seconds with a measurement accuracy close to 10 µm. A standard colour camera and a projector are used to achieve a relatively low-cost device. This device is lightweight and compact enough to be mounted on a robotic arm. Few parameters must be calibrated, and the proposed methodology is accurate even if device positioning errors occur or the appearance of the surface changes. From a single image acquisition, about 2,000 measuring points on the aircraft skin and up to 600 measuring points on a rivet head of 1 cm2 are performed to evaluate its positioning. Our device is validated by a comparative approach and in real conditions of use.
Keywords: optical three-dimensional device; computer vision; image processing; quality control.
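For background, the depth measurement underlying camera-projector and stereo devices of this kind rests on the pinhole triangulation relation Z = f·B/d. A minimal sketch with hypothetical numbers (the paper's calibration and structured-light decoding are far more involved and are not reproduced here):

```python
def depth_from_disparity(disparity_px, focal_px, baseline_mm):
    """Pinhole triangulation: depth Z = focal length * baseline / disparity.

    Larger disparities mean closer points; disparity must be positive.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_mm / disparity_px

# Hypothetical rig: 1000 px focal length, 50 mm camera-projector baseline.
depth_mm = depth_from_disparity(10, focal_px=1000, baseline_mm=50)
```

The relation also shows why micrometre accuracy demands sub-pixel disparity estimation: at these rig parameters, a small disparity error translates into a large depth error.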
Deep learning method for human activity recognition using heaped LSTM and image pattern of activity
by P. Rajesh, R. Kavitha
Abstract: Deep learning is one of the most widely used technologies among researchers around the globe. With the tremendous growth of technologies such as data analytics, data mining and machine learning methods, and IoT applications like health monitoring, safety and security, and smart control, human movement recognition has become a noteworthy achievement in the field of science. Utilising this technology, we propose a unique approach for monitoring the activities of people who aspire to live and lead an independent life, mostly the elderly. In this experiment we devised a novel method for identifying human activities; the forte of this approach is that the privacy of the monitored person is ensured. The investigation is carried out using an improved convolution neural network (CNN) with an enriched bi-directional LSTM (BLSTM). The activity recognition model is further optimised using a heaped LSTM (HLSTM) and a fine-trained data clustering algorithm. Our proposed approach, when trained and tested on a prominent dataset containing sensor data, achieved an overall accuracy of 99.43% across all nine considered activities.
Keywords: activity recognition; bi-directional LSTM; BLSTM; clustering; convolution neural network; CNN; heaped LSTM.
Efficient masked face identification biometric systems based on ResNet and DarkNet convolutional neural networks
by Freha Mezzoudj, Chahreddine Medjahed
Abstract: The COVID-19 pandemic has caused death and serious illness across the entire world. During humanity's fight against this disease, the wearing of face masks has become, and remains, a necessity in our daily lives. This critical fight encouraged us to generate a rich masked face database (denoted FEI-SM) with different variations of pose and different emotions. We also employed several robust convolutional neural network systems based on three ResNet and two DarkNet models (ResNet18, ResNet50, ResNet101, DarkNet19 and DarkNet53) to measure the accuracy of biometric identification of masked and un-occluded faces on the challenging FEI-SM masked face database. In general, the compared results show good accuracies for many of the biometric systems used. The experimental runs show clearly that the model based on ResNet18 is the most effective at recognising individuals with masks in different scenarios, in terms of recognition rate and testing time.
Keywords: biometric; masked face identification; ResNet; DarkNet; FEI-SM database; convolutional neural networks; CNN.
Convolutional neural networks for obstacle detection on the road and driving assistance
by Ramzi Mosbah, Larbi Guezouli
Abstract: Generally, a driver has moments of inattention that can cause considerable damage. To deal with this issue, we have to detect obstacles on the road automatically, which raises several challenges. Firstly, we have to locate the region of interest, which is the road part of the frame; then we have to detect the objects inside the region of interest. In this work we propose an improved driver assistance system using a camera on the front of the car. Images acquired from this camera feed our system. In the frames to be processed, we reduce the region of interest to the area of the road, and obstacles on the road are sought in this region. At the same time, we take care of the driver by detecting whether he is drowsy. Experimental results were evaluated using the KITTI Vision Benchmark Suite and short videos recorded on streets in Batna.
Keywords: obstacle detection; image edge detection; driving assistance; object recognition; convolutional neural networks.
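The first step the abstract describes, restricting detection to the road portion of the frame, can be sketched as a trapezoidal region-of-interest mask over the lower half of the image. The geometry below (bottom-heavy trapezoid narrowing toward the horizon) is an illustrative assumption, not the paper's actual ROI extraction:

```python
import numpy as np

def road_roi_mask(h, w):
    """Boolean mask selecting a trapezoidal road region in an h-by-w frame."""
    mask = np.zeros((h, w), dtype=bool)
    for y in range(h // 2, h):                # road assumed in the lower half
        frac = (y - h // 2) / (h / 2)          # 0 near the horizon, 1 at the bottom
        half = int(w // 8 + frac * (3 * w // 8))  # widen toward the bottom of the frame
        mask[y, w // 2 - half : w // 2 + half] = True
    return mask
```

Obstacle detection would then be run only on pixels where the mask is true, cutting both false positives and processing time.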
Kidney image classification using transfer learning with convolutional neural network
by Priyanka, Dharmender Kumar
Abstract: Ultrasound imaging is one of the most widely used diagnostic methods for abdominal studies. Several chronic kidney diseases (CKDs), such as kidney stones, cystic kidney, and hydronephrosis, occur in the human kidney. These CKDs can later lead to a number of severe conditions, particularly heart disease, pulmonary attacks, and cardiomyopathy. Therefore, early detection of CKDs is highly desirable in clinical practice, as it can save hundreds of lives. Researchers now focus on developing automatic disease detection methods that avoid the need for human interaction. Deep learning models play a critical role in various healthcare applications, not only because of their fast and accurate results but also because they require minimal manual interference. In this paper, two approaches are proposed for detecting CKDs in ultrasound kidney images. The first is a conventional approach that uses a GA-optimised neural network (GAONN) as the classifier, whereas the second uses a convolutional neural network model, AlexNet, for automatic disease detection. AlexNet is trained using the transfer learning process. Experimental results show that the CNN outperforms the GA-optimised neural network in classifying kidney images.
Keywords: convolution neural network; CNN; GA optimised neural network; transfer learning; accuracy; principal component analysis; grey level co-occurrence matrix.
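The transfer-learning idea the paper relies on, reusing AlexNet's pretrained layers and retraining only the final classifier on the new kidney images, can be illustrated by fitting a fresh linear head on frozen features. The toy one-hot least-squares fit below is a sketch of that principle, not the authors' actual training procedure:

```python
import numpy as np

def fit_head(features, labels, n_classes):
    """Fit a new linear classifier head on frozen (pretrained) features."""
    targets = np.eye(n_classes)[labels]                     # one-hot targets
    weights, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return weights

def predict(features, weights):
    """Classify by the highest-scoring output of the linear head."""
    return np.argmax(features @ weights, axis=1)

# hypothetical 2-D features standing in for frozen AlexNet activations
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]])
y = np.array([0, 1, 0, 1])
W = fit_head(X, y, 2)
```

The pretrained feature extractor is never updated here; only the small head is estimated from the target-domain data, which is what makes transfer learning viable with limited medical imagery.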
Irregularities recognition system for automotive pieces
by Ignacio Algredo-Badillo, Germàn Portillo-García, Kelsey A. Ramírez-Gutiérrez, Luis A. Morales-Rosales
Abstract: The automotive industry is a growing sector in Mexico that involves many production processes. One of the most important is auto parts manufacturing to high quality standards, which is essential to avoid economic losses. Hence, failure detection must be carried out early in the process, discarding pieces that do not reach the desired quality. This paper presents an object recognition system that automatically finds failures in circular automotive pieces, an open problem in the automobile assembly process for guaranteeing product quality. Using a low-cost camera and an image processing stage that requires no training, we detect imperfections (above 3.5 mm) on small pieces, such as scratches and dents on edges. The average processing time to detect failures is 2.7 seconds, which allows more pieces to be examined in a short time compared with other works and with manual inspections carried out by human experts. The proposed system reaches an accuracy of 98% and is implemented in the LabVIEW tool.
Keywords: automotive industry; vehicle pieces; defects detection; Hough transform; irregularities recognition.
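The core check on a circular piece, flagging edge points that deviate more than 3.5 mm from the ideal contour, can be sketched as a least-squares circle fit followed by a radial-deviation threshold. This is an illustrative reconstruction under that assumption, not the paper's Hough-transform-based implementation:

```python
import numpy as np

def fit_circle(pts):
    """Algebraic least-squares circle fit: x^2 + y^2 + D*x + E*y + F = 0."""
    x, y = pts[:, 0], pts[:, 1]
    A = np.column_stack([x, y, np.ones_like(x)])
    b = -(x**2 + y**2)
    D, E, F = np.linalg.lstsq(A, b, rcond=None)[0]
    cx, cy = -D / 2, -E / 2
    r = np.sqrt(cx**2 + cy**2 - F)
    return cx, cy, r

def flag_defects(pts, tol=3.5):
    """Mark edge points whose radial distance deviates from the fitted circle by > tol (mm)."""
    cx, cy, r = fit_circle(pts)
    deviation = np.abs(np.hypot(pts[:, 0] - cx, pts[:, 1] - cy) - r)
    return deviation > tol

# synthetic edge points on a 50 mm circle, with one point dented 5 mm outward
theta = np.linspace(0, 2 * np.pi, 100, endpoint=False)
pts = np.column_stack([50 * np.cos(theta), 50 * np.sin(theta)])
pts[0] = [55.0, 0.0]
flags = flag_defects(pts)
```

Because the fit is dominated by the many conforming points, a single 5 mm dent stands out well above the 3.5 mm tolerance.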
Multi-agents system for breast tumour detection in mammography by deep learning pre-processing and watershed segmentation
by Hayet Saadi, Hayet Farida Merouani, Ahlem Melouah, Zahia Guessoum, Saida Lemnadjlia, Nacereddine Boukabach
Abstract: Mammography is the most widely used modality for diagnosing and screening breast cancer in women. In this paper, we present an enhanced automatic watershed segmentation for breast tumour detection and segmentation, reinforced with a group of interactive agents. First, a pre-processing step based on deep learning (DL) applies a convolutional neural network (CNN) with the AlexNet architecture to classify breast density. Second, classic watershed segmentation is applied to these images. Afterwards, a multi-agent system (MAS) is introduced: the information within pixels, regions, and breast density is explored to create a region of interest (ROI) and to produce the MAS segmentation. Experimental results were promising in terms of accuracy (ACC), with an overall 97.18% over three datasets: the Mammographic Image Analysis Society (MIAS), INBreast, and a local dataset called the Database of Digital Mammograms of Annaba (DDMA). In some cases, our approach was also able to detect breast calcifications accurately.
Keywords: mammography; tumour detection; watershed segmentation; multi-agent systems; deep learning; convolution neural network; AlexNet architecture; pre-processing; breast density; computational vision; computer-aided diagnosis systems.
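The classic watershed step at the core of this pipeline can be sketched as marker-driven flooding in ascending intensity order: each labelled marker grows over the image, and regions meet at intensity ridges. The toy Meyer-style flood below illustrates only that step, on a hypothetical one-row image, not the paper's full multi-agent refinement:

```python
import heapq
import numpy as np

def watershed(img, markers):
    """Grow labelled markers over img by flooding pixels in ascending intensity order."""
    labels = markers.copy()
    h, w = img.shape
    # seed the priority queue with every marker pixel, ordered by intensity
    heap = [(int(img[y, x]), y, x) for y in range(h) for x in range(w) if markers[y, x] > 0]
    heapq.heapify(heap)
    while heap:
        _, y, x = heapq.heappop(heap)
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] == 0:
                labels[ny, nx] = labels[y, x]       # claim the unlabelled neighbour
                heapq.heappush(heap, (int(img[ny, nx]), ny, nx))
    return labels

# two basins separated by an intensity ridge (values 5); markers 1 and 2 at the ends
img = np.array([[0, 1, 5, 5, 1, 0]])
markers = np.array([[1, 0, 0, 0, 0, 2]])
labels = watershed(img, markers)
```

The two labels flood their low-intensity basins first and only meet at the ridge, which is exactly the boundary watershed segmentation is meant to find.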
Identifying optimised speaker identification model using hybrid GRU-CNN feature extraction technique
by Md. Iftekharul Alam Efat, Md. Shazzad Hossain, Shuvra Aditya, Jahanggir Hossain Setu, K.M. Imtiaz-Ud-Din
Abstract: Extracting vigorous and discriminative features, and selecting an appropriate classifier model to identify speakers from voice clips, are challenging tasks. We therefore considered signal processing techniques and deep neural networks for feature extraction, along with state-of-the-art machine-learning models as classifiers. We also introduced a hybrid gated recurrent unit (GRU) and convolutional neural network (CNN) as a novel feature extractor that optimises the subspace loss to extract the best feature vector. Additionally, space and time are considered as computational parameters for finding the optimal speaker identification pipeline. We then evaluated the pipelines on the large-scale VoxCeleb dataset, comprising 6,000 real-world speakers with multiple voice clips: GRU-CNN + R-CNN achieved the highest accuracy and F1-score, GRU-CNN + CNN the highest precision, and LPC + KNN the highest recall, while LPCC + R-CNN and MFCC + R-CNN proved optimal in terms of memory usage and time, respectively.
Keywords: computational complexity; deep learning; feature extraction; speaker identification; VoxCeleb dataset.
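The selection logic this abstract describes, one optimal (feature extractor, classifier) pairing per metric, amounts to a per-metric argmax over all evaluated combinations. The pipeline names below come from the abstract, but the scores are placeholder numbers, not the paper's results:

```python
def best_pipeline(results, metric):
    """Return the (feature extractor, classifier) pair that maximises the given metric."""
    return max(results, key=lambda combo: results[combo][metric])

# hypothetical scores for three of the evaluated pipelines
results = {
    ("GRU-CNN", "R-CNN"): {"accuracy": 0.91, "recall": 0.88},
    ("GRU-CNN", "CNN"):   {"accuracy": 0.89, "recall": 0.87},
    ("LPC", "KNN"):       {"accuracy": 0.85, "recall": 0.93},
}
```

Running the same argmax once per metric (accuracy, precision, recall, F1, memory, time) yields a different winner per column, which is how a study like this ends up recommending several pipelines rather than one.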