International Journal of Computational Vision and Robotics (44 papers in press)
Real time sign language recognition using depth sensor
by Jayesh Gangrade, Jyoti Bharti
Abstract: Sign language is a visual language used by the deaf and Hard-of-Hearing (HoH) community. This paper proposes a system for sign language recognition that uses human skeleton data provided by Microsoft's Kinect sensor to recognize sign gestures. The Kinect sensor generates the skeleton of a human body and distinguishes 20 joints in it. The proposed method uses 11 of the 20 joints and extracts 35 novel features per frame, based on distances, angles and velocities involving upper-body joints. A multi-class Support Vector Machine classifies 35 Indian sign gestures in real time with an accuracy of 87.6%. The proposed method is robust to cluttered environments and viewpoint variation.
Keywords: Kinect sensor; Indian sign gesture; Multi class support vector machine; Human computer interaction; Pattern recognition.
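The feature families named in the abstract (joint-to-joint distances, joint angles and velocities) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the joint names, the specific distance/angle pairs, and the per-joint velocity term are assumptions for demonstration.

```python
# Sketch of skeleton-based gesture features: distances, angles, velocities.
# Joint names and feature choices are illustrative assumptions.
import math

def distance(a, b):
    """Euclidean distance between two 3-D joint positions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def angle(a, vertex, b):
    """Angle (radians) at `vertex` formed by joints a and b."""
    v1 = [x - v for x, v in zip(a, vertex)]
    v2 = [x - v for x, v in zip(b, vertex)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    return math.acos(dot / (n1 * n2))

def frame_features(joints, prev_joints, dt):
    """One frame's features: a distance, an angle, then per-joint speeds."""
    feats = [distance(joints["hand_right"], joints["head"]),
             angle(joints["hand_right"], joints["elbow_right"],
                   joints["shoulder_right"])]
    for name in joints:  # velocity = displacement since previous frame / dt
        feats.append(distance(joints[name], prev_joints[name]) / dt)
    return feats
```

Stacking such per-frame vectors over a gesture's frames would give the input a multi-class SVM could classify.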
Crypto-compression Scheme based on the DWT for Medical Image Security
by Med Karim Abdmouleh
Abstract: Ensuring the confidentiality of exchanged data is always a great concern in any communication, while the purpose of compression is to reduce the amount of data while preserving important information. This reduction allows more information to be archived on the same storage medium and minimizes transfer times over telecommunication networks. Combining encryption and compression therefore guarantees both the confidentiality and the authentication of information; in addition, it reduces processing and transmission time on public channels and increases storage capacity. In this paper, we propose a new partial (selective) encryption approach for medical images, based on Discrete Wavelet Transform (DWT) coefficients and compatible with the JPEG2000 standard. The obtained results show that the proposed scheme significantly reduces processing time during encryption and decryption, without compromising the high compression rate of the compression algorithm.
Keywords: Crypto-compression; Encryption; Compression; Discrete Wavelet Transform; RSA; JPEG2000; Telemedicine.
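The core idea of selective crypto-compression can be sketched in a few lines: transform the image, encrypt only one wavelet band, and let the rest pass to the compressor untouched. This is a toy stand-in, not the paper's scheme: it uses a 1-D integer Haar step and a SHA-256 counter keystream where the paper works with JPEG2000's DWT (and its keywords mention RSA).

```python
# Toy selective encryption over DWT bands; illustrative, not the paper's method.
import hashlib

def haar_1d(row):
    """One level of an integer Haar transform: pairwise sums (approximation)
    and differences (detail). Reversible: x0 = (s + d) // 2, x1 = (s - d) // 2."""
    approx = [row[i] + row[i + 1] for i in range(0, len(row), 2)]
    detail = [row[i] - row[i + 1] for i in range(0, len(row), 2)]
    return approx, detail

def keystream(key, n):
    """SHA-256 counter-mode keystream; a simple stand-in for real key material."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def selective_encrypt(band, key):
    """XOR-encrypt a single band. Encrypting only the approximation band while
    the detail bands stay in the clear mimics partial/selective encryption;
    the same call decrypts, since XOR is an involution."""
    return [c ^ k for c, k in zip(band, keystream(key, len(band)))]
```

Because only one band is ciphered, the entropy coder still compresses the untouched detail bands normally, which is where the processing-time saving comes from.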
Non-Invasive Technique of Diabetes Detection using Iris Images
by Kesari Verma, Bikesh Kumar Singh, Neelam Agrawal
Abstract: Alternative medicine techniques are important for improving quality of life and preventing disease, and can be preferable to conventional invasive methods of disease detection. This paper addresses a non-invasive approach to diabetes detection using iris images. The proposed technique evaluates the use of iridology to diagnose diabetes with modern digital image processing techniques that analyse structural properties of the iris and classify the patterns accordingly. The system analyses the broken tissues of the iris by extracting significant textural features, using a Gabor filter bank and the Gray Level Co-occurrence Matrix (GLCM), from subsections of the iris. The extracted textural features help to categorize diabetic and non-diabetic irises using benchmark Artificial Neural Network (ANN) and Support Vector Machine (SVM) classifiers. The promising results of extensive experiments demonstrate the effectiveness of the proposed method.
Keywords: diabetes detection; image processing; iris images; support vector machine; artificial neural network; SVM; ANN; gabor features; gray level co-occurrence matrix; GLCM; Non-Invasive Technique.
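The GLCM step mentioned in the abstract is straightforward to sketch: count co-occurring grey-level pairs at a fixed displacement, then derive a Haralick-style statistic. A minimal sketch (the displacement, level count and choice of the contrast feature are assumptions, not the authors' settings):

```python
# Minimal GLCM and Haralick contrast; parameters are illustrative.
def glcm(img, dx=1, dy=0, levels=4):
    """Grey-level co-occurrence counts for one displacement (dx, dy).
    `img` is a 2-D list of integer grey levels in [0, levels)."""
    h, w = len(img), len(img[0])
    m = [[0] * levels for _ in range(levels)]
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < w and 0 <= y2 < h:
                m[img[y][x]][img[y2][x2]] += 1
    return m

def contrast(m):
    """Haralick contrast: (i - j)^2 weighted by the normalised co-occurrence."""
    total = sum(sum(row) for row in m)
    return sum((i - j) ** 2 * m[i][j] / total
               for i in range(len(m)) for j in range(len(m)))
```

Several such statistics (contrast, energy, homogeneity, ...) computed per iris subsection would form the texture feature vector fed to the ANN/SVM classifiers.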
Autonomous Void Detection and Characterisation in Point Clouds and Triangular Meshes
by Benjamin Bird, Barry Lennox, Simon Watson, Thomas Wright
Abstract: In this paper we propose and demonstrate a novel void characterisation algorithm that is able to distinguish between internal and external voids present in point clouds of both manifold and non-manifold objects and 3D scenes. We demonstrate the capabilities of our algorithm using several point clouds representing both scenes and objects, and present the algorithm both as a descriptive overview and as pseudocode. We also compare a variety of void detection algorithms and then present a novel refinement to the best performing of them. Our refinement allows voids in point clouds to be detected more efficiently, with fewer false positives and with over an order of magnitude improvement in run time. We report our run-time performance and compare it with results obtained using alternative algorithms when tested on popular single-board computers. This comparison is important because our work is intended for online robotics applications, where hardware typically has low computational power. The target application for this work is 3D scene reconstruction to aid in the decommissioning of nuclear facilities.
Keywords: Point Cloud; Void Detection; Meshing; Reconstruction; Computer
GCSAC: Geometrical Constraint SAmple Consensus for Primitive Shapes Estimation in 3-D Point Cloud
by Le Van Hung, Hai Vu, Thi-Thuy Nguyen, Thi-Lan Le, Thanh-Hai Tran
Abstract: Estimating the parameters of a primitive shape from 3-D point cloud data is a challenging problem, owing to noisy data and computational time demands. In this paper, we present a new robust estimator (named GCSAC, for Geometrical Constraint SAmple Consensus) aimed at solving these issues. The proposed algorithm takes geometrical constraints into account to construct qualified samples for the estimation. Instead of randomly drawing a minimal sample subset, explicit geometrical properties of the primitive shapes of interest (e.g., cylinder, sphere and cone) are used to drive the sampling procedure. At each iteration of GCSAC, the minimal sample subset is selected according to two criteria: (1) it must be consistent with the estimated model via a rough inlier-ratio evaluation; (2) the samples must satisfy the geometrical constraints of the objects of interest. Based on the good samples obtained, the model estimation and verification procedures of the robust estimator are deployed in GCSAC. Extensive experiments have been conducted on synthesised and real datasets for evaluation. Compared with common robust estimators of the RANSAC family (RANSAC, PROSAC, MLESAC, MSAC, LO-RANSAC and NAPSAC), GCSAC outperforms them in terms of both the precision of the estimated model and computational time. The implementations of the proposed method and the datasets are made publicly available.
Keywords: Robust Estimator; Primitive Shape Estimation; RANSAC and RANSAC Variations; Quality of Samples; Point Cloud data.
Exploring the Effects of Non-Local Blocks on Video Captioning Networks
by Jaeyoung Lee, Junmo Kim
Abstract: In addition to visual features, video also contains temporal information that contributes to semantic meaning regarding the relationships between objects and scenes. There have been many attempts to describe spatial and temporal relationships in video, but simple encoder-decoder models are not sufficient for capturing detailed relationships in video clips. A video clip often consists of several shots that seem to be unrelated, and simple recurrent models suffer from these shot changes. In other fields, including visual question answering and action recognition, researchers have begun to take an interest in describing visual relations between objects. In this paper, we introduce a video captioning method that captures temporal relationships with a non-local block and a boundary-aware system. We evaluate our approach on the Microsoft Video Description Corpus (MSVD, YouTube2Text) and the Microsoft Research Video to Text (MSR-VTT) datasets. The experimental results show that a non-local block applied along the temporal axis can improve performance on video captioning datasets.
Keywords: Video captioning; Non-local mean; Self-attention; Video description.
A Novel Approach for Mitigating Atmospheric Turbulence using Weighted Average Sobolev Gradient and Laplacian
by Prifiyia Nunes, Dippal Israni, Karthick D, Arpita Shah
Abstract: Heat scintillation is a major cause of atmospheric turbulence, which distorts images as light propagates through a volatile environment. Changes in refractive index due to variations in wind velocity also cause atmospheric turbulence. Traditional image-registration approaches lag because they are computationally expensive and need a post-processing algorithm to sharpen the image. A non-registration-based Sobolev gradient and Laplacian (SGL) algorithm removes turbulence but produces ghost artefacts around moving objects. This paper proposes a novel approach based on a weighted-average SGL. The proposed method mitigates atmospheric turbulence while restoring the moving objects in the scene. Performance metrics such as SSIM and MSE show that the proposed algorithm outperforms state-of-the-art algorithms in restoring both the geometric distortion and the object of interest.
Keywords: atmospheric turbulence; heat scintillation; restoration; Sobolev; phase shift; weighted average.
3D Object Classification Based on Deep Belief Networks and Point Clouds
by Fatima Zahra OUADIAY, Nabila ZRIRA, Mohamed HANNAT, El Houssine BOUYAKHF, Mohammed Majid HIMMI
Abstract: Since the advent of 3D sensors such as the Kinect camera, 3D object models and point clouds have become widely used in many areas, the most important being 3D object recognition and classification in robotic applications. This type of sensor, like human vision, allows an object model to be generated from a single field of view, or even a complete 3D object model by combining several individual Kinect frames. In this work, we propose a new feature-learning-based object classification approach using Point Cloud Library (PCL) detectors and descriptors and Deep Belief Networks (DBNs). Before developing the classification approach, we evaluate 3D descriptors by proposing a new pipeline that uses the L2 distance and a recognition threshold. The 3D descriptors are computed on different datasets in order to identify the best descriptors. Subsequently, these descriptors are used to learn robust features in the classification approach using DBNs. We evaluate the performance of these contributions on two datasets: Washington RGB-D and our own real 3D object dataset. The results show that the proposed approach outperforms advanced methods by approximately 5% in terms of accuracy.
Keywords: Kinect; 3D Object classification; PCL; recognition threshold; DBNs; Washington RGB-D.
A novel fast fractal image compression based on reinforcement learning
by Bejoy Varghese, Krishnakumar S
Abstract: Digital image compression is of considerable interest for the transmission and storage of images. Recent research in this area explores combinations of different coding techniques to achieve a better compression ratio without compromising image quality. Fractal-based coding techniques attracted the attention of the research community from the very early days of data compression; however, those methods were computationally intensive because of the exhaustive search involved in selecting a transformation sequence. In this paper, we propose a system that replaces the usual domain-range comparison in fractal compression with a reinforcement learning technique, which reduces the compression time and increases the PSNR. The system learns from the output of the exhaustive algorithm in the initial state and discards the combinatorial search after being trained on a data set. The proposed method shows a good improvement in compression ratio, PSNR and compression time.
Keywords: Machine learning; Image compression; Reinforcement learning; Fractal coding.
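The exhaustive domain-range search that the paper replaces with a learned policy is easy to show in one dimension: for each candidate domain block, least-squares-fit a contrast/brightness pair and keep the best match. An illustrative sketch (1-D blocks instead of image tiles, and no downsampling or isometries):

```python
# Classical fractal-coding step: exhaustive domain-range matching (1-D sketch).
def best_transform(range_block, domain_blocks):
    """Find the domain block and (contrast s, brightness o) minimising the
    least-squares error of s*d + o against the range block. This is the
    combinatorial step the paper's RL agent learns to avoid."""
    n = len(range_block)
    mr = sum(range_block) / n
    best = None
    for idx, d in enumerate(domain_blocks):
        md = sum(d) / n
        var = sum((x - md) ** 2 for x in d)
        s = 0.0 if var == 0 else sum((x - md) * (y - mr)
                                     for x, y in zip(d, range_block)) / var
        o = mr - s * md
        err = sum((s * x + o - y) ** 2 for x, y in zip(d, range_block))
        if best is None or err < best[0]:
            best = (err, idx, s, o)
    return best  # (error, domain index, contrast, brightness)
```

In full fractal coding this search runs for every range block; an RL policy trained on the exhaustive search's decisions can propose the domain block directly.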
Video Summarization based on Motion Estimation using Speeded up Robust Features
by Dipti Jadhav, Udhav Bhosle
Abstract: Video Summarization (VS) is a technique to extract keyframes from a video based on its contents. It provides the user with a brief representation of the video contents, enabling semantic understanding of the video. This paper presents video summarization based on the motion between consecutive video frames. The motion between frames is represented by affine and homography transformations, and the video frames are represented by a set of Speeded Up Robust Features (SURF). The keyframes are extracted sequentially by successive comparison, based on motion, with the previously declared keyframe. The validity of the proposed algorithms is demonstrated on videos from the Internet, the YouTube dataset and the Open Video Project. The proposed work is evaluated by comparing it with different classical and state-of-the-art video summarization methods reported in the literature. The experimental results and performance analysis validate the effectiveness and efficiency of the proposed algorithms.
Keywords: Video Summarization; motion estimation; keyframes; SURF; affine transformation; homography.
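The sequential keyframe-selection loop described in the abstract can be sketched independently of the feature detector: declare a new keyframe whenever similarity to the last declared keyframe drops below a threshold. In this sketch a set-overlap score stands in for the SURF match ratio, and the threshold value is an assumption:

```python
# Sequential keyframe selection; the similarity function is a stand-in
# for SURF descriptor matching between frames.
def extract_keyframes(frames, sim, threshold=0.5):
    """A frame becomes a keyframe when its similarity to the previously
    declared keyframe falls below `threshold`. Returns keyframe indices."""
    if not frames:
        return []
    keys = [0]
    for i in range(1, len(frames)):
        if sim(frames[keys[-1]], frames[i]) < threshold:
            keys.append(i)
    return keys

def jaccard(a, b):
    """Toy similarity: overlap between two frames' feature-descriptor sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0
```

With real SURF features, `sim` would be the fraction of descriptors matched between the two frames (or a motion score from the estimated affine/homography transform).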
Electroencephalography based classification of human emotion: A hybrid strategy in machine learning paradigm
by Bikesh Kumar Singh, Ankur Khare
Abstract: The objective of this article is to develop a new, improved two-stage method for classifying human emotional states by fusing a back-propagation artificial neural network (BPANN) and k-nearest neighbours (k-NN). A publicly available electroencephalograph (EEG) signal database for emotion analysis using physiological signals is used in the experiments. The EEG signals are first pre-processed, followed by feature extraction in the time and frequency domains. The extracted features are then supplied to the proposed model for emotion recognition. The proposed machine learning framework attains a higher classification accuracy of 78.33%, compared with conventional BPANN and k-NN classifiers, which achieve classification accuracies of 56.90% and 59.52%, respectively. Future work is required to evaluate the proposed model in a practical scenario, wherein a proficient psychologist or medical professional can analyse the emotion recognised by the first stage, and the unsure test cases can be supplied to the secondary classifier (k-NN) for further assessment.
Keywords: Brain computer interface; emotion; electroencephalogram; hybrid classifier.
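The two-stage hybrid idea (accept the first-stage decision only when it is confident; route unsure cases to k-NN) can be sketched as follows. This is not the authors' model: the first stage here is represented only by its output scores, and the confidence margin is an assumed parameter.

```python
# Sketch of a two-stage hybrid classifier: confident stage-1 decisions are
# accepted, unsure cases fall through to a k-NN second stage.
from collections import Counter

def knn(train, x, k=3):
    """Plain k-NN majority vote over (features, label) pairs."""
    ranked = sorted(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

def two_stage(stage1_scores, train, x, margin=0.2, k=3):
    """Accept stage 1 (e.g. BPANN output probabilities, passed in as a dict)
    only when its top two class scores differ by at least `margin`."""
    ranked = sorted(stage1_scores.items(), key=lambda kv: -kv[1])
    if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= margin:
        return ranked[0][0]
    return knn(train, x, k)   # 'unsure' case: defer to the second stage
```

The same routing rule would let a human expert inspect the unsure cases instead, as the abstract's future-work scenario suggests.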
Blocking of Operation of Unauthorized Software using MQTT
by Kitae Hwang
Abstract: This paper presents the design and implementation of the Meerkat system, a system that detects the operation of unauthorised software. The MQTT protocol is used for data communication in the Meerkat system, which is largely comprised of three components: the Meerkat client, the web application that operates as the admin, and the server software. The Meerkat client alerts the MQTT broker as soon as it detects the operation of unauthorised software on the user's PC, and the admin receives the information immediately via the MQTT broker. To evaluate the performance of the system, the transmission time of messages delivered between the user and admin PCs was measured. The measurements showed that delivering a message took between 8 and 50 milliseconds on average. These results indicate that messages are delivered quickly enough for the Meerkat system to be put into actual use.
Keywords: MQTT; publish-subscribe; unauthorized software; Mosquitto.
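The publish/subscribe flow between client, broker and admin can be illustrated with an in-memory stand-in for the broker. This is not the Meerkat code and not a network MQTT client (a real deployment would use a broker such as Mosquitto); the topic names are invented, but the '+'/'#' wildcard matching follows MQTT's topic-filter rules.

```python
# In-memory publish/subscribe sketch mimicking MQTT topic-filter semantics.
class MiniBroker:
    def __init__(self):
        self.subs = []                       # (topic_filter, callback) pairs

    def subscribe(self, topic_filter, callback):
        self.subs.append((topic_filter, callback))

    @staticmethod
    def matches(filt, topic):
        """MQTT-style matching: '+' matches one level, '#' all remaining levels."""
        f, t = filt.split("/"), topic.split("/")
        for i, part in enumerate(f):
            if part == "#":
                return True
            if i >= len(t) or (part != "+" and part != t[i]):
                return False
        return len(f) == len(t)

    def publish(self, topic, payload):
        """Deliver the payload to every subscriber whose filter matches."""
        for filt, cb in self.subs:
            if self.matches(filt, topic):
                cb(topic, payload)
```

In Meerkat's terms, each client publishes an alert topic when unauthorised software starts, and the admin's subscription to a wildcard filter receives it immediately.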
Development of Translation Telephone System by Using MQTT Protocol
by Jae Moon Lee
Abstract: This paper is a study on the development of a translation telephone system that enables two individuals speaking different languages to communicate through a phone call. The system is developed using the MQTT protocol, a push-service technology, together with voice- and translation-related web services that have been improving rapidly with developments in artificial intelligence. The core technologies applied in the system are voice recognition, text translation, and speech synthesis. To guarantee real-time operation, the system is designed to use as many threads as possible so that these functions can operate simultaneously. To minimise communication traffic, the system converts the conversation into text and sends the translated text to the counterpart instead of sending voice data. Also, to ensure the accuracy of the translation, the system translates the given information sentence by sentence. The proposed system has been developed to run on Android smartphones. Because sentences in a normal conversation tend not to be too long, our experiments show that the developed translation telephone system runs in real time.
Keywords: Telephone System; Translation; Web Service; Push Service; Speech Recognizer; Speech Synthesizer; MQTT.
An Iris Biometric-based Dual Encryption Technique for Medical Image in e-Healthcare Application
by Aparna P., P. V.V. Kishore
Abstract: Medical image watermarking has been broadly recognised as a relevant technique for improving data content verification, security, image fidelity, and authenticity in the current e-health environment, where medical images are stored, retrieved and transmitted over networks. Maintaining a secure environment for tele-radiology, against issues such as malpractice liability and image retention, is a challenging task. To address these security issues, this paper proposes a biometric-key-based medical image watermarking technique for e-healthcare applications. Two types of input are used: the patient's MRI image and the electronic health record (EHR). Initially, we segment the ROI region and encrypt the information using the SHA-256 algorithm. Then, we encrypt the EHR information using the ECC algorithm; for key generation, we use an iris biometric, which increases the security level of the watermarking system. We then concatenate the image and EHR information. To further increase system security, we use an arithmetic encoding algorithm to compress the bit stream. Finally, we embed the bit stream into the cover image; the same process is reversed for extraction. Experiments are carried out on different medical images with EHRs, and the effectiveness of the proposed algorithm is analysed in terms of peak signal-to-noise ratio (PSNR) and normalised correlation (NC). The proposed methodology is useful for many applications concerned with privacy protection, safety and management.
Keywords: SHA-256; elliptical curve cryptography; biometric key; watermarking; Authentication; iris image; arithmetic encoding.
2D-Feature descriptor without orientation Compensation
by Manel Benaissa, Abdelhak Bennia
Abstract: Several feature descriptors have been proposed in the literature, with a variety of definitions and a common goal: to describe, and obtain the best possible match between, potentially interesting points in two images. In this paper, we propose a new orientation-invariant feature descriptor that requires no additional step dedicated to orientation compensation. We exploit the information provided by two representations of the image (intensity and gradient) for a better understanding and representation of the feature point and its surroundings. This information is summarised in two cumulative histograms and used in the description and matching of the feature points. The experimental results show the descriptor's robustness to multiple image changes.
Keywords: Feature understanding; Feature description; Feature matching; object detection.
New Color Fusion Deep Learning Model for Large-Scale Action Recognition
by Abhishek Verma
Abstract: In this work we propose a fusion methodology that takes advantage of multiple deep convolutional neural network (CNN) models and two color spaces, RGB and oRGB, to improve action recognition performance on still images. We trained our deep CNNs on both the RGB and oRGB color spaces, extracted and fused all the features, and forwarded them to an SVM for classification. We evaluated our proposed fusion models on the Stanford 40 Action dataset and the People Playing Musical Instruments (PPMI) dataset using two metrics: overall accuracy and mean average precision (mAP). Our results outperform the current state of the art, with 84.24% accuracy and 83.25% mAP on Stanford 40, and 65.94% accuracy and 65.85% mAP on PPMI. Furthermore, we also evaluated the individual class performance on both datasets: the mAP for the top 20 individual classes on Stanford 40 lies between 87% and 97%, while on PPMI the individual class mAP lies between 34% and 87%.
Keywords: deep convolutional neural networks; deep learning; fusion model; action recognition; VGGNet; GoogLeNet; ResNet.
Discrete Texture Elements Synthesis on Surfaces using Elements Distribution
by Yan Gui, Yang Liu
Abstract: In this paper, we present a novel method with a special focus on reproducing the distribution of texture elements over arbitrary 3D surfaces. To this end, we propose neighbourhood comparisons to find the best matching neighbourhood for local growth over 3D surfaces, taking as a reference the 2D connectivity constructed from the input sample texture. The synthesised distribution provides the position information of the texture elements on the surface. We then perform paste operations using local parameterisation to obtain the final textured results. Experimental results show that the proposed method successfully generates element distributions over 3D surfaces. Moreover, our method works especially well for textures with discrete texture elements, maintaining the integrity of the synthesised texture elements over 3D surfaces.
Keywords: surface texture synthesis; discrete texture elements; elements distribution; neighbourhood comparison; local parameterization.
Computational Linguistic Retrieval Framework using Negative Bootstrapping for Retrieving Transliteration Variants
by Shashi Shekhar, Dilip Sharma, M.M. Sufyan Beg
Abstract: In natural language processing, transliteration is an important and relatively immature area. During transliteration, issues such as language identification, script specification and missing sounds arise in mixed-script queries (native script and non-native script). To overcome these issues, we propose a new technique called negative bootstrapping with frequent matrix Apriori for transliteration. Roman script is widely used in web search queries for searching content. The major challenge the system faces in processing a transliterated word is that the word can exist in more than one form, owing to different spelling variations. The proposed methodology consists of case conversion, bi-level feature extraction, n-gram text categorization and negative bootstrapping with frequent matrix Apriori, in both the training and testing stages. An experimental evaluation checks transliteration accuracy, along with identification of the language to which a word belongs, against established methods on the benchmark Microsoft Research dataset for Mixed Script Information Retrieval from FIRE (Forum for Information Retrieval Evaluation). The paper offers a principled solution to handling the multiple scripts used in a document, which lead to problems of term matching and spelling variation while searching the contents; the problem is modelled jointly with a deep-learning design. We present an in-depth empirical analysis of the proposed methodology against standard approaches for transliteration. The proposed method achieves significantly better results in terms of MRR, MAP and accuracy when the n-gram approach is applied on the benchmark dataset.
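The n-gram text categorization step can be sketched with character n-grams: spelling variants of the same transliterated word share most of their n-grams, so a shared-n-gram score groups them together. An illustrative sketch (the boundary markers, bigram order and Dice-style score are assumptions, not the paper's exact configuration):

```python
# Character n-gram profiles for matching transliteration spelling variants.
from collections import Counter

def ngrams(word, n=2):
    """Character n-grams of a word with boundary markers."""
    w = f"^{word}$"
    return Counter(w[i:i + n] for i in range(len(w) - n + 1))

def ngram_similarity(a, b, n=2):
    """Dice-style shared-n-gram score in [0, 1]; high for spelling variants
    of the same transliterated word, low for unrelated words."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    shared = sum((ga & gb).values())
    return 2 * shared / (sum(ga.values()) + sum(gb.values()))
```

The same n-gram profiles, accumulated per language, also support the language-identification step the abstract mentions.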
Keywords: Feature extraction; text categorization; negative bootstrapping; Apriori; transliteration; multilexical matching; substitution; variations; word normalization; NLP; machine learning.
Two Stage Optimized Video Summary
by Dipti Jadhav, Udhav Bhosle, Jyoti Deshmukh
Abstract: Video Summarization (VS) is a technique to extract keyframes from a video based on its contents. This paper presents a two-stage video summarization. The first-stage summary is generated based on optimization of SURF keypoints using a modified PreARM algorithm, with keyframes selected according to the number of matched optimized keypoints between consecutive video frames. The second stage aims at reducing the redundancy of the generated summary using a Multi-Objective Genetic Algorithm. Experimental results on videos from the Internet, the YouTube dataset and the Open Video Project demonstrate the validity of the proposed work, and the performance analysis validates the effectiveness and efficiency of the proposed algorithms.
Keywords: Video Summarization; SURF Keypoint Optimization; Multi-Objective Genetic Algorithm.
Personal Authentication and Risk Evaluation by Sensible Keyboard Sound
by Hyo-Joong Suh, Hoyoung Hwang
Abstract: Personal authentication is an essential process in various network applications such as online shopping, mobile banking and e-Government services. Authentication processes normally use personal or device information, including user passwords, IP addresses, device numbers, MAC addresses, and biometric information such as fingerprints or iris scans. In addition to direct personal or device information, environmental information can be used for authentication; for example, the location and time of device use may be considered important factors in evaluating the risk of fraudulent use. Authentication techniques are becoming more important as the volume of online business and of non-face-to-face contact increases enormously. This paper proposes a personal authentication technique using the audible sound of a keyboard input device. We extract the unique features of the keyboard stroke sound and use them for personal authentication, and also for risk evaluation to protect against fraud.
Keywords: personal authentication; keyboard; sound; hijacking; risk evaluation.
An IoT Based Smart Parking Management System
by Inhwan Jung
Abstract: In this paper, we implemented a smart parking information management system based on IoT, using ultrasonic parking sensors and Bluetooth beacons. The ultrasonic IoT sensors used for parking sensing are controlled by an Arduino board to collect parking sensing data, and the collected parking information is transmitted to the server in real time using the MQTT protocol. The server stores the parking information in a database and provides the vehicle driver with the real-time parking status, also via MQTT. The driver starts the smartphone app on entering the parking lot; the smartphone automatically recognizes the parking area using the Bluetooth beacon signal at the entrance, and, by communicating with the server, the driver can check the status of the available parking spaces at a glance. The parking management system implemented in this study not only helps the driver to park, but also uses the real-time parking information stored in the database to obtain marketing information such as the hourly, daily or monthly number of visiting customers and the average shopping time.
Keywords: IoT; Smart Parking; Bluetooth Beacon; Parking App; MQTT.
Multi-document Summarization Using Feature Distribution Analysis
by Jae-Young Chang
Abstract: Recently, opinion documents have been growing rapidly in an environment where anyone can express an opinion on the Internet or on social networking services. This situation requires an automatic summarization technique in order to understand the contents of large-scale collections of opinion documents. However, it is not easy to summarize opinion documents with previous text summarization technologies, since opinion documents include subjective expressions as well as features of target objects. In this paper, a method to identify and extract representative documents from a large collection of opinion documents is proposed. In addition, experiments show that the proposed method successfully extracts representative opinion documents.
Keywords: Multi-document Summarization; Text Mining; Opinion Mining; Feature; Social Network Service.
Situation-Cognitive Traffic Light Control Based On Object Detection Using YOLO Algorithm
by Sun-Dong Kim
Abstract: Current traffic lights provide the green signal at a fixed time interval without considering the traffic situation. As a result, cars in a long queue have to wait a long time, which causes traffic jams and irritates drivers. To solve this problem, it is necessary to control the green-signal interval according to the traffic volume, analysed using image processing and machine learning techniques. This paper presents a situation-cognitive traffic light control algorithm that measures the traffic volume using the object detection algorithm YOLO (You Only Look Once) and controls the traffic signal intervals according to that volume. The algorithm is expected to smooth traffic flow and reduce driver stress.
Keywords: YOLO; You Only Look Once; Object Detection; Situation-Cognitive; Traffic Light Control.
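Once a detector reports a per-approach vehicle count, the control step described in the abstract reduces to mapping counts to green-signal intervals. A minimal sketch (the base time, per-vehicle increment and clamp are illustrative constants, not values from the paper):

```python
# Map detected vehicle counts to green-signal intervals; constants are
# illustrative, not from the paper.
def green_interval(vehicle_count, base=10, per_vehicle=2, max_interval=60):
    """Green time (seconds) for one approach: a base interval plus time per
    queued vehicle, clamped to an upper bound for fairness."""
    return min(base + per_vehicle * vehicle_count, max_interval)

def schedule(counts):
    """Per-approach green intervals from YOLO-style detection counts."""
    return {approach: green_interval(n) for approach, n in counts.items()}
```

In the full system, `counts` would come from running YOLO on the camera frame for each approach at the start of every cycle.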
Occlusion Handling Strategies for Multiple Moving Object Classification
by Pawan Kumar Mishra, G.P. Saroha
Abstract: A framework is designed for the detection and classification of multiple moving vehicles. Background subtraction with a Gaussian mixture model (MOG) is used to detect multiple moving objects such as vehicles, and the vehicles are then classified using the K-nearest neighbour algorithm based on different features. The method used in this research also improves the accuracy and the occlusion rate for multiple moving vehicles in video frames. We also learn a single detector for different types of moving vehicles, such as buses, trucks, and cars; this detector uses a special function known as an occlusion metric function. The main goal of this research is to build a function that evaluates the performance of the detector, as a trade-off between the number of false positives and the hit rate, in heavy-traffic (high-activity) and light-traffic (low-activity) regions.
Keywords: detection; classification; occlusion; accuracy; hit rate; false positive.
Energy Based Features for Kannada Handwritten Digit Recognition
by GURURAJ MUKARAMBI, Basanna Dhandra
Abstract: In this paper, a Kannada handwritten digit recognition system based on discrete wavelet transform filters is proposed. Due to the non-availability of standard data sets, a sample data set of Kannada handwritten digits was collected from schools, colleges, business persons, professionals and others, and scanned at 300 DPI. The images are pre-processed using a morphological opening operation to remove noise, and bilinear interpolation is used to normalize them to 32 x 32 pixels, the optimum size for the experiment. The normalized sample images are divided into 16 blocks; wavelet filters are applied to each block and the standard deviation is computed for each. In this process, a total of 64 standard deviations of the wavelet coefficients are generated, of which 48 are selected as potential features, the approximation coefficients being identified as non-potential for discriminating the Kannada handwritten digits, since the horizontal, vertical and diagonal coefficients capture the energy in these three directions for the Haar, Daubechies, Coiflets and Symlets wavelet families. A nearest-neighbour classifier is applied for recognition, achieving an average recognition accuracy of 94.80%. The proposed algorithm is free from skew and thinning, which is the novelty of the paper.
Keywords: OCR; DWT; Nearest Neighbor; SVM.
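The per-block feature computation described in the abstract can be sketched with the Haar case: one 2-D wavelet level splits a block into four sub-bands, and the standard deviations of the three detail bands (the approximation band being discarded) form the feature vector. An illustrative sketch, not the authors' implementation (filter normalisation and block layout are assumptions):

```python
# One-level 2-D Haar decomposition and detail-band standard deviations.
import math

def haar_2d_once(block):
    """Split a square block into approximation (LL) and detail (LH, HL, HH)
    sub-bands using 2x2 Haar averaging/differencing."""
    n = len(block)
    ll, lh, hl, hh = [], [], [], []
    for y in range(0, n, 2):
        for x in range(0, n, 2):
            a, b = block[y][x], block[y][x + 1]
            c, d = block[y + 1][x], block[y + 1][x + 1]
            ll.append((a + b + c + d) / 4)
            lh.append((a + b - c - d) / 4)   # horizontal detail
            hl.append((a - b + c - d) / 4)   # vertical detail
            hh.append((a - b - c + d) / 4)   # diagonal detail
    return ll, lh, hl, hh

def std(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def block_features(block):
    """Standard deviations of the three detail sub-bands; the approximation
    band is dropped, mirroring the paper's non-potential features."""
    _, lh, hl, hh = haar_2d_once(block)
    return [std(lh), std(hl), std(hh)]
```

Applying this to each of the 16 blocks of a normalized digit image, with the paper's four wavelet families, yields the 48 selected standard-deviation features.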
An Optimal Mode Selection Algorithm for Scalable Video Coding
by L. Balaji, K.K. Thyagharajan, C. Raja, A. Dhanalakshmi
Abstract: Scalable Video Coding (SVC) extends its predecessor, Advanced Video Coding (AVC), to enable flexible transmission to all types of devices. Although SVC is more flexible and scalable than AVC, its mode-decision computations are more complex. The traditional full-search method in the standard H.264 SVC consumes considerable encoding time. To reduce this computational complexity, many fast mode decision (FMD) algorithms have been developed, but most fail to balance all three measures: PSNR (peak signal-to-noise ratio), encoding time and bit rate. In this paper, the proposed optimal mode selection algorithm, based on the orientation of pixels, achieves better time saving, good PSNR and coding efficiency. Compared with the standard H.264 JSVM reference software, the proposed algorithm achieves 57.44% time saving, a 0.43 dB increase in PSNR and a 0.23% reduction in bit rate.
Keywords: Scalable Video Coding; Computation; mode selection; PSNR; Time; Bit rate.
Large-Scale Scene Image Categorization with Deep Learning Based Model
by Abhishek Verma
Abstract: Increasing the depth of convolutional neural networks (CNNs) is a highly promising way of increasing their accuracy. However, increased depth also increases the parameter count, leading to slow backpropagation convergence prone to overfitting. We trained our model, Residual-CNDS, to classify the very large-scale scene datasets MIT Places 205 and MIT Places 365-Standard. The results on the two datasets show that the proposed model effectively handles slow convergence, overfitting and degradation. Convolutional networks with deep supervision (CNDS) add supplementary branches to the deep network at specified layers, combating vanishing gradients and effectively addressing delayed convergence and overfitting. Nevertheless, CNDS does not resolve degradation; hence, we add residual learning to CNDS at certain layers after studying the best place in which to add it. With this approach we overcome degradation in the very deep network. We built two models, Residual-CNDS 8 and Residual-CNDS 10, tested them on the two large-scale datasets, and compared our results with other recently introduced state-of-the-art networks in terms of top-1 and top-5 classification accuracy. Both models show good improvement, supporting the assertion that adding residual connections enhances CNDS accuracy without adding any computational complexity.
Keywords: Residual-CNDS; scene classification; Residual Learning Convolutional Neural Networks; Convolutional Networks with Deep Supervision.
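The residual connection the abstract refers to is, at its core, an identity shortcut around a learned transformation. The minimal NumPy sketch below (not the authors' Residual-CNDS architecture) shows why the shortcut adds no parameters and why a zero-initialised residual branch leaves the signal unchanged, which eases optimisation of very deep stacks:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """y = relu(x + F(x)): the shortcut adds no parameters and costs only
    one element-wise addition beyond the residual branch F."""
    return relu(x + W2 @ relu(W1 @ x))

rng = np.random.default_rng(1)
d = 8
x = rng.standard_normal(d)
# With zero-initialised weights the block reduces to relu(x): the shortcut
# passes the signal through untouched, so depth cannot degrade it.
W1 = np.zeros((d, d)); W2 = np.zeros((d, d))
print(np.allclose(residual_block(x, W1, W2), relu(x)))  # True
```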
Extracting and Searching News Articles in Web Portal News Pages
by Namyun Kim
Abstract: Recently, a large number of news articles has been created online, and news articles are important resources for understanding social phenomena and trends. Accordingly, web portal services provide a "portal news page" that classifies news articles published by various news sources into sections and presents each article with a certain structure. Therefore, by analysing portal news pages, it is possible to automatically extract information about news articles. In this paper, we introduce a prototype that extracts and searches key information of news articles for analysis. Specifically, we describe (1) a crawler that collects, analyses and parses news articles, (2) an Elasticsearch server that indexes and searches news information, and (3) a front-end application that provides a search user interface. These systems are expected to provide the foundation for news analytics and forecasting services.
Keywords: Crawler; Search Engine; Elasticsearch; News Service and Analysis.
Cursive Multilingual Characters Recognition Based on Hard Geometric Features
by Amjad Rehman, Majid Harouni, Tanzila Saba
Abstract: The cursive nature of Arabic, Persian and Urdu characters has attracted segmentation and recognition researchers from academia and industry. However, despite several decades of research, multilingual character classification accuracy is still not up to the mark. This paper presents an automated approach for multilingual character segmentation and recognition. The proposed methodology locates character boundaries based on their geometric features prior to recognition. However, due to uncertainty and the lack of dictionary support, a few characters are over-segmented. To improve accuracy, a back-propagation network (BPN) is trained on a large set of segmentation points for cursive multilingual characters. The trained BPN efficiently rejects incorrectly segmented points, rapidly improving character recognition precision. For fair comparison, only benchmark datasets are used.
Keywords: OCR; Multilingual character recognition; features mining; geometrical features; BPN.
Stego-key based image steganography scheme using edge detector and modulus function
by SHIV PRASAD, ARUP KUMAR PAL
Abstract: In this paper, our main concern is to devise an image steganography scheme that enhances both the security and the payload capacity of the cover image. A secure image steganography scheme is proposed in which the embedding of secret message bits is controlled by a secret key. To improve embedding capacity, the hiding process exploits the cover image's characteristics: more secret message bits are concealed in the edge regions than in the smooth regions. To improve security, steganography and cryptography are generally combined; in this work, however, instead of using two separate security mechanisms, we embed the secret message bits into the cover image with reference to a key, known as a stego-key. This approach not only enhances security but also reduces computational overhead. Variable-length secret message bits are concealed in edge and non-edge pixels using a modulus-function-based embedding process. The secret message bits are not embedded sequentially into each pixel of the cover image, and the numbers of edge and non-edge pixels vary with the threshold value chosen during edge detection. This threshold value serves as a key parameter, so only an authorised user is able to locate the edge and non-edge pixels during message extraction. The scheme is implemented on standard grayscale images, and satisfactory results are achieved in terms of visual quality along with higher payload.
Keywords: Data hiding; Edge detection; Image steganography; Information security; Modulus function; Stego-key.
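To illustrate the modulus-function idea on a single pixel, the hedged sketch below embeds k message bits by nudging the pixel value so that its remainder modulo 2^k encodes the bits; the proposed scheme's exact embedding function, stego-key handling and edge/non-edge bit allocation are not reproduced here:

```python
def embed(pixel, bits, k):
    """Embed k secret bits into one pixel so that pixel % 2**k equals the
    message value, changing the pixel by at most 2**(k-1)."""
    m = 2 ** k
    target = int(bits, 2)
    diff = (target - pixel) % m
    if diff > m // 2:
        diff -= m              # move in the closer direction
    new = pixel + diff
    if new < 0:   new += m     # keep within the valid grey range [0, 255]
    if new > 255: new -= m
    return new

def extract(pixel, k):
    """Recover the k embedded bits from the pixel's remainder."""
    return format(pixel % 2 ** k, '0{}b'.format(k))

p = embed(154, '101', 3)       # e.g. an edge pixel carrying 3 bits
print(p, extract(p, 3))        # 157 101
```

The distortion bound (at most 2^(k-1) grey levels) is what lets edge pixels, where larger changes are less visible, carry more bits than smooth-region pixels.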
An Ensemble of Neural Networks for Non-Linear Segmentation of Overlapped Cursive Script
by Amjad Rehman
Abstract: Intro: Precise character segmentation is essential for higher Optical Character Recognition (OCR) accuracy. In cursive script, overlapped characters are a serious issue in character segmentation, as conventional linear segmentation strategies deprive characters of their discriminative parts. Background: Hence, non-linear segmentation is needed to avoid the loss of character parts and to enhance character/script recognition accuracy. This paper presents an improved approach for non-linear segmentation of overlapped characters in handwritten Roman script. Contribution: The proposed technique comprises a sequence of heuristic rules based on the geometric features of characters to locate possible non-linear character boundaries in a cursive script word. To enhance efficiency, the heuristic approach is integrated with a trained ensemble neural network validation strategy for verifying character boundaries: correct boundaries are retained and incorrect ones removed based on the ensemble's vote. Conclusion: Finally, characters are segmented non-linearly based on the verified segmentation points. For fair comparison, the CEDAR benchmark database is used. The experimental results are much better than those of the conventional linear character segmentation techniques reported in the state of the art. Ensemble neural networks play a vital role in enhancing character segmentation accuracy compared with individual neural networks.
Keywords: Non-linear character segmentation; ensemble neural networks; Analytical approach; CEDAR database.
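The ensemble verification step can be illustrated with a simple majority vote over member-network scores; the member scores and the 0.5 acceptance threshold below are hypothetical:

```python
def ensemble_verify(candidate_scores, threshold=0.5):
    """Majority vote of several trained networks on one candidate
    segmentation point: keep the point only if most members accept it."""
    votes = [s > threshold for s in candidate_scores]
    return sum(votes) > len(votes) / 2

# Three hypothetical member networks scoring one candidate boundary.
print(ensemble_verify([0.9, 0.7, 0.4]))  # True: two of three accept
print(ensemble_verify([0.2, 0.6, 0.3]))  # False: only one accepts
```

Voting suppresses the idiosyncratic errors of any single network, which is the abstract's rationale for preferring the ensemble over individual networks.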
Performance Evaluation of Various Texture Analysis Techniques for Machine Vision based Characterization of Machined Surfaces
by Ketaki Joshi, Bhushan Patil
Abstract: Machine vision-based inspection of surface quality leverages the principle of surface-texture characterization, capitalizing on image data characteristics. Surface-texture analysis frequently adopts statistical and filter-based techniques for this purpose. For surface-texture characterization, researchers traditionally prefer parameterized histograms, gray-level co-occurrence matrices, discrete Fourier transforms and discrete wavelet transforms. Despite their popular usage, the literature features very little comparative analysis of these techniques.
Accordingly, this paper evaluates the comparative performance of these techniques for the characterization of machined surfaces and also recommends a novel hybrid technique with higher discriminating capability. This hybrid discriminant-analysis methodology is derived from the characterization of 532 images of multi-textured machined surfaces. The results show that the proposed technique provides superior performance with higher accuracy, while requiring a reduced optimal set of parameters for the inspection of surface quality.
Keywords: machine vision; texture analysis; image processing; discriminant analysis; multivariate techniques; surface texture; surface quality; histogram; gray level co-occurrence matrix; discrete Fourier transform; discrete wavelet transform.
Deep Reinforcement learning Collision Avoidance using Policy Gradient Optimization and Q-Learning
by Shady Maged, Bishoy Mikhail
Abstract: The use of Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), both descendants of the policy-gradient optimization method, and of the Deep Q-Learning Network (DQN) for lidar-based differential robots is proposed using a TurtleBot and OpenAI's Baselines implementations. The simulation results show that the three algorithms are well suited to obstacle avoidance and robot navigation, with a clear advantage for TRPO and PPO in complex environments. The learned policies can be used in a fully decentralised manner, as they are not constrained by any robot parameters or communication protocols.
Keywords: ROS; Robotics; Deep Learning; Reinforcement Learning; Deep Q-Learning; Trust Region Policy Optimization; Proximal Policy Optimization.
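The value-update rule at the heart of DQN, which the paper approximates with a neural network over lidar inputs, can be shown in its tabular form on a toy corridor environment; the environment, rewards and hyperparameters below are illustrative only:

```python
import numpy as np

# Tabular Q-learning on a 1-D corridor: states 0..4, goal at state 4.
# A minimal sketch of the update rule that DQN approximates with a network.
n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.3
rng = np.random.default_rng(42)

for _ in range(500):                # 500 training episodes
    s = 0
    while s != 4:
        # Epsilon-greedy action selection.
        a = int(rng.integers(n_actions)) if rng.random() < eps \
            else int(Q[s].argmax())
        s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
        r = 1.0 if s2 == 4 else 0.0
        # Bellman update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(int(Q[0].argmax()), int(Q[3].argmax()))  # greedy policy heads right
```

TRPO and PPO differ fundamentally: they optimise a parameterised policy directly under a trust-region or clipping constraint rather than learning action values.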
Angle Histogram of Hough Transform as Shape Signature for Visual Object Classification (AHOC)
by Aaron Rababaah
Abstract: This work presents a new method for object classification using the Hough Transform (HT) and an angle histogram as a signature of the target objects (AHOC). Several methods reported in the literature exploit the Hough Transform and other techniques as a pre-processing step to characterise objects for detection, recognition, classification, etc. The HT is a very powerful technique for extracting shape features from 2D objects and has been used in many studies and implemented successfully in many applications. Our study is unique in post-processing the HT voting space with a binary threshold and then computing an angle histogram of the resulting angle space as a shape signature for the target object. Our image set consisted of 25 simple geometric shapes and six complex natural object classes: trees, people, cars, airplanes, houses and horses. The method was trained and tested on 225 images from these six classes and found to be robust, with a classification accuracy of 95.83%.
Keywords: visual object characterization; object classification; Hough transform; angle histogram; template matching.
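A minimal sketch of the AHOC idea follows: vote in the (rho, theta) Hough space, binarise the accumulator with a threshold, and histogram the surviving cells over theta; the 0.8 relative threshold and the toy image are assumptions made for the example:

```python
import numpy as np

def hough_angle_histogram(edges, n_theta=180, vote_thresh=None):
    """Vote in (rho, theta) space, binarise the accumulator, then histogram
    the surviving cells over theta to obtain a shape signature."""
    ys, xs = np.nonzero(edges)
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((2 * diag, n_theta), dtype=int)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, np.arange(n_theta)] += 1
    if vote_thresh is None:
        vote_thresh = int(0.8 * acc.max())   # assumed relative threshold
    # Angle histogram of the thresholded voting space.
    return (acc >= vote_thresh).sum(axis=0)

# A vertical edge: its votes concentrate near theta = 0 (x*cos(0) = const).
img = np.zeros((20, 20), dtype=bool)
img[:, 10] = True
hist = hough_angle_histogram(img)
print(int(hist.argmax()))  # 0
```

The resulting 180-bin vector serves as the shape signature that is then matched against class templates.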
A lossless blind image data hiding scheme for semi-fragile image watermark
by Amine Khaldi
Abstract: In this work we propose a digital watermarking approach that is invariant to rotation and guarantees the integrity of the inserted mark. The size of the resulting image is identical to that of the original image and the process guarantees acceptable transparency. For this we experimented with eight substitution processes and conclude that substituting four bits gives a large insertion capacity while guaranteeing acceptable transparency for the insertion of text. For the insertion of an image, however, it is recommended to substitute only one bit; this reduces the capacity but guarantees the transparency of the watermarking process.
Keywords: Digital watermarking; Imperceptibility; Robustness; Digital Image; least significant bit substitution.
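The k-bit least-significant-bit substitution the abstract compares can be sketched in a few lines; the pixel values below are illustrative:

```python
def substitute_lsb(pixel, mark_bits, k):
    """Replace the k least significant bits of a pixel with mark bits."""
    return (pixel & ~(2 ** k - 1)) | int(mark_bits, 2)

def recover_lsb(pixel, k):
    """Read the k least significant bits back out."""
    return format(pixel & (2 ** k - 1), '0{}b'.format(k))

# Text payload: 4-bit substitution (high capacity, acceptable transparency).
p = substitute_lsb(200, '1011', 4)
print(p, recover_lsb(p, 4))    # 203 1011
# Image payload: 1-bit substitution preserves transparency.
q = substitute_lsb(200, '1', 1)
print(q, recover_lsb(q, 1))    # 201 1
```

The trade-off the abstract reports falls out directly: k bits per pixel of capacity against a maximum per-pixel distortion of 2^k - 1 grey levels.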
A Study on the Trustworthiness of Store Rating in Restaurant Recommendation O2O Service
by Hyung Su Kim, Sangwon Lee
Abstract: This study examines whether store rating information, a key piece of information provided by recently widespread O2O services and commonly displayed as stars, is truly reliable for consumers. We investigate whether the store ratings of O2O services, which are known to lower consumers' perceived search cost, constitute a reliable system. First, we compared the ratings of stores registered in a domestic restaurant-recommendation O2O service with actual survey results from customers who visited those restaurants. The analysis found that the in-app rating of each store was not correlated with customers' actual satisfaction or loyalty, and evaluations of specific store attributes were likewise not significant in predicting a store's rating in the O2O app. However, customers who visited a store via the mobile app or an acquaintance's recommendation were more loyal than customers who visited after a simple internet search. Therefore, if O2O services establish a more reliable customer rating system, consumer utilisation of O2O services is expected to increase further.
Keywords: O2O; Store Rating; Customer Review; Customer Loyalty.
Deep Learning based Intelligent Surveillance Model for Detection of Anomalous Activities from Videos
by Karishma Pawar, Vahida Attar
Abstract: For safeguarding and monitoring purposes, public places are equipped with surveillance cameras, and timely, accurate identification of suspicious activities is paramount to securing them. Assigning human personnel to keep continuous watch over ongoing activities is error-prone and laborious, so automated surveillance systems are required to alleviate the need for human monitoring of such videos. This paper proposes a deep learning based intelligent surveillance model for the detection of anomalous activities, treating anomaly detection as a one-class classification problem. The proposed approach involves a two-dimensional convolutional auto-encoder for feature learning, a sequence-to-sequence long short-term memory model for learning temporal statistical correlation, and a radial basis function as the activation function in a fully connected network for one-class classification. We experimented on a real-world dataset with two variants of the proposed approach and achieved significant results in frame-level anomaly detection.
Keywords: Anomaly detection; computer vision; convolutional autoencoder; deep learning; one class classification; radial basis function; video surveillance.
Cucumber disease detection using Adaptively Regularized Kernel-Based Fuzzy C-Means and Probabilistic Neural Network
by Jayanthi M.G, Dandinashivara Revanna Shashikumar
Abstract: Agriculture is now considered much more than just feeding the world's ever-growing population. For many decades, computers have been used to provide automatic solutions in place of manual diagnosis of plant diseases, which is costly and error-prone. Cucumber, a common economic crop, is one of the most popular vegetables and occupies a large proportion of vegetable cultivation. This paper therefore addresses cucumber disease recognition. First, diseased regions are segmented using Adaptively Regularized Kernel-Based Fuzzy C-Means (ARKFCM). Once the disease is segmented, colour features are extracted with a Hue, Saturation and Value (HSV) based semantic technique, and texture features with the grey-level co-occurrence matrix (GLCM) technique. The cucumber disease is then classified using a Probabilistic Neural Network (PNN). Finally, experiments are performed on a standard agricultural database, with the method implemented in Matlab. For recognising cucumber diseases such as anthracnose, downy mildew and grey mould, the experimental results on a database of diseased cucumber leaf images show that the proposed method is feasible and effective.
Keywords: Cucumber disease; Segmentation; Feature extraction; Classification; ARKFCM; PNN; Sensitivity; Specificity and Accuracy.
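The GLCM texture features mentioned above can be sketched as follows; the displacement, the number of grey levels and the chosen feature set (contrast, energy, homogeneity) are assumptions for this example rather than the paper's exact configuration:

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=4):
    """Grey-level co-occurrence matrix for one displacement, normalised
    so its entries form a joint probability distribution."""
    g = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(h - dy):
        for x in range(w - dx):
            g[img[y, x], img[y + dy, x + dx]] += 1
    return g / g.sum()

def glcm_features(g):
    """Three common Haralick-style texture statistics of a GLCM."""
    i, j = np.indices(g.shape)
    return {
        'contrast':    float(((i - j) ** 2 * g).sum()),
        'energy':      float((g ** 2).sum()),
        'homogeneity': float((g / (1.0 + np.abs(i - j))).sum()),
    }

# A perfectly uniform patch has zero contrast and maximal energy.
flat = np.zeros((8, 8), dtype=int)
print(glcm_features(glcm(flat)))
```

Such statistics, concatenated with the HSV colour features, would form the input vector for the PNN classifier.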
Challenges for Computer Aided Diagnostics using X-Ray and Tomographic Reconstruction Images in craniofacial applications
by Abhishek Gupta
Abstract: Computer-aided diagnostic systems are crucial for patient diagnosis and treatment planning. Automating such systems requires combining the various steps involved, and variation in any step may cause the system to fail. Therefore, there is a need to analyse the system's workflow and to address the challenges embedded in it. This paper discusses the imaging and automation challenges of computer-aided diagnosis. Each challenge is introduced within a Computer Aided Diagnosis (CAD) system for craniofacial applications, and its significance and importance are described. Methods to overcome these challenges and issues are also discussed for the benefit of the reader.
Keywords: Computer tomography; cephalometry; landmark; computer aided diagnosis; diagnosis; X-Ray; Tomographic reconstruction; craniofacial; medical images; dentistry.
Compact reconfigurable triple notch ultra-wideband bandpass filter for cognitive radio system
by Janardan Sahay, Sanjay Kumar
Abstract: This paper introduces a compact ultra-wideband (UWB) bandpass filter (BPF) with two switchable external notch structures that avoid interference between UWB communication and worldwide interoperability for microwave access (WiMAX) and wireless local area network (WLAN) systems, making it suitable for cognitive radio (CR) applications. The first external notch structure is a modified C-shaped structure that generates a sharp rejection at 3.5 GHz to avoid interference with the WiMAX system, and the second is a back-to-back T-shaped structure that creates sharp rejection notch bands at the 5.2 GHz and 5.8 GHz WLAN frequencies. Filter reconfiguration is achieved by varying the conductance of the microstrip line. The response of the proposed structure shows sharp rejections at one WiMAX and two WLAN bands. The bandpass filter covers the UWB frequency range from 3.1 to 10.7 GHz with very low passband insertion loss. The proposed UWB BPF is designed, simulated, fabricated and tested to validate the results, and good agreement is achieved between simulation and experiment.
Keywords: cognitive radio; notch band; reconfigurable filter; ultra-wideband; bandpass filter; interference; WLAN; WiMAX.
Special Issue on: MIWAI 2017 Computational Intelligence and Deep Learning for Computer Vision
A Real-time Aggressive Human Behavior Detection System in Cage Environment across Multiple Cameras
by Phooi Yee Lau, Hock Woon Hon, Zulaikha Kadim, Kim Meng Liang
Abstract: The sense of confinement inherent in a cage environment, such as a lock-up or an elevator, makes it conducive to criminal activities such as fighting. Monitoring activities in enclosed cage environments has therefore become a necessity. However, placing security guards can be inefficient and ineffective, as continuous 24/7 human monitoring of the scene is impractical. A vision-based system employing real-time video analysis technology could be deployed to detect abnormalities such as aggressive behaviour, an emerging and challenging problem. To monitor suspicious activities in a cage environment, the system should be able (1) to track individuals, (2) to identify their actions, and (3) to keep a record of how often aggressive behaviours occur at the scene. Moreover, the system should run in real time while taking the following limitations into consideration: (1) viewing angle (fish-eye), (2) low resolution, (3) number of people, (4) low lighting, and (5) number of cameras. This paper proposes a vision-based system that monitors the aggressive activities of individuals in an enclosed cage environment using multiple cameras, focusing on the temporal features of aggressive movement under the limitations discussed above. Experimental results show that the proposed system is easily realised and achieves impressive real-time performance, even on low-end computers.
Keywords: surveillance system; behavior monitoring; perspective correction; background subtraction; real-time video processing.
Attention-Based Argumentation Mining
by Derwin Suhartono, Aryo Pradipta Gema, Suhendro Winton, Theodorus David, Mohamad Ivan Fanany, Aniati Murni Arymurthy
Abstract: This paper is intended to make a breakthrough in the field of argumentation mining. Current trends in argumentation mining research use handcrafted features and traditional machine learning (e.g., support vector machines). We worked on two tasks: identifying argument components and recognising insufficiently supported arguments. We utilise a deep learning approach and implement an attention mechanism on top of it to obtain the best result. We also implement a Hierarchical Attention Network (HAN), a neural network that applies attention at two levels: word-level and sentence-level. Deep learning models with attention mechanisms achieve better results than other deep learning methods. This paper also shows that, on research tasks with hierarchically structured data, HAN performs remarkably well. We also present our results using XGBoost instead of a regular non-ensemble classifier.
Keywords: argumentation mining; hand-crafted features; deep learning; attention mechanism; hierarchical attention network; word-level; XGBoost; sentence-level.
SEGMENTATION AND RECOGNITION OF CHARACTERS ON TULU PALM LEAF MANUSCRIPTS
by Antony P.J., Savitha C.K.
Abstract: This paper proposes an efficient method for the segmentation and recognition of handwritten characters from Tulu palm leaf manuscript images. The proposed method uses an automated tool combining thresholding and edge detection to binarise the image. Projection profiles with connected component analysis are then used for line and character segmentation. A deep convolutional neural network (DCNN) model is used to extract features and recognise the segmented Tulu characters efficiently, with a recognition rate of 79.92%. The results are verified on a benchmark dataset, the AMADI_LontarSet, to generalise our model to the handwritten character recognition task. The results show that our method outperforms existing state-of-the-art models.
Keywords: Handwritten Character Recognition; Palm Leaf; Segmentation; DCNN; Tulu.
Combination of Domain Knowledge and Deep Learning for Sentiment Analysis of Short and Informal Messages on Social Media
by Tho Quan
Abstract: Sentiment analysis has recently been emerging as one of the major Natural Language Processing (NLP) tasks in many applications. In particular, as social media channels (e.g. social networks or forums) have become significant sources for brands to observe users' opinions about their products, this task is increasingly crucial. However, when applied to real data obtained from social media, we notice a high volume of short and informal messages posted by users on those channels. This kind of data is difficult for existing works to handle, especially those using deep learning approaches. In this paper, we propose an approach to handle this problem. This work extends our previous work, in which we proposed to combine the typical deep learning technique of Convolutional Neural Networks (CNN) with domain knowledge. The combination is used to acquire additional training data augmentation and a more reasonable loss function. In this work, we further improve our architecture with various substantial enhancements, including negation-based data augmentation, transfer learning for word embeddings, combination of word-level and character-level embeddings, and a multi-task learning technique for attaching domain-knowledge rules to the learning process. These enhancements, specifically aimed at handling short and informal messages, yield significant performance improvements in experiments on real datasets.
Keywords: Sentiment analysis; deep learning; domain knowledge; recurrent neural network; transfer learning; multi-task learning; data augmentation.