International Journal of Computational Vision and Robotics (32 papers in press)
Real time sign language recognition using depth sensor
by Jayesh Gangrade, Jyoti Bharti
Abstract: Communication via gestures is a visual language used by the deaf and Hard-of-Hearing (HoH) community. This paper proposes a system for sign language recognition that uses human skeleton data provided by Microsoft's Kinect sensor to recognise sign gestures. The Kinect sensor generates a skeleton of the human body and distinguishes 20 joints in it. The proposed method uses 11 of the 20 joints and extracts 35 novel features per frame, based on distances, angles and velocities involving upper-body joints. A multi-class support vector machine classifies the 35 Indian sign gestures in real time with an accuracy of 87.6%. The proposed method is robust to cluttered environments and viewpoint variation.
Keywords: Kinect sensor; Indian sign gesture; Multi class support vector machine; Human computer interaction; Pattern recognition.
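The classification step described in the abstract can be sketched with a multi-class SVM over per-frame feature vectors. This is a hedged illustration, not the authors' implementation: the synthetic clusters below stand in for the 35 distance/angle/velocity features extracted from the 11 skeleton joints.

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative sketch only: each gesture class is modelled as a cluster of
# 35-dimensional per-frame feature vectors (distances, angles, velocities).
rng = np.random.default_rng(0)
n_frames, n_features, n_gestures = 300, 35, 5

# Synthetic training data: class c is clustered around a mean of c.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(n_frames // n_gestures, n_features))
               for c in range(n_gestures)])
y = np.repeat(np.arange(n_gestures), n_frames // n_gestures)

clf = SVC(kernel='rbf', decision_function_shape='ovr')  # one-vs-rest multi-class SVM
clf.fit(X, y)

# A frame near class 3's cluster centre should be labelled 3.
sample = np.full((1, n_features), 3.0)
print(clf.predict(sample)[0])
```

In a real-time pipeline, each incoming Kinect frame would be converted to such a feature vector and passed to `clf.predict`.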
Crypto-compression Scheme based on the DWT for Medical Image Security
by Med Karim Abdmouleh
Abstract: Ensuring the confidentiality of exchanged data is always a great concern in any communication. The purpose of compression, meanwhile, is to reduce the amount of data while preserving important information; this reduction allows more information to be archived on the same storage medium and minimises transfer times over telecommunication networks. Combining encryption and compression therefore guarantees both the confidentiality and the authentication of information, reduces processing and transmission time on public channels, and increases storage capacity. In this paper, we propose a new partial (selective) encryption approach for medical images based on Discrete Wavelet Transform (DWT) coefficients and compatible with the JPEG2000 standard. The obtained results show that the proposed scheme provides a significant reduction in processing time during encryption and decryption, without compromising the high compression ratio of the compression algorithm.
Keywords: Crypto-compression; Encryption; Compression; Discrete Wavelet Transform; RSA; JPEG2000; Telemedicine.
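The selective-encryption idea above can be illustrated with a small sketch: encrypt only the DWT approximation sub-band of an image, leaving the detail sub-bands untouched. This is not the paper's exact scheme (which targets JPEG2000 coefficients and uses RSA); the additive keystream below is a stand-in cipher for illustration.

```python
import numpy as np
import pywt

# Illustrative sketch: selectively "encrypt" only the DWT approximation
# coefficients of an image; detail sub-bands stay in the clear.
rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(64, 64)).astype(float)

# One-level 2-D DWT: approximation cA plus detail sub-bands (cH, cV, cD).
cA, (cH, cV, cD) = pywt.dwt2(image, 'haar')

# Keystream addition stands in for a real cipher such as RSA-protected keys.
key = np.random.default_rng(7).normal(size=cA.shape) * 1000.0
cA_enc = cA + key

encrypted = pywt.idwt2((cA_enc, (cH, cV, cD)), 'haar')       # unintelligible image
decrypted = pywt.idwt2((cA_enc - key, (cH, cV, cD)), 'haar')  # key holder recovers

print(np.allclose(decrypted, image))  # True
```

Because only one sub-band is processed, the encryption cost is a fraction of encrypting the whole image, which is the source of the processing-time reduction the abstract reports.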
Non-Invasive Technique of Diabetes Detection using Iris Images
by Kesari Verma, Bikesh Kumar Singh, Neelam Agrawal
Abstract: Alternative medicine techniques are important in improving quality of life and preventing disease, and can offer an alternative to conventional invasive methods of disease detection. This paper addresses a non-invasive approach to diabetes detection using iris images. The proposed technique evaluates the use of iridology to diagnose diabetes with modern digital image processing techniques that analyse the structural properties of the iris and classify the patterns accordingly. The system analyses broken tissues of the iris by extracting significant textural features, using a Gabor filter bank and the Gray Level Co-occurrence Matrix (GLCM), from subsections of the iris. The extracted textural features help to categorise diabetic and non-diabetic irises using benchmark Artificial Neural Network (ANN) and Support Vector Machine (SVM) classifiers. The promising results of extensive experiments demonstrate the effectiveness of the proposed method.
Keywords: diabetes detection; image processing; iris images; support vector machine; artificial neural network; SVM; ANN; gabor features; gray level co-occurrence matrix; GLCM; Non-Invasive Technique.
Autonomous Void Detection and Characterisation in Point Clouds and Triangular Meshes
by Benjamin Bird, Barry Lennox, Simon Watson, Thomas Wright
Abstract: In this paper we propose and demonstrate a novel void characterisation algorithm which is able to distinguish between internal and external voids that are present in point clouds of both manifold and non-manifold objects and 3D scenes. We demonstrate the capabilities of our algorithm using several point clouds representing both scenes and objects. Our algorithm is shown in both a descriptive overview format as well as pseudocode. We also compare a variety of different void detection algorithms and then present a novel refinement to the best performing of these algorithms. Our refinement allows for voids in point clouds to be detected more efficiently, with fewer false positives and with over an order of magnitude improvement in terms of run time. We show our run time performance and compare it to results obtained using alternative algorithms, when tested using popular single board computers. This comparison is important as our work is intended for online robotics applications, where hardware is typically of low computational power. The target application for this work is 3D scene reconstruction to aid in the decommissioning of nuclear facilities.
Keywords: Point Cloud; Void Detection; Meshing; Reconstruction; Computer
GCSAC: Geometrical Constraint SAmple Consensus for Primitive Shapes Estimation in 3-D Point Cloud
by Le Van Hung, Hai Vu, Thi-Thuy Nguyen, Thi-Lan Le, Thanh-Hai Tran
Abstract: Estimating the parameters of a primitive shape from 3-D point cloud data is a challenging problem owing to noisy data and computational time demands. In this paper, we present a new robust estimator (named GCSAC, Geometrical Constraint SAmple Consensus) aimed at solving these issues. The proposed algorithm takes geometrical constraints into account to construct qualified samples for the estimation. Instead of randomly drawing a minimal sample subset, explicit geometrical properties of the primitive shapes of interest (e.g., cylinder, sphere and cone) are used to drive the sampling procedure. At each iteration of GCSAC, the minimal sample subset is selected based on two criteria: (1) it must be consistent with the estimated model according to a rough inlier-ratio evaluation; (2) the samples must satisfy the geometrical constraints of the objects of interest. Based on the good samples obtained, the model estimation and verification procedures of the robust estimator are deployed in GCSAC. Extensive experiments have been conducted on synthesised and real datasets for evaluation. Compared with common robust estimators of the RANSAC family (RANSAC, PROSAC, MLESAC, MSAC, LO-RANSAC and NAPSAC), GCSAC outperforms them in terms of both the precision of the estimated model and computational time. The implementations of the proposed method and the datasets are made publicly available.
Keywords: Robust Estimator; Primitive Shape Estimation; RANSAC and RANSAC Variations; Quality of Samples; Point Cloud data.
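The idea of constraining RANSAC's minimal samples can be shown with a deliberately simplified example. This sketch is not GCSAC itself: it fits a 2-D line rather than a 3-D primitive, and the only "geometrical constraint" is rejecting near-coincident point pairs, which degenerate sample selection in the same spirit as GCSAC's qualified samples.

```python
import numpy as np

# Simplified constraint-guided RANSAC: reject minimal samples whose two
# points are nearly coincident (a degenerate configuration for a line).
def ransac_line(points, n_iters=200, inlier_tol=0.05, min_sep=0.5, seed=0):
    rng = np.random.default_rng(seed)
    best_inliers, best_model = 0, None
    for _ in range(n_iters):
        p, q = points[rng.choice(len(points), 2, replace=False)]
        if np.linalg.norm(q - p) < min_sep:       # geometrical constraint:
            continue                              # skip degenerate samples
        d = (q - p) / np.linalg.norm(q - p)       # unit direction of the line
        normal = np.array([-d[1], d[0]])
        dists = np.abs((points - p) @ normal)     # point-to-line distances
        inliers = int((dists < inlier_tol).sum())
        if inliers > best_inliers:
            best_inliers, best_model = inliers, (p, d)
    return best_model, best_inliers

rng = np.random.default_rng(1)
t = rng.uniform(-5, 5, 100)
line_pts = np.c_[t, 2 * t + 1] + rng.normal(0, 0.02, (100, 2))  # y = 2x + 1
outliers = rng.uniform(-10, 10, (30, 2))
model, n_in = ransac_line(np.vstack([line_pts, outliers]))
print(n_in)
```

GCSAC generalises this pattern: for a cylinder or cone, the constraint check uses normals and axis geometry rather than a simple separation test.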
Exploring the Effects of Non-Local Blocks on Video Captioning Networks
by Jaeyoung Lee, Junmo Kim
Abstract: In addition to visual features, video also contains temporal information that contributes to semantic meaning regarding the relationships between objects and scenes. There have been many attempts to describe spatial and temporal relationships in video, but simple encoder-decoder models are not sufficient for capturing detailed relationships in video clips. A video clip often consists of several shots that seem to be unrelated, and simple recurrent models suffer from these shot changes. In other fields, including visual question answering and action recognition, researchers have begun to take an interest in describing visual relations between objects. In this paper, we introduce a video captioning method that captures temporal relationships with a non-local block and a boundary-aware system. We evaluate our approach on the Microsoft Video Description Corpus (MSVD, YouTube2Text) and the Microsoft Research-Video to Text (MSR-VTT) dataset. The experimental results show that a non-local block applied along the temporal axis can improve video captioning performance on video captioning datasets.
Keywords: Video captioning; Non-local mean; Self-attention; Video description.
A Novel Approach for Mitigating Atmospheric Turbulence using Weighted Average Sobolev Gradient and Laplacian
by Prifiyia Nunes, Dippal Israni, Karthick D, Arpita Shah
Abstract: Heat scintillation is a principal cause of atmospheric turbulence, which distorts images as light propagates through the volatile environment. Changes in refractive index due to variations in wind velocity also cause turbulence in the atmosphere. Traditional image registration approaches lag because they are computationally expensive and need a post-processing algorithm to sharpen the image. A non-registration-based Sobolev gradient and Laplacian (SGL) algorithm removes turbulence but produces ghost artifacts on moving objects. This paper proposes a novel approach based on a weighted-average SGL. The proposed method mitigates atmospheric turbulence while restoring moving objects in the scene. Performance metrics such as SSIM and MSE show that the proposed algorithm outperforms state-of-the-art algorithms in restoring both the geometric distortion and the object of interest.
Keywords: atmospheric turbulence; heat scintillation; restoration; Sobolev; phase shift; weighted average.
3D Object Classification Based on Deep Belief Networks and Point Clouds
by Fatima Zahra Ouadiay, Nabila Zrira, Mohamed Hannat, El Houssine Bouyakhf, Mohammed Majid Himmi
Abstract: Since the advent of 3D sensors such as the Kinect camera, 3D object models and point clouds have become widely used in many areas, most importantly 3D object recognition and classification in robotic applications. Like human vision, this type of sensor allows generating an object model from a single field of view, or even a complete 3D object model by combining several individual Kinect frames. In this work, we propose a new feature-learning-based object classification approach using Point Cloud Library (PCL) detectors and descriptors and Deep Belief Networks (DBNs). Before developing the classification approach, we evaluate 3D descriptors by proposing a new pipeline that uses the L2 distance and a recognition threshold. The 3D descriptors are computed on different datasets in order to identify the best descriptors. Subsequently, these descriptors are used to learn robust features in the classification approach using DBNs. We evaluate the performance of these contributions on two datasets: Washington RGB-D and our own real 3D object dataset. The results show that the proposed approach outperforms advanced methods by approximately 5% in terms of accuracy.
Keywords: Kinect; 3D Object classification; PCL; recognition threshold; DBNs; Washington RGB-D.
A novel fast fractal image compression based on reinforcement learning
by Bejoy Varghese, Krishnakumar S
Abstract: Digital image compression is of considerable interest for the transmission and storage of images. Recent research in this area explores combinations of different coding techniques to achieve a better compression ratio without compromising image quality. Fractal-based coding techniques attracted the attention of the research community from the very early days of data compression. However, those methods were computationally intensive because of the exhaustive search involved in selecting a transformation sequence. In this paper, we propose a system that replaces the domain-range comparison in fractal compression with a reinforcement learning technique, reducing compression time and increasing PSNR. The system learns from the output of the exhaustive algorithm in the initial stage and discards the combinatorial search after being trained on a dataset. The proposed method shows a good improvement in compression ratio, PSNR and compression time.
Keywords: Machine learning; Image compression; Reinforcement learning; Fractal coding.
Video Summarization based on Motion Estimation using Speeded up Robust Features
by Dipti Jadhav, Udhav Bhosle
Abstract: Video Summarization (VS) is a technique to extract keyframes from a video based on its contents. It provides the user with a brief representation of the video contents, enabling semantic understanding of the video. This paper presents video summarization based on the motion between consecutive video frames. The motion between frames is represented by affine and homography transformations, and the video frames are represented by a set of Speeded Up Robust Features (SURF). Keyframes are extracted sequentially by successive comparison with the previously declared keyframe based on motion. The validity of the proposed algorithms is demonstrated on videos from the Internet, the YouTube dataset and the Open Video Project. The proposed work is evaluated by comparing it with different classical and state-of-the-art video summarization methods reported in the literature. The experimental results and performance analysis validate the effectiveness and efficiency of the proposed algorithms.
Keywords: Video Summarization; motion estimation; key frames; SURF; affine transformation; homography.
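The motion criterion described above can be sketched as follows: estimate an affine transform between matched feature points of two frames and declare a new keyframe when the motion magnitude exceeds a threshold. This is a hedged illustration; the paper uses SURF matches, whereas the correspondences and the threshold value below are synthetic assumptions.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2-D affine fit: dst ~ A @ src + t."""
    n = len(src)
    M = np.zeros((2 * n, 6))
    M[0::2, 0:2] = src; M[0::2, 4] = 1   # rows for x-coordinates
    M[1::2, 2:4] = src; M[1::2, 5] = 1   # rows for y-coordinates
    params, *_ = np.linalg.lstsq(M, dst.ravel(), rcond=None)
    A = params[:4].reshape(2, 2)
    t = params[4:]
    return A, t

rng = np.random.default_rng(0)
src = rng.uniform(0, 100, (40, 2))           # keypoints in previous keyframe
true_t = np.array([12.0, -3.0])              # camera/scene translation
dst = src + true_t + rng.normal(0, 0.1, src.shape)  # matched points, with noise

A, t = fit_affine(src, dst)
is_keyframe = np.linalg.norm(t) > 5.0        # illustrative motion threshold
print(is_keyframe)
```

In the full method this comparison runs sequentially: each frame is matched against the last declared keyframe, so summaries adapt to how fast the scene changes.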
Electroencephalography based classification of human emotion: A hybrid strategy in machine learning paradigm
by Bikesh Kumar Singh, Ankur Khare
Abstract: The objective of this article is to develop a new, improved two-stage method for classifying human emotional states by fusing a back-propagation artificial neural network (BPANN) and k-nearest neighbours (k-NN). A publicly available electroencephalograph (EEG) signal database for emotion analysis using physiological signals is used in the experiments. The EEG signals are first pre-processed, followed by feature extraction in the time and frequency domains. The extracted features are then supplied to the proposed model for emotion recognition. The proposed machine learning framework attains a classification accuracy of 78.33%, compared with conventional BPANN and k-NN classifiers, which achieve classification accuracies of 56.90% and 59.52%, respectively. Future work is required to evaluate the proposed model in a practical scenario, in which a psychologist or medical professional can analyse the emotions recognised by the first stage while unsure test cases are supplied to the secondary classifier (k-NN) for further assessment.
Keywords: Brain computer interface; emotion; electroencephalogram; hybrid classifier.
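The two-stage fusion described above can be sketched generically: a neural network labels confident cases, and samples whose top class probability falls below a threshold are routed to a k-NN second stage. The synthetic data and the 0.7 confidence threshold below are illustrative assumptions, not the paper's EEG features or settings.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier

# Illustrative two-stage hybrid: back-propagation network first,
# k-NN as a second opinion on low-confidence test cases.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1.0, (100, 8)) for m in (0.0, 2.5)])
y = np.repeat([0, 1], 100)
idx = rng.permutation(200)
X_tr, y_tr, X_te, y_te = X[idx[:150]], y[idx[:150]], X[idx[150:]], y[idx[150:]]

ann = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X_tr, y_tr)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)

proba = ann.predict_proba(X_te)
pred = proba.argmax(axis=1)
unsure = proba.max(axis=1) < 0.7             # first stage not confident
pred[unsure] = knn.predict(X_te[unsure])     # second-stage refinement

accuracy = (pred == y_te).mean()
print(accuracy)
```

The same routing logic supports the human-in-the-loop scenario the abstract mentions: unsure cases can just as easily be forwarded to a clinician instead of a second classifier.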
Blocking of Operation of Unauthorized Software using MQTT
by Kitae Hwang
Abstract: This paper presents the design and implementation of the Meerkat system, a system that detects the operation of unauthorized software. The MQTT protocol is used for data communication within the Meerkat system, which is comprised of three components: the Meerkat client, a web application that serves as the admin, and the server software. The Meerkat client alerts the MQTT broker as soon as it detects the operation of unauthorized software on the user's PC, and the admin receives the information immediately via the MQTT broker. To evaluate the performance of the system, the transmission time for messages delivered between the user and admin PCs was measured. The measurements showed that a message took, on average, 8 to 50 milliseconds to be delivered, indicating that messages are delivered quickly enough for the Meerkat system to be put into actual use.
Keywords: MQTT; publish-subscribe; unauthorized software; Mosquitto.
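The client's alert path can be sketched as a publish over MQTT. This is a hypothetical illustration: the topic name `meerkat/alerts`, the payload fields, and the broker address are assumptions, not the paper's actual protocol; `publish_alert` requires the `paho-mqtt` package and a running broker (e.g. Mosquitto) and is shown but not executed here.

```python
import json
import time

def make_alert(host, process_name):
    """Build the JSON alert payload for an unauthorized-software detection."""
    return json.dumps({"host": host,
                       "process": process_name,
                       "detected_at": int(time.time())})

def publish_alert(payload, broker="localhost"):
    """Publish the alert; needs paho-mqtt and a reachable MQTT broker."""
    import paho.mqtt.client as mqtt
    client = mqtt.Client()
    client.connect(broker, 1883)
    client.publish("meerkat/alerts", payload, qos=1)  # QoS 1: at-least-once delivery
    client.disconnect()

alert = make_alert("lab-pc-07", "unapproved_tool.exe")
print(json.loads(alert)["process"])
```

On the admin side, a subscriber to the same topic receives each alert as soon as the broker forwards it, which is where the millisecond-scale delivery times the abstract reports come from.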
Development of Translation Telephone System by Using MQTT Protocol
by Jae Moon Lee
Abstract: This paper is a study on the development of a translation telephone system that enables two individuals speaking different languages to communicate through a phone call. The system is developed using the MQTT protocol (a push-service technology) together with voice and translation web services, which have been improving rapidly with advances in artificial intelligence. The core technologies applied in the system are speech recognition, text translation, and speech synthesis. To ensure that the system runs in real time, it is designed to use as many threads as possible so that these functions can operate simultaneously. To minimise communication traffic, the system converts a conversation into text and sends the translated text to the counterpart instead of sending voice data. To maintain translation accuracy, the system translates the given information sentence by sentence. The proposed system has been developed to run on Android smartphones. Because sentences in a normal conversation tend to be short, our experiments indicate that the developed translation telephone system runs in real time.
Keywords: Telephone System; Translation; Web Service; Push Service; Speech Recognizer; Speech Synthesizer; MQTT.
An Iris Biometric-based Dual Encryption Technique for Medical Image in e-Healthcare Application
by Aparna P., P. V.V. Kishore
Abstract: Medical image watermarking is widely recognised as a relevant technique for improving data content verification, security, image fidelity and authenticity in the current e-health environment, where medical images are stored, retrieved and transmitted over networks. Maintaining a secure environment for tele-radiology against issues such as malpractice liability and image retention is a challenging task. To address these security issues, this paper proposes a biometric-key-based medical image watermarking technique for e-healthcare applications. Two types of input are used: the patient's MRI image and the electronic health record (EHR). Initially, we segment the ROI region and encrypt its information using the SHA-256 algorithm. We then encrypt the EHR information using the ECC algorithm, in which the key is generated from an iris biometric, increasing the security level of the watermarking system. Next, we concatenate the image and the EHR information, and to further increase system security we compress the bit stream using an arithmetic encoding algorithm. Finally, we embed the bit stream into the cover image; the same process is reversed for extraction. Experiments are carried out on different medical images with EHRs, and the effectiveness of the proposed algorithm is analysed using the peak signal-to-noise ratio (PSNR) and normalised correlation (NC). The proposed methodology is applicable to many scenarios concerned with privacy protection, safety and management.
Keywords: SHA-256; elliptical curve cryptography; biometric key; watermarking; Authentication; iris image; arithmetic encoding.
2D Feature Descriptor without Orientation Compensation
by Manel Benaissa, Abdelhak Bennia
Abstract: Several feature descriptors have been proposed in the literature, with a variety of definitions but a common goal: to describe potentially interesting points in two images and obtain the best possible match between them. In this paper, we propose a new orientation-invariant feature descriptor that requires no additional step dedicated to orientation compensation. We exploit the information provided by two representations of the image (intensity and gradient) for a better understanding and representation of each feature point and its surroundings. This information is summarised in two cumulative histograms and used in the description and matching of the feature points. The experimental results show the descriptor's robustness to multiple image changes.
Keywords: Feature understanding; Feature description; Feature matching; object detection.
New Color Fusion Deep Learning Model for Large-Scale Action Recognition
by Abhishek Verma
Abstract: In this work we propose a fusion methodology that takes advantage of multiple deep convolutional neural network (CNN) models and two color spaces, RGB and oRGB, to improve action recognition performance on still images. We trained our deep CNNs on both the RGB and oRGB color spaces, extracted and fused all the features, and forwarded them to an SVM for classification. We evaluated our proposed fusion models on the Stanford 40 Actions dataset and the People Playing Musical Instruments (PPMI) dataset using two metrics: overall accuracy and mean average precision (mAP). Our results outperform the current state of the art, with 84.24% accuracy and 83.25% mAP on Stanford 40, and 65.94% accuracy and 65.85% mAP on PPMI. Furthermore, we also evaluated individual class performance on both datasets: the mAP for the top 20 individual classes on Stanford 40 lies between 87% and 97%, while on PPMI the individual class mAP lies between 34% and 87%.
Keywords: deep convolutional neural networks; deep learning; fusion model; action recognition; VGGNet; GoogLeNet; ResNet.
Discrete Texture Elements Synthesis on Surfaces using Elements Distribution
by Yan Gui, Yang Liu
Abstract: In this paper, we present a novel method focused on reproducing the distribution of texture elements over arbitrary 3D surfaces. To this end, we propose neighbourhood comparisons to find the best-matching neighbourhood for local growth over 3D surfaces, taking the 2D connectivity constructed from the input sample texture as a reference. The synthesised distribution provides the position information of texture elements on the surface. We then perform paste operations using local parameterisation to obtain the final textured results. Experimental results show that the proposed method successfully generates element distributions over 3D surfaces. Moreover, our method works especially well for textures with discrete texture elements, maintaining the integrity of the synthesised texture elements over the 3D surfaces.
Keywords: surface texture synthesis; discrete texture elements; elements distribution; neighbourhood comparison; local parameterization.
Computational Linguistic Retrieval Framework using Negative Bootstrapping for Retrieving Transliteration Variants
by Shashi Shekhar, Dilip Sharma, M.M. Sufyan Beg
Abstract: In natural language processing, transliteration is an important and relatively immature area. During transliteration, issues such as language identification, script specification and missing sounds arise in mixed-script queries (native script and non-native script). To overcome these issues we propose a new technique: negative bootstrapping with frequent-matrix Apriori for transliteration. Roman script is widely used in web search queries for searching content. The major challenge the system faces in processing a transliterated word is that the word can exist in more than one form, since it may be written with different spelling variations. The proposed methodology consists of case conversion, bi-level feature extraction, n-gram text categorisation and negative bootstrapping with frequent-matrix Apriori, applied in both the training and testing stages. An experimental evaluation has been performed to check transliteration accuracy, along with identification of the language to which a word belongs, against established methods on the benchmark Microsoft Research dataset for Mixed Script Information Retrieval, FIRE (Forum for Information Retrieval Evaluation). The paper offers a principled solution to handling multiple scripts within a document, which otherwise leads to problems of term matching and spelling variation when searching content. The problem is modelled jointly with a deep-learning design, and we present an in-depth empirical analysis of the proposed methodology against standard approaches for transliteration. The proposed method achieves significantly better results in terms of MRR, MAP and accuracy when applied with an n-gram approach on the benchmark dataset.
Keywords: Feature extraction; text categorization; negative bootstrapping; Apriori; transliteration; multilexical matching; substitution; variations; word normalization; NLP; machine learning.
Two Stage Optimized Video Summary
by Dipti Jadhav, Udhav Bhosle, Jyoti Deshmukh
Abstract: Video Summarization (VS) is a technique to extract keyframes from a video based on its contents. This paper presents a two-stage video summarization method. The first-stage summary is generated by optimising SURF keypoints with a modified PreARM algorithm, with keyframes selected according to the number of matched optimised keypoints between consecutive video frames. The second stage reduces the redundancy of the generated summary using a multi-objective genetic algorithm. Experimental results on videos from the Internet, the YouTube dataset and the Open Video Project demonstrate the validity of the proposed work, and the performance analysis validates the effectiveness and efficiency of the proposed algorithms.
Keywords: Video Summarization; SURF Keypoint Optimization; Multi-Objective Genetic Algorithm.
Personal Authentication and Risk Evaluation by Sensible Keyboard Sound
by Hyo-Joong Suh, Hoyoung Hwang
Abstract: Personal authentication is an essential process in various network applications such as online shopping, mobile banking and e-government services. Authentication processes normally use personal or device information, including user passwords, IP addresses, device numbers, MAC addresses, and biometric information such as fingerprints or iris scans. In addition to direct personal or device information, environmental information can also be used: for example, the location and time of device use may be important factors in evaluating the risk of fraudulent use. Authentication techniques are becoming more important as the volume of online business and the amount of non-face-to-face contact increase enormously. This paper proposes a personal authentication technique based on the audible sound of a keyboard input device. We extract the unique features of the keystroke sound and use them both for personal authentication and for risk evaluation to protect against fraud.
Keywords: personal authentication; keyboard; sound; hijacking; risk evaluation.
An IoT Based Smart Parking Management System
by Inhwan Jung
Abstract: In this paper, we implement an IoT-based smart parking information management system using ultrasonic parking sensors and Bluetooth beacons. The ultrasonic IoT sensors used for parking detection are controlled by an Arduino board, and the collected parking data are transmitted to the server in real time using the MQTT protocol. The server stores the parking information in a database and provides drivers with real-time parking status, also via MQTT. When a driver starts the smartphone app and enters the parking lot, the smartphone automatically recognises which parking area it is in from the Bluetooth beacon signal at the entrance, and by communicating with the server the driver can see the parking status of that area at a glance. The parking management system implemented in this study not only helps drivers park their cars but also uses the real-time parking information stored in the database to derive marketing information such as hourly, daily or monthly numbers of visiting customers and average shopping time.
Keywords: IoT; Smart Parking; Bluetooth Beacon; Parking App; MQTT.
Multi-document Summarization Using Feature Distribution Analysis
by Jae-Young Chang
Abstract: Opinion documents have been growing rapidly in an environment where anyone can express an opinion on the Internet or on social networking services. This situation calls for automatic summarization techniques to help users understand the contents of large-scale collections of opinion documents. However, it is not easy to summarise opinion documents with previous text summarization technologies, since opinion documents include subjective expressions as well as features of target objects. In this paper, a method to identify and extract representative documents from a large collection of opinion documents is proposed. In addition, experiments show that the proposed method successfully extracts representative opinion documents.
Keywords: Multi-document Summarization; Text Mining; Opinion Mining; Feature; Social Network Service.
Situation-Cognitive Traffic Light Control Based On Object Detection Using YOLO Algorithm
by Sun-Dong Kim
Abstract: Current traffic lights provide the green signal at fixed time intervals without considering the traffic situation. As a result, cars in a long queue must wait a long time, which causes traffic jams and irritates drivers. To solve this problem, the green signal interval should be controlled according to the traffic volume, analysed using image processing and machine learning techniques. This paper presents a situation-cognitive traffic light control algorithm that measures traffic volume using the object detection algorithm YOLO (You Only Look Once) and adjusts the traffic signal intervals according to that volume. The algorithm is expected to produce smoother traffic flow and reduce drivers' stress.
Keywords: YOLO; You Only Look Once; Object Detection; Situation-Cognitive; Traffic Light Control.
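The control policy described above can be sketched as a simple mapping from a detected vehicle count (which the paper obtains from YOLO) to a clamped green interval. The constants below are illustrative assumptions, not the paper's calibrated values.

```python
def green_interval(vehicle_count, base=10.0, per_vehicle=1.5,
                   min_s=10.0, max_s=60.0):
    """Seconds of green time for the current traffic volume, clamped
    between a minimum and a maximum interval."""
    return max(min_s, min(max_s, base + per_vehicle * vehicle_count))

# Empty road, moderate queue, heavy congestion.
print(green_interval(0), green_interval(20), green_interval(100))
# -> 10.0 40.0 60.0
```

In the full system, `vehicle_count` would be the number of car detections YOLO returns for the approach lane in the current camera frame.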
3D image reconstruction from different image formats using marching cubes technique
by Abdou Shalaby, Mohammed Elmogy, Ahmed Abo Elfetouh
Abstract: Structure from motion (SFM) is the problem of reconstructing a 3D image from 2D images. The main problem in 3D reconstruction is that the quality of the 3D image depends on the number of 2D slices input to the system, and a large number of 2D slices may lead to long processing times. This paper introduces a new model to reconstruct a 3D image from any 2D image type using the marching cubes algorithm. We use LabVIEW to build the system and the Biomedical Toolkit to read and register the 2D images. Our main goal is to implement a 3D reconstruction system that produces a high-quality 3D image from a minimum number of 2D slices while reducing the execution time as much as possible. We apply the system to two datasets, and the experimental results demonstrate its efficiency and effectiveness in reconstructing 3D images from any 2D image type. As the results show, changing the iso-value, the image type and the number of images affects both the quality of the 3D reconstruction and the processing time.
Keywords: 3D image reconstruction; marching cubes; LabVIEW; 2D image registration; computed tomography; CT; magnetic resonance; MR; single-photon emission computed tomography; SPECT.
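The core reconstruction step can be illustrated outside LabVIEW with a standard marching cubes implementation: stack the 2D slices into a volume and extract a triangle mesh at a chosen iso-value. The synthetic sphere volume below stands in for registered CT/MR slices; changing `level` changes the extracted surface, which mirrors the iso-value effect the abstract reports.

```python
import numpy as np
from skimage import measure

# Synthetic volume: squared distance from the centre of a 32^3 grid,
# standing in for a stack of registered 2-D slices.
z, y, x = np.mgrid[-16:16, -16:16, -16:16]
volume = (x**2 + y**2 + z**2).astype(float)

# iso-value = radius^2 selects a spherical iso-surface; marching cubes
# returns mesh vertices, triangle faces, vertex normals and values.
verts, faces, normals, values = measure.marching_cubes(volume, level=10.0**2)

print(verts.shape[1], faces.shape[1])  # 3 3 (3-D vertices, triangle faces)
```

Each triangle of `faces` indexes three rows of `verts`, giving the mesh that a viewer or 3D printer would consume.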
Effective scene change detection in complex environments
by Hui Fuang Ng, Chee Yang Chin
Abstract: One of the fundamental operations in computer vision applications is change detection, in which moving foreground objects are segmented from a static background. A common approach to change detection is to compare an image frame with a stored background model using a matching algorithm, a process known as background subtraction. However, such techniques fail in environments with dynamic backgrounds, illumination changes, shadows, or camera jitter. This study focuses on effectively detecting scene changes in complex environments. To this end, we propose a new colour descriptor, the local colour difference pattern (LCDP), which is insensitive to shadows and captures both colour and texture features at each pixel location. Furthermore, a scene change detection framework is proposed to handle dynamic scenes based on sample consensus, integrating LCDP with a novel spatial model fusion mechanism. Experiments on the CDnet benchmark dataset demonstrate the effectiveness of the proposed approach for change detection in complex environments.
Keywords: change detection; background subtraction; moving object segmentation; foreground segmentation; local descriptor; video signal processing; CDnet.
Special Issue on: Research in Virtual Reality
Crowd detection and counting using a static and dynamic platform: state of the art
by Huma Chaudhry, Mohd Shafry Mohd Rahim, Tanzila Saba, Amjad Rehman
Abstract: Automated object detection and crowd density estimation are popular and important areas in visual surveillance research. The last decades have witnessed much significant research in this field; however, it remains a challenging problem for automatic visual surveillance. The ever-increasing research on crowd dynamics and crowd motion necessitates a detailed and updated survey of the techniques and trends in this field. This paper presents a survey on crowd detection and crowd density estimation from moving platforms and reviews the different methods employed for this purpose. The review categorises and delineates several detection and counting estimation methods that have been applied to the examination of scenes from static and moving platforms.
Keywords: crowd; counting; holistic and local motion features; estimation; visual surveillance; moving platform; computer vision.
Real time vision-based hand gesture recognition using depth sensor and a stochastic context free grammar
by Jayesh Gangrade, Jyoti Bharti
Abstract: This paper presents a new computer vision algorithm for the recognition of hand gestures. In the proposed system, a Kinect sensor is used to track and segment the hand against a cluttered background, and features are extracted from the fingers and the angles between them. Hand postures are classified using a multi-class support vector machine, and hand gestures are recognised by a stochastic context-free grammar (SCFG). The SCFG applies syntactic structure analysis and recognises hand gestures through a set of production rules composed of combinations of hand postures. The proposed algorithm is able to recognise various hand postures in real time with more than 97% accuracy.
Keywords: hand gesture; stochastic context free grammar; SCFG; multi-class support vector machine; Kinect sensor.
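The paper's actual grammar and probabilities are not given in the abstract, so the sketch below invents a tiny two-rule grammar purely to illustrate the idea: each gesture is a production rule over a posture sequence, and recognition picks the rule that maximises the rule probability times the per-frame posture classifier confidences:

```python
# Illustrative stochastic grammar: gesture -> (posture sequence, rule prob).
# The gestures, postures and probabilities here are invented examples.
GESTURE_RULES = {
    "wave":  (["open", "open", "open"], 0.6),
    "click": (["point", "fist"], 0.4),
}

def recognise(posture_confidences, rules=GESTURE_RULES):
    """posture_confidences: one {posture: confidence} dict per frame,
    e.g. from a multi-class SVM. Score each production rule by
    rule_prob * product of per-frame confidences; return the best."""
    best, best_score = None, 0.0
    for gesture, (sequence, rule_prob) in rules.items():
        if len(sequence) != len(posture_confidences):
            continue  # rule cannot derive a sequence of this length
        score = rule_prob
        for posture, conf in zip(sequence, posture_confidences):
            score *= conf.get(posture, 0.0)
        if score > best_score:
            best, best_score = gesture, score
    return best, best_score
```

A real SCFG recogniser would allow recursive rules and parse with an inside algorithm rather than flat sequence matching; this flat version only conveys how posture-level classification and rule probabilities combine.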
Adaptive multi-threshold based de-noising filter for medical image applications
by A. Ramya, D. Murugan, G. Murugeswari, Nisha Joseph
Abstract: Medical image processing is an emerging research area, and many researchers have contributed to it by proposing new techniques for medical image enhancement and abnormality detection. Interpretation of medical images is challenging because of the unavoidable noise produced by medical imaging devices and interference. In this work, a new framework is proposed for noise detection and reduction. The framework comprises two phases. The first phase is noise detection, performed using the newly proposed adaptive multi-threshold (AMT) scheme. In the second phase, noisy pixels are modified using an edge-preserving median (EPM) filter, which conserves edge components and controls the blurring effect while preserving the fine details of interior regions. The proposed work is tested on benchmark images and several medical images. It produces promising results, which are compared with existing two-stage noise reduction techniques using popular performance metrics such as PSNR and SSIM. Quantitative analysis and experimental results demonstrate that the proposed method is efficient and suitable for medical image pre-processing.
Keywords: noise removal; noise detection; impulse noise; multi-threshold; edge preserving.
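The AMT and EPM details are not given in the abstract, but the two-phase detect-then-correct structure can be sketched with a fixed (non-adaptive) threshold: phase one flags a pixel as impulse noise when it deviates strongly from its 3x3 neighbourhood median, and phase two replaces only the flagged pixels, leaving clean pixels, and hence edges, untouched. The function name and threshold are our own:

```python
def denoise(image, threshold=50):
    """Two-phase impulse-noise sketch (fixed threshold, unlike the
    paper's adaptive multi-threshold scheme):
      phase 1: flag pixel as noisy if |pixel - 3x3 median| > threshold;
      phase 2: replace only flagged pixels with the local median."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(image[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            med = window[4]                       # median of 9 samples
            if abs(image[y][x] - med) > threshold:  # phase 1: detect
                out[y][x] = med                      # phase 2: correct
    return out
```

Because unflagged pixels pass through unchanged, a plain median blur's edge-smearing is avoided, which is the intuition behind pairing a detector with an edge-preserving corrector.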
Special Issue on: MIWAI 2017 Computational Intelligence and Deep Learning for Computer Vision
A Real-time Aggressive Human Behavior Detection System in Cage Environment across Multiple Cameras
by Phooi Yee Lau, Hock Woon Hon, Zulaikha Kadim, Kim Meng Liang
Abstract: The sense of confinement inherent in a cage environment, such as a lock-up or an elevator, makes it a place conducive to criminal activities such as fighting. Monitoring activities in enclosed cage environments has therefore become a necessity. However, posting security guards can be inefficient and ineffective, as it is impossible for them to monitor a scene 24/7. A vision-based system employing real-time video analysis technology could be deployed to detect abnormalities such as aggressive behaviour, which remains an emerging and challenging problem. In order to monitor suspicious activities in a cage environment, the system should be able (1) to track individuals, (2) to identify their actions, and (3) to keep a record of how often aggressive behaviours happen at the scene. On top of that, the system should run in real time, taking the following limitations into consideration: (1) viewing angle (fish-eye), (2) low resolution, (3) number of people, (4) low (normal) lighting, and (5) number of cameras. This paper proposes a vision-based system that monitors the aggressive activities of individuals in an enclosed cage environment using multiple cameras. The work focuses on analysing the temporal features of aggressive movement, taking the limitations discussed above into consideration. Experimental results show that the proposed system is easily realised and achieves impressive real-time performance, even on low-end computers.
Keywords: surveillance system; behavior monitoring; perspective correction; background subtraction; real-time video processing.
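The abstract's "temporal feature of aggressive movement" is not specified, but a minimal stand-in for the idea is frame-differencing motion energy: aggressive activity tends to produce sustained high inter-frame motion. The thresholds and helper names below are invented for illustration, not taken from the paper:

```python
def motion_energy(prev_frame, frame):
    """Mean absolute frame difference: a crude temporal-motion cue."""
    diffs = [abs(a - b) for prow, frow in zip(prev_frame, frame)
             for a, b in zip(prow, frow)]
    return sum(diffs) / len(diffs)

def is_aggressive(frames, energy_threshold=40, min_hits=3):
    """Flag a clip when motion energy stays high across several
    consecutive frame pairs, i.e. motion is both strong and sustained."""
    hits = sum(1 for p, f in zip(frames, frames[1:])
               if motion_energy(p, f) > energy_threshold)
    return hits >= min_hits
```

A deployed system would additionally need the per-individual tracking and fish-eye perspective correction the abstract mentions; this sketch covers only the temporal-energy intuition.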
Attention-Based Argumentation Mining
by Derwin Suhartono, Aryo Pradipta Gema, Suhendro Winton, Theodorus David, Mohamad Ivan Fanany, Aniati Murni Arymurthy
Abstract: This paper is intended to make a breakthrough in the field of argumentation mining. Current trends in argumentation mining research use handcrafted features and traditional machine learning (e.g., support vector machines). We worked on two tasks: identifying argument components and recognising insufficiently supported arguments. We utilise a deep learning approach and implement an attention mechanism on top of it to gain the best result. We also implement a hierarchical attention network (HAN) for this task. HAN is a neural network that applies attention at two levels, the word level and the sentence level. Deep learning models with an attention mechanism achieve better results than other deep learning methods. This paper also shows that on research tasks with hierarchically structured data, HAN performs remarkably well. We additionally present our results from using XGBoost instead of a regular non-ensemble classifier.
Keywords: argumentation mining; hand-crafted features; deep learning; attention mechanism; hierarchical attention network; word-level; XGBoost; sentence-level.
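The two-level attention idea behind HAN can be sketched without the recurrent encoders: attention is a softmax-weighted sum of vectors scored against a learned context (query) vector, applied first over the words of each sentence and then over the resulting sentence vectors. Everything here, the toy vectors and fixed queries, is illustrative; in the real model the queries are trained parameters and the inputs come from GRU encoders:

```python
import math

def attention(vectors, query):
    """Softmax-weighted sum of `vectors`, scored by dot product with
    `query` (HAN's context vector; a fixed toy vector here)."""
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(len(vectors[0]))]

def han_encode(document, word_query, sent_query):
    """Two attention levels: word vectors -> sentence vectors -> doc vector."""
    sent_vecs = [attention(sentence, word_query) for sentence in document]
    return attention(sent_vecs, sent_query)
```

Because the document vector is built hierarchically, the model can weight informative words within a sentence and informative sentences within a document separately, which is why HAN suits hierarchically structured data.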
Segmentation and recognition of characters on Tulu palm leaf manuscripts
by Antony P.J., Savitha C.K.
Abstract: This paper proposes an efficient method for the segmentation and recognition of handwritten characters from Tulu palm leaf manuscript images. The proposed method uses an automated tool combining thresholding and edge detection techniques to binarise the image. A projection profile with connected component analysis is then used for line and character segmentation. A deep convolutional neural network (DCNN) model is used to extract features and recognise the segmented Tulu characters efficiently, with a recognition rate of 79.92%. The results are verified using a benchmark dataset, AMADI_LontarSet, to generalise our model to the handwritten character recognition task. The results show that our method outperforms existing state-of-the-art models.
Keywords: handwritten character recognition; palm leaf; segmentation; DCNN; Tulu.
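The projection-profile step mentioned in the abstract has a standard form that can be sketched directly: summing ink pixels along each row of the binarised image gives a horizontal profile, and runs of zero-ink rows separate the text lines. The function name and binary convention (1 = ink) are our own choices:

```python
def segment_lines(binary_image):
    """Horizontal projection profile line segmentation.
    binary_image: rows of 0 (background) / 1 (ink).
    Returns (start_row, end_row) for each detected text line;
    rows with no ink act as separators between lines."""
    profile = [sum(row) for row in binary_image]
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink and start is None:
            start = y                      # line begins
        elif not ink and start is not None:
            lines.append((start, y))       # line ends at blank row
            start = None
    if start is not None:                  # line runs to bottom edge
        lines.append((start, len(profile)))
    return lines
```

Character segmentation then repeats the same idea with a vertical profile within each detected line; the paper additionally uses connected component analysis, which this sketch omits.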
Combination of Domain Knowledge and Deep Learning for Sentiment Analysis of Short and Informal Messages on Social Media
by Tho Quan
Abstract: Sentiment analysis has recently been emerging as one of the major natural language processing (NLP) tasks in many applications. In particular, as social media channels (e.g. social networks or forums) have become significant sources for brands to observe users' opinions about their products, this task is increasingly crucial. However, when applied to real data obtained from social media, we notice a high volume of short and informal messages posted by users on those channels. This kind of data is very difficult for existing works to handle, especially those using deep learning approaches. In this paper, we propose an approach to handle this problem. This work extends our previous work, in which we proposed combining the typical deep learning technique of convolutional neural networks (CNN) with domain knowledge. The combination is used to acquire additional training data augmentation and a more reasonable loss function. In this work, we further improve our architecture with various substantial enhancements, including negation-based data augmentation, transfer learning for word embeddings, combination of word-level and character-level embeddings, and a multi-task learning technique for attaching domain knowledge rules to the learning process. These enhancements, specifically aimed at handling short and informal messages, yield significant improvements in performance in experiments on real datasets.
Keywords: sentiment analysis; deep learning; domain knowledge; recurrent neural network; transfer learning; multi-task learning; data augmentation.
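The negation-based data augmentation the abstract lists is not detailed there; one hypothetical version of the idea is to derive an extra training example by negating a message's polarity words and flipping its sentiment label. The word list and function below are invented for illustration only:

```python
# Hypothetical sketch of negation-based augmentation: negate polarity
# words and flip the label, doubling the training data with examples
# whose surface form closely matches the originals.
NEGATORS = {"good": "not good", "bad": "not bad", "love": "don't love"}

def negate_example(text, label):
    """Return an augmented (text, label) pair with polarity words
    negated and the sentiment label flipped."""
    words = [NEGATORS.get(w, w) for w in text.split()]
    return " ".join(words), ("neg" if label == "pos" else "pos")
```

Augmented pairs like these teach the model that negation inverts sentiment, a pattern that is easy to miss in short, informal messages where a single "not" carries the whole polarity.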