International Journal of Computational Vision and Robotics (48 papers in press)
An IoT Based Smart Parking Management System
by Inhwan Jung
Abstract: In this paper, we implement a smart parking information management system based on IoT, using ultrasonic parking sensors and a Bluetooth beacon. The ultrasonic IoT sensors used for parking sensing are controlled by an Arduino board to collect parking sensing data, and the collected parking information is transmitted to the server in real time using the MQTT protocol. The server stores the parking information in a database and provides the vehicle driver with real-time parking status using the MQTT protocol. The driver starts the smartphone app and enters the parking lot. The smartphone automatically recognizes the parking space from the Bluetooth beacon signal at the entrance of the parking lot and, by communicating with the server, shows the parking information for that space at a glance. The parking management system implemented in this study not only helps the driver to park the car but also uses the real-time parking information stored in the database to obtain marketing information such as the hourly, daily or monthly number of visiting customers and the average shopping time.
Keywords: IoT; Smart Parking; Bluetooth Beacon; Parking App; MQTT.
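The sensing-to-server step in the abstract above can be sketched on the data side as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `occupancy_message`, the topic layout `parking/slot/<id>`, the JSON fields and the 50 cm threshold are all hypothetical. Actually sending the message would be done by an MQTT client library, which is left out here.

```python
import json


def occupancy_message(slot_id, distance_cm, threshold_cm=50.0):
    """Build a (topic, payload) pair for one ultrasonic reading.

    Assumption: a car is present when the measured distance is below
    threshold_cm. Topic layout and field names are hypothetical.
    """
    if distance_cm < 0:
        raise ValueError("distance_cm must be non-negative")
    occupied = distance_cm < threshold_cm
    topic = f"parking/slot/{slot_id}"
    payload = json.dumps(
        {"slot": slot_id, "occupied": occupied, "distance_cm": distance_cm}
    )
    return topic, payload


# Example reading: slot 7 reports an echo at 23.5 cm.
topic, payload = occupancy_message(7, 23.5)
```

An MQTT client would then publish `payload` to `topic`, and the server would store the decoded record in its database.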
Multi-document Summarization Using Feature Distribution Analysis
by Jae-Young Chang
Abstract: Recently, opinion documents have been growing rapidly in an environment where anyone can express an opinion on the Internet or SNS. This situation requires an automatic summarization technique in order to understand the contents of large-scale opinion documents. However, it is not easy to summarize opinion documents with previous text summarization technologies, since opinion documents include subject expressions as well as features of target objects. In this paper, a method to identify and extract representative documents from a large number of opinion documents is proposed. In addition, experiments show that the proposed method successfully extracts representative opinion documents.
Keywords: Multi-document Summarization; Text Mining; Opinion Mining; Feature; Social Network Service.
Situation-Cognitive Traffic Light Control Based On Object Detection Using YOLO Algorithm
by Sung-Dong Kim
Abstract: Current traffic lights provide the green signal at a fixed time interval without considering the traffic situation. As a result, cars in a long line have to wait a long time, which causes traffic jams and irritates drivers. To solve this problem, it is necessary to control the green signal interval according to the traffic volume, analyzed using image processing and machine learning techniques. This paper presents a situation-cognitive traffic light control algorithm that measures the traffic volume using the object detection algorithm YOLO (You Only Look Once) and controls the traffic signal intervals according to that volume. The algorithm is expected to produce smoother traffic flow and reduce driver stress.
Keywords: YOLO; You Only Look Once; Object Detection; Situation-Cognitive; Traffic Light Control.
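The control idea in the abstract above, lengthening the green interval as the detected traffic volume grows, can be sketched as follows. This is only an illustration and not the paper's algorithm: the function name `green_interval`, the linear rule and every constant are assumptions, and the vehicle counts are assumed to come from an object detector such as YOLO.

```python
def green_interval(vehicle_count, base_s=10.0, per_vehicle_s=2.0,
                   min_s=10.0, max_s=60.0):
    """Return a green-signal duration in seconds for one approach.

    Hypothetical linear rule: a base time plus a fixed allowance per
    detected vehicle, clamped to [min_s, max_s]. All constants are
    illustrative assumptions, not values from the paper.
    """
    if vehicle_count < 0:
        raise ValueError("vehicle_count must be non-negative")
    raw = base_s + per_vehicle_s * vehicle_count
    return max(min_s, min(max_s, raw))


# Vehicle counts per approach, as an object detector might report them.
counts = {"north": 3, "east": 12, "south": 0, "west": 40}
plan = {approach: green_interval(n) for approach, n in counts.items()}
```

The clamp keeps an empty approach from being skipped entirely and a congested one from starving the others, which is one simple way to bound the signal cycle.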
Occlusion Handling Strategies for Multiple Moving Object Classification
by Pawan Kumar Mishra, G.P. Saroha
Abstract: A framework has been designed for detection and classification of multiple moving vehicles. Background subtraction is used for detection of multiple moving objects like vehicles using Gaussian mixture model (MOG). Classification for multiple moving vehicles using K-nearest neighbour is done based on different features in this research. The method used in this research also improves the value of accuracy and occlusion rate for multiple moving vehicles in video frames. In this paper, we also learn a single detector for different types of multiple moving vehicles such as buses, trucks, and cars. This detector uses a special kind of function that is known as occlusion metric function. The main goal of this research is to build a function that is used to calculate the performance of detector between number of false positives and hit rate in heavy traffic (high activity) and small traffic (low activity) region.
Keywords: detection; classification; occlusion; accuracy; hit rate; false positive.
Energy Based Features for Kannada Handwritten Digit Recognition
by Gururaj Mukarambi, Basanna Dhandra
Abstract: In this paper, a Kannada handwritten digit recognition system based on discrete wavelet transform filters is proposed. Because no standard data set is available, a sample data set of Kannada handwritten digits was collected from schools, colleges, business persons, professionals, etc. The collected samples are scanned at 300 DPI. The images are pre-processed using a morphological opening operation to remove noise, and a bilinear operation is used to normalize them to 32 × 32 pixels, as this is the optimum size for the experiment. The normalized sample images are divided into 16 blocks, wavelet filters are applied to each of the 16 blocks, and the standard deviation is computed for each resulting set of coefficients. This process generates a total of 64 standard deviations of the wavelet coefficients, of which 48 are selected as potential features; the approximation coefficients are identified as non-potential features for discriminating Kannada handwritten digits, since the horizontal, vertical and diagonal coefficients capture the energy in these three directions for the Haar, Daubechies, Coiflets and Symlets wavelet families. The nearest neighbor classifier is applied for recognition. An average recognition accuracy of 94.80% is achieved. The proposed algorithm is free from skew and thinning, which is the novelty of the paper.
Keywords: OCR; DWT; Nearest Neighbor; SVM.
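The feature count in the abstract above (16 blocks, three detail sub-bands each, 48 standard deviations) can be reproduced with a small sketch. It assumes a single-level Haar transform only, 8 × 8 blocks of a 32 × 32 image, and population standard deviation; the function names are hypothetical, and which detail band is called "horizontal" or "vertical" varies between conventions.

```python
from statistics import pstdev


def haar_details(block):
    """Single-level 2D Haar transform of a square block with even side.

    Returns the three detail sub-bands as flat lists; the approximation
    band is not computed because it is discarded as a feature.
    """
    horiz, vert, diag = [], [], []
    n = len(block)
    for r in range(0, n, 2):
        for c in range(0, n, 2):
            a, b = block[r][c], block[r][c + 1]
            d, e = block[r + 1][c], block[r + 1][c + 1]
            horiz.append((a + b - d - e) / 2)  # difference between rows
            vert.append((a - b + d - e) / 2)   # difference between columns
            diag.append((a - b - d + e) / 2)   # diagonal difference
    return horiz, vert, diag


def block_features(image, block_size=8):
    """Standard deviation of each detail sub-band of each block."""
    features = []
    for r0 in range(0, len(image), block_size):
        for c0 in range(0, len(image[0]), block_size):
            block = [row[c0:c0 + block_size]
                     for row in image[r0:r0 + block_size]]
            for band in haar_details(block):
                features.append(pstdev(band))
    return features


# A synthetic 32 x 32 image stands in for a normalized digit sample.
image = [[(r * c) % 7 for c in range(32)] for r in range(32)]
features = block_features(image)  # 16 blocks x 3 bands = 48 values
```

The resulting 48-value vector is what a nearest neighbor classifier would compare between samples.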
An Optimal Mode Selection Algorithm for Scalable Video Coding
by L. Balaji, K.K. Thyagharajan, C. Raja, A. Dhanalakshmi
Abstract: Scalable Video Coding (SVC) is extended from its predecessor, Advanced Video Coding (AVC), because of its flexible transmission to all types of gadgets. Although SVC is more flexible and scalable than AVC, it is also more computationally complex. The traditional full search method in the standard H.264 SVC consumes more encoding time for computation. This computational complexity needs to be reduced, and many fast mode decision (FMD) algorithms have been developed, but many fail to balance all three measures: PSNR (peak signal-to-noise ratio), encoding time and bit rate. In this paper, the proposed optimal mode selection algorithm, based on the orientation of pixels, achieves better time saving, good PSNR and coding efficiency. Compared with the standard H.264 JSVM reference software, the proposed algorithm achieves a 57.44% time saving, a 0.43 dB increase in PSNR and 0.23% compression in bit rate.
Keywords: Scalable Video Coding; Computation; mode selection; PSNR; Time; Bit rate.
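For readers unfamiliar with the PSNR measure quoted above, the standard definition is PSNR = 10 · log10(MAX² / MSE), where MSE is the mean squared error between original and reconstructed samples. The sketch below assumes 8-bit samples (MAX = 255) passed as flat sequences; the function name `psnr` is illustrative and not taken from the JSVM software.

```python
import math


def psnr(original, reconstructed, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sequences."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - r) ** 2
              for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


# Every sample is off by exactly one grey level, so MSE = 1.
value_db = psnr([100, 100, 100, 100], [101, 99, 101, 99])
```

Because the scale is logarithmic, a gain of 0.43 dB corresponds to roughly a 9% lower mean squared error at the same peak value.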
Large-scale scene image categorisation with deep learning-based model
by Hussein A. Al-Barazanchi, Hussam Qassim, Abhishek Verma
Abstract: Increasing the depth of convolutional neural networks (CNNs) is a highly promising method of increasing their accuracy. Increased CNN depth also results in an increased layer count (parameters), leading to slow backpropagation convergence that is prone to overfitting. We trained our model (Residual-CNDS) to classify very large-scale scene datasets. The results on the two datasets show that our proposed model effectively handles slow convergence, overfitting, and degradation. Our approach overcomes degradation in the very deep network. We have built two models, Residual-CNDS 8 and Residual-CNDS 10. Moreover, we tested our models on two large-scale datasets and compared our results with other recently introduced cutting-edge networks in terms of top-1 and top-5 classification accuracy. Both models show good improvement, which supports the assertion that the addition of residual connections enhances CNDS network accuracy without adding any computational complexity.
Keywords: residual-CNDS; scene classification; residual learning convolutional neural networks TO residual learning; convolutional networks with deep supervision.
Extracting and Searching News Articles in Web Portal News Pages
by Namyun Kim
Abstract: Recently, a large amount of news articles is being created online, and news articles are important resources for understanding social phenomena and trends. Accordingly, a web portal service provides a "Portal News Page" that classifies news articles published from various news sources into sections and provides each news article with a certain structure. Therefore, by analyzing portal news pages, it is possible to automatically extract information about news articles. In this paper, we introduce a prototype that extracts and searches key information of news articles for analysis. Specifically, we describe (1) a crawler that collects, analyzes and parses news articles, and (2) an Elasticsearch server that indexes and searches news information, and (3) a front-end application that provides a search user interface. These systems are expected to provide the foundation for news analytics and forecasting services.
Keywords: Crawler; Search Engine; Elasticsearch; News Service and Analysis.
Cursive Multilingual Characters Recognition Based on Hard Geometric Features
by Amjad Rehman, Majid Harouni, Tanzila Saba
Abstract: The cursive nature of multilingual characters segmentation and recognition of Arabic, Persian, Urdu languages have attracted researchers from academia and industry. However, despite several decades of research, still multilingual characters classification accuracy is not up to the mark. This paper presents an automated approach for multilingual characters segmentation and recognition. The proposed methodology explores characters boundaries based on their geometric features, prior to their recognition. However, due to uncertainty and without dictionary support few characters are over-divided. To expand the productivity of the proposed methodology a BPN is worked out with countless division focuses for cursive multilingual characters. Trained BPN separates off base portioned indicates effectively with rapid upgrade character recognition precision. For reasonable examination, only benchmark dataset is utilized.
Keywords: OCR; Multilingual character recognition; features mining; geometrical features; BPN.
Stego-key based image steganography scheme using edge detector and modulus function
by Shiv Prasad, Arup Kumar Pal
Abstract: In this paper, our main concern is to devise an image steganography scheme that enhances security along with the payload capacity of the cover image. To this end, a secure image steganography scheme is proposed in which the embedding of secret message bits is controlled by a secret key. To improve the embedding capacity of the cover image, the hiding process exploits a characteristic of the cover image: more secret message bits are concealed in the edge region than in the smooth region. To improve the security of the content, steganography and cryptography are generally combined. In this work, however, instead of using two different security mechanisms, we embed the secret message bits into the cover image with reference to keys, known as a stego-key. This approach not only enhances security but also reduces the computation overhead. A variable number of secret message bits is concealed in the edge pixels and non-edge pixels using a modulus-function-based embedding process. The secret message bits are not embedded sequentially into each pixel of the cover image, and the number of edge pixels and non-edge pixels varies with the threshold value selected during the edge detection process. This threshold value may be considered a key parameter, and only an authorized user will be able to locate the edge and non-edge pixels during message extraction. The scheme is implemented on some standard grayscale images, and satisfactory results are achieved in terms of visual quality along with higher payload.
Keywords: Data hiding; Edge detection; Image steganography; Information security; Modulus function; Stego-key.
An Ensemble of Neural Networks for Non-Linear Segmentation of Overlapped Cursive Script
by Amjad Rehman
Abstract: Intro: Precise character segmentation is the only solution towards higher Optical Character Recognition (OCR) accuracy. In cursive script, overlapped characters are a serious issue in character segmentation, as characters are deprived of their discriminative parts by the conventional linear segmentation strategy. Background: Hence, non-linear segmentation is essential to avoid loss of character parts and to enhance character/script recognition accuracy. This paper presents an improved approach for non-linear segmentation of overlapped characters in handwritten Roman script. Contribution: The proposed technique is composed of a sequence of heuristic rules based on geometrical features of characters to locate possible non-linear character boundaries in a cursive script word. To enhance efficiency, the heuristic approach is integrated with a trained ensemble neural network validation strategy for verification of character boundaries. Accordingly, correct boundaries are retained and incorrect ones are removed based on the ensemble neural networks' vote. Conclusion: Finally, based on the verified segmentation points, characters are segmented non-linearly. For fair comparison, experiments are performed on the CEDAR benchmark database. The experimental results are much better than those of conventional linear character segmentation techniques reported in the state of the art. The ensemble of neural networks plays a vital role in enhancing character segmentation accuracy compared to individual neural networks.
Keywords: Non-linear character segmentation; ensemble neural networks; Analytical approach; CEDAR database.
Performance Evaluation of Various Texture Analysis Techniques for Machine Vision based Characterization of Machined Surfaces
by Ketaki Joshi, Bhushan Patil
Abstract: Machine vision-based inspection of surface quality leverages the principle of surface-texture characterization, capitalizing on image data characteristics. Frequently, surface-texture analysis adopts statistical and filter-based techniques, for this purpose. For surface texture characterization, traditionally researchers prefer parameterized histograms, gray level co-occurrence matrices, discrete Fourier transforms as well as discrete wavelet transforms. Despite popular usage, extant literature features very little in terms of comparative analyses amongst these techniques.
Accordingly, this paper evaluates the comparative performance of these techniques for characterization of machined surfaces and also recommends a novel hybrid technique that leverages higher discriminating capability. This hybrid discriminant-analysis methodology is derived from characterization of 532 images of multi-textured machined surfaces. The results prove that the proposed technique provides superior performance with higher accuracy, while requiring a reduced optimal set of parameters, for inspection of surface quality.
Keywords: machine vision; texture analysis; image processing; discriminant analysis; multivariate techniques; surface texture; surface quality; histogram; gray level co-occurrence matrix; discrete Fourier transform; discrete wavelet transform.
Deep Reinforcement learning Collision Avoidance using Policy Gradient Optimization and Q-Learning
by Shady Maged, Bishoy Mikhail
Abstract: The use of Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), both policy gradient optimization methods, and of the Deep Q-Learning Network (DQN) in Lidar-based differential robots is proposed, using Turtlebot and OpenAI's baselines optimization methods. The simulation results show that the three algorithms are well suited to obstacle avoidance and robot navigation, with a clear advantage for TRPO and PPO in complex environments. The policies can be used in a fully decentralized manner, as the learned policy is not constrained by any robot parameters or communication protocols.
Keywords: ROS; Robotics; Deep Learning; Reinforcement Learning; Deep Q-Learning; Trust region Optimization; Proximal Policy Optimization.
Angle Histogram of Hough Transform as Shape Signature for Visual Object Classification (AHOC)
by Aaron Rababaah
Abstract: This work presents a new method for object classification using the Hough Transform (HT) and an angle histogram as a signature of the target objects (AHOC). Several methods reported in the literature exploit the Hough Transform and other techniques as a pre-processing step to characterize objects for use in object detection, recognition, classification, etc. The HT is a very powerful technique for extracting shape features from 2D objects; it has been used in many studies and implemented successfully in many applications. Our study is unique in post-processing the HT voting space using a binary threshold and then computing an angle histogram of the resulting angle space as a shape signature for the target object. Our image set consisted of 25 simple geometric shapes and six complex natural object classes: trees, people, cars, airplanes, houses and horses. The method was trained and tested using 225 images from these six different classes and found to be robust, with a classification accuracy of 95.83%.
Keywords: visual object characterization; object classification; Hough transform; angle histogram; template matching.
A lossless blind image data hiding scheme for semi-fragile image watermark
by Amine Khaldi
Abstract: In this work we propose a digital watermarking approach that is invariant to rotation and guarantees the integrity of the inserted mark. The size of the resulting image is identical to that of the original image, and the process guarantees acceptable transparency. To this end, we experimented with eight substitution processes and concluded that substituting four bits gives a large insertion capacity while guaranteeing acceptable transparency for the insertion of text. For the insertion of an image, however, it is recommended to substitute only one bit. This reduces the capacity but guarantees the transparency of the watermarking process.
Keywords: Digital watermarking; Imperceptibility; Robustness; Digital Image; least significant bit substitution.
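The capacity versus transparency trade-off described above follows directly from how least significant bit substitution works. The sketch below is a generic illustration for 8-bit pixels, not the paper's scheme; the function names `embed_lsb` and `extract_lsb` are hypothetical.

```python
def embed_lsb(pixel, bits, k):
    """Replace the k least significant bits of an 8-bit pixel with `bits`."""
    if not 1 <= k <= 8:
        raise ValueError("k must be between 1 and 8")
    if not 0 <= pixel <= 255:
        raise ValueError("pixel must be an 8-bit value")
    if not 0 <= bits < (1 << k):
        raise ValueError("bits must fit in k bits")
    mask = (1 << k) - 1
    return (pixel & ~mask & 0xFF) | bits


def extract_lsb(pixel, k):
    """Read back the k least significant bits of a pixel."""
    return pixel & ((1 << k) - 1)


# Hide the 4-bit value 0b1001 in a pixel with grey level 182.
marked = embed_lsb(182, 0b1001, 4)
recovered = extract_lsb(marked, 4)
```

Substituting k bits stores k bits per pixel but can change the grey level by up to 2^k − 1: at most 15 levels for four bits, and at most 1 level for a single bit, which matches the recommendation of one bit when transparency matters most.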
A Study on the Trustworthiness of Store Rating in Restaurant Recommendation O2O Service
by Hyung Su Kim, Sangwon Lee
Abstract: This study examines whether store ratings, a key piece of information provided by the recently widespread O2O services and commonly displayed as stars, are truly reliable for consumers. We examine whether the store ratings of an O2O service, which is known to lower the perceived search cost of consumers, form a reliable system. First, we compared the ratings of the stores registered in a domestic restaurant recommendation O2O service with the results of an actual survey of customers who visited the restaurants. The analysis found that the rating of each store in the app was not correlated with actual satisfaction with, or loyalty to, the store, and that evaluations of specific store attributes were also not significant in predicting the store's rating in the O2O app. However, customers who visited the store through a mobile app or an acquaintance's recommendation were more loyal than customers who visited through a simple internet search. Therefore, if O2O services establish a more reliable customer rating system, consumer use of O2O services is expected to increase further.
Keywords: O2O; Store Rating; Customer Review; Customer Loyalty.
Deep Learning based Intelligent Surveillance Model for Detection of Anomalous Activities from Videos
by Karishma Pawar, Vahida Attar
Abstract: For safeguarding and monitoring purposes, public places are equipped with surveillance cameras. Timely and accurate identification of suspicious activities is paramount to securing the public places. Assigning human personnel to keep continuous watch over ongoing activities is error-prone and laborious. To alleviate the need of human personnel for monitoring such videos, automated surveillance systems are required. This paper proposes a deep learning based intelligent surveillance model for detection of anomalous activities. The problem of anomaly detection has been handled as one class classification problem. The proposed approach involves 2 dimensional convolutional auto-encoder for feature learning, sequence-to-sequence long short term memory model for learning temporal statistical correlation and radial basis function as activation function in fully connected network for one class classification. We experimented on real-world dataset by two variants of proposed approach and achieved significant results at frame-level anomaly detection.
Keywords: Anomaly detection; computer vision; convolutional autoencoder; deep learning; one class classification; radial basis function; video surveillance.
Cucumber disease detection using Adaptively Regularized Kernel-Based Fuzzy C-Means and Probabilistic Neural Network
by Jayanthi M.G, Dandinashivara Revanna Shashikumar
Abstract: Agriculture is now considered to be about much more than feeding the ever-growing population of the world. For many decades, computers have been used to provide automatic solutions in place of manual diagnosis of plant diseases, which is costly and error prone. Cucumber, a common economic crop, is one of the most popular vegetables in the agricultural field and occupies a large proportion of vegetable cultivation. This paper therefore addresses the recognition of cucumber diseases. First, the cucumber diseases are segmented using Adaptively Regularized Kernel-Based Fuzzy C-Means (ARKFCM). Once the disease is segmented, the color feature is extracted using a Hue, Saturation and Value (HSV) based semantic technique and the texture feature is extracted using the gray level co-occurrence matrix (GLCM) technique. The cucumber disease is then classified using a Probabilistic Neural Network (PNN). Finally, the experimentation is done on a standard agricultural database and implemented in Matlab. For recognizing cucumber diseases such as anthracnose, downy mildew and gray mold, the experimental results show that the proposed method is feasible and effective when conducted on a database of cucumber diseased leaf images.
Keywords: Cucumber disease; Segmentation; Feature extraction; Classification; ARKFCM; PNN; Sensitivity; Specificity and Accuracy.
Challenges for Computer Aided Diagnostics using X-Ray and Tomographic Reconstruction Images in craniofacial applications
by Abhishek Gupta
Abstract: Computer-aided diagnostic systems are crucial for patient diagnosis and treatment planning. Automating such systems requires combining the various steps involved in the system. Variation in the process may cause the system to fail. Therefore, there is a need to analyze the system's workflow and to work on the challenges embedded in it. Imaging and automation challenges regarding computer-aided diagnosis are discussed in this paper. Each challenge is introduced within a Computer Aided Diagnosis (CAD) system for craniofacial applications. The significance of every challenge is described in the paper. Methods to overcome the challenges and issues are also discussed for the benefit of readers.
Keywords: Computer tomography; cephalometry; landmark; computer aided diagnosis; diagnosis; X-Ray; Tomographic reconstruction; craniofacial; medical images; dentistry.
Compact reconfigurable triple notch ultra-wideband bandpass filter for cognitive radio system
by Janardan Sahay, Sanjay Kumar
Abstract: This paper introduces a compact ultra-wideband (UWB) bandpass filter (BPF) with two switchable external notch structures which avoids the interference between the UWB communication with worldwide interoperability for microwave access (WiMAX) and wireless local area network (WLAN) systems, suitable for cognitive radio (CR) applications. The first external notch structure is modified C-shaped structure which generates a sharp rejection at 3.5 GHz to avoid interference with WiMAX system and second external notch structure is back to back T-shaped structure which creates sharp rejection notch bands at 5.2 GHz and 5.8 GHz frequencies for WLAN systems. The filter reconfiguration is achieved by varying the conductance of microstrip line. The response of the proposed structure shows sharp rejections at one WiMAX and two WLAN bands. The bandpass filter covers frequency of UWB system from 3.1 to 10.7 GHz having very low passband insertion loss. The proposed UWB BPF is designed, simulated, fabricated and tested to validate the results. A good agreement is achieved between simulation and experimental results.
Keywords: cognitive radio; notch band; reconfigurable filter; ultra-wideband; bandpass filter; interference; WLAN; WiMAX.
Adaptive Robust Control of a Four-Cable-Driven Parallel Robot
by Arash Kiani, Seyed Kamal-e-ddin Mousavi Mashhadi
Abstract: This present study introduces an adaptive control strategy for Four-cable robots. An adaptive sliding mode control to overcome the uncertainties of the system as well as avoidance of estimating an upper bound of the system uncertainties is presented. The proposed controller is designed based on the Lyapunov stability theory. Therefore, it ensures the stability of the closed-loop system and makes the tracking error converge to zero. In this robot, the cables can only pull the end-effector but not push it; therefore we present a simple mathematical solution to design a positive tension controller for the cable suspended robot with redundant cables. The properties of the proposed method such as high performance tracking, disturbance rejection and insensitivity to parameter variations are demonstrated by simulation.
Keywords: Cable robots; Adaptive sliding mode control; Lyapunov stability; Adaptive Inverse Dynamic Control; Positive cable tensions.
Computing disparity map using Minimum Sum Belief Propagation for stereo pair images
by Chitra Suresh, Kushal R. Tuckley
Abstract: Stereo matching between two images is done by computing the disparity of all points on the object. The process involves identifying corresponding points in the stereo images and finding the horizontal shift. Presently there is no method that finds the shift in the corresponding points between left and right images; this is due to the non-availability of a procedure to identify the group of pixels in the right and left images that belong to the same object. The available local methods use either a window or a feature to find the shift in a stereo image. In these methods, finalizing the size of the window or deciding the correct feature remains an unresolved issue. Global methods, on the other hand, use graph theory and probability theory to find the shift efficiently.
The belief propagation algorithm is one of the global methods devised to offer a computationally efficient approach with good results. This paper applies the "Minimum Sum Belief Propagation" method for message updates with a linear "Quadratic function" for computation of the horizontal shift in a stereo image. The results, with the computational estimations, are presented, and based on these results, suggestive comments on the effectiveness of the update, which indicate strategy versus type of image, are also given.
Keywords: Stereo Image; Parallax Effect; Stereo Matching; Markov Random Field; Belief Propagation; Disparity Map.
Extended Opinion Lexicon and ML based Sentiment Analysis of tweets : A novel Approach towards Accurate Classifier
by Gaurav Dubey, Santosh Kumar, Sunil Kumar, Pavas Navaney
Abstract: Micro-blogging has today become a very popular communication tool among internet users. Millions of users share their opinions on diverse aspects of life, which are rich sources for opinion mining. This paper addresses the sentiment analysis of Twitter data on Demonetization. A new approach to sentiment analysis based on extended opinion-lexicon-based scores is presented in this paper. Na
Keywords: Sentiment analysis; NLP; opinion mining; lexicon.
Human Computer Interactive Future Framework: Automation of Human Interaction & Interfaces
by Anil Dubey, Mohan Kolhe, Vikash Singh
Abstract: Contemporary technological improvements and social living urge the advancement of technological implementations and the creation of an environment for automating the interaction platform. The sole aim is to reduce labor or human effort and to optimize the throughput gained. The future generation expects such developments in current systems, but none can elucidate the particular areas of improvement needed for future models. To cater to such requirements, an HCI future framework for automating interaction and interfaces is proposed. The framework is divided into ten co-frames of interfaces, with further classifications to facilitate the understanding of individual components. The activities and classes of the co-frames are designed according to their properties and the needs of people. The basic theme behind the proposal is to pave the way for technological developments and interface design that reduce labor and increase automation of work, while motivating people towards acceptance of newer HCI systems so that work is executed efficiently.
Keywords: HCI; Framework; Interaction; Interface; Automation; Engineering.
Development of NN Classifier for Recognition of Human Moods
by Sandeep Awachar, Prashant Ingole
Abstract: Numerous works have been carried out in the field of recognition using artificial neural networks, based on audio, video and image processing. Face recognition has been found to be one of the major topics of interest in this field. Face recognition can pave the way to recognizing human moods. This paper focuses on the development of an NN classifier for recognition of human moods from facial expressions. Apart from a normal mood and the six basic moods (anger, disgust, fear, sad, surprise and happy), three added moods, viz. contempt, courage and desire, form the new and unique feature of this work. Gabor filter and principal component analysis are used for feature extraction, and a neural network is used as the classifier for classifying the moods on the basis of their respective face features.
Keywords: facial expressions; neural network; Gabor filter; PCA; anger; surprise; image processing; mood recognition; classifier; extraction; features.
Adaptive Neuro-Fuzzy Inference System based On-the-Move Terrain Classification for Autonomous Wheeled Mobile Robots
by Rakesh Kumar Sidharthan, Ramkumar Kannan, Seshadhri Srinivasan
Abstract: Building intelligence in autonomous robots to classify heterogeneous terrains on-the-move is a challenging task, but a pivotal feature required for accomplishing safety critical missions. This paper proposes an adaptive neuro-fuzzy inference system for online terrain classification in wheeled mobile robots using the steady state behavior of the robot wheel on the terrain. The key idea is to model the wheel-terrain interactions as a parametric varying system whose steady state behaviors are characterized by the terrain type. The proposed method uses the steady state gains and the corresponding input command to the robot wheel for identifying the terrain type. Our results show that the proposed approach has a classification accuracy of 95.2% for the trained terrains, whereas 94.2% and 93.8% are observed in robust and adaptive testing, respectively. Additionally, a customized graphical user interface is developed to give researchers easy access to terrain identification.
Keywords: Autonomous robots; Adaptive neuro-fuzzy inference system; Terrain classification; Wheel-terrain interactions; User interface.
Characterizing Local Feature Descriptors for Face Sketch to Photo Matching
by Samsul Setumin, Shahrel Azmin Suandi
Abstract: Sketches and photos belong to different modalities. An inter-modality matching approach requires the right feature representation for both images so that the modality gap can be neglected. Improper feature selection may result in a low recognition rate. Many local descriptors have been proposed in the literature, but it is unclear which descriptors are more appropriate for inter-modality matching. In this paper, we attempt to characterize local feature descriptors for face sketch to photo matching. Our evaluation for the characterization uses the Cumulative Match Curve (CMC), and we compare seven different descriptors: LBP, MLBP, HOG, PHOG, SIFT, SURF and DAISY. The evaluation focuses only on viewed sketches. Based on the experiments, we observed that gradient-based descriptors gave higher accuracy than the others. Out of five popular distance metrics evaluated, L1 gives a better result than the other distance measures.
Keywords: Local feature descriptors; sketch to photo; matching; forensic sketch; face recognition.
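As a rough illustration of the Cumulative Match Curve evaluation with an L1 distance mentioned in the abstract above, the following minimal sketch ranks gallery photos for each probe sketch and reports the fraction of probes whose correct match appears within the top k ranks. The function names, the toy two-dimensional feature vectors and the closed-set assumption (every probe identity is present in the gallery) are illustrative assumptions, not details taken from the paper.

```python
def l1_distance(u, v):
    """Sum of absolute differences between two equally long feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def cmc(probes, gallery, max_rank):
    """Return identification rates for ranks 1..max_rank.

    probes and gallery are lists of (identity, feature_vector) pairs.
    Assumes a closed set: each probe identity occurs in the gallery.
    """
    hits = [0] * max_rank
    for probe_id, probe_vec in probes:
        # Sort gallery entries from nearest to farthest under L1.
        ranked = sorted(gallery, key=lambda g: l1_distance(probe_vec, g[1]))
        rank = [gid for gid, _ in ranked].index(probe_id)  # 0-based rank
        # A match at rank r counts as a hit for every rank >= r.
        for k in range(rank, max_rank):
            hits[k] += 1
    return [h / len(probes) for h in hits]


# Hypothetical toy features, for illustration only.
gallery = [("a", [0, 0]), ("b", [10, 10]), ("c", [5, 5])]
probes = [("a", [1, 0]), ("b", [6, 6]), ("c", [5, 4])]
print(cmc(probes, gallery, max_rank=3))
```

The first entry of the returned list is the rank-1 recognition rate; the curve is non-decreasing by construction.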
Extended COCOMO: Robust and Interpretable Neuro-fuzzy modeling
by Shailesh Tiwari
Abstract: Software project management deals with a set of tasks that span every phase of the software development life cycle. Prediction of software development effort is one of the crucial activities in software project management. Various software cost estimation models have been developed by researchers in the last few decades. Still, the search for the perfect model for software cost estimation remains one of the most difficult tasks for organisations dealing in software development. The Constructive Cost Model (COCOMO) is one of the most widely accepted models in recent years. This paper presents an extended version of COCOMO. The extension is built with the help of two very popular methods, artificial neural networks (ANN) and fuzzy logic, which ultimately provide the foundation for effort assessment models. First, expert judgement about the model is used for validation, which overcomes the common black-box problem that arises widely in ANN-based software engineering solutions. Moreover, we choose the best combination from the three membership functions for continuous rating values, which reduces the variance when estimating the cost of similar projects. The validation, using a dataset of 93 NASA projects, shows that the model significantly improves the estimation accuracy in terms of mean magnitude of relative error (MMRE) by 10.104% relative to other known estimation models.
Keywords: Fuzzy Logic; Neural Network; COCOMO; Neuro-Fuzzy Software Effort Estimation; NASA projects Dataset; Mean Magnitude of Relative Error.
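For readers unfamiliar with the accuracy measure named in the abstract above, the sketch below computes the mean magnitude of relative error (MMRE) in its standard form, the average of |actual - predicted| / actual over all projects. The function name and the effort values are hypothetical and are not drawn from the NASA dataset.

```python
def mmre(actual, predicted):
    """Mean magnitude of relative error over paired effort values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and equally long")
    # MRE for one project is |actual - predicted| / actual.
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical efforts in person-months, for illustration only.
actual_effort = [100.0, 200.0, 400.0]
predicted_effort = [90.0, 220.0, 400.0]
print(mmre(actual_effort, predicted_effort))
```

A lower MMRE indicates estimates that are, on average, closer to the actual effort.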
Fusing pyramid histogram of gradients and optical flow for hand gesture recognition
by Suni S S, Gopakumar K
Abstract: Human computer interaction systems based on hand gestures have caught the attention of the research community as a means of implementing natural communication between humans and machines. However, different persons perform the same gestures differently in terms of velocity and motion scale. This makes it challenging to minimize the variations between different persons and to maximize the coherence of the same gestures. In this paper, the original pyramid histogram of gradients in three orthogonal planes is combined with dense optical flow to create a dynamic descriptor, which is explored as a source of discriminative features for hand gesture recognition. The shape and motion features of images in a video sequence are captured to obtain a geometric and illumination invariant spatio-temporal feature descriptor for classification. A multiclass support vector machine classifier is used to recognize the hand gestures. The proposed method gives an excellent recognition rate and outperforms the existing approaches.
Keywords: Human computer interaction; Pyramid histogram of gradients; Optical flow; Hand gesture recognition; Multi-class support vector machine.
Structured Learning and Prediction in Face Sketch Gender Classification and Recognition.
by Khalid Ounachad, Mohamed Oualla, Abdlghani Souhar, Abdelalim Sadiq
Abstract: Structured prediction methods have become an attractive tool for many machine learning applications. For this reason, the objective of this paper is to identify the gender of a person from their face sketch by applying a structured learning approach. We use a deep geometric descriptor as features, the gender as labels, and a structured learning and prediction approach for matching. The basic idea is to extract perfect face ratios from the face sketch as features, with the gender as the label. To extract the perfect face ratios, we use the landmark points of the face, from which sixteen features are extracted. The training and testing tasks are applied to the CUHK Face Sketch dataset (CUFS). An experimental evaluation demonstrates the satisfactory performance of our approach on CUFS, where the recognition rate reaches more than 98%.
Keywords: Structured Learning; Prediction; Face Sketch; Face Sketch Recognition; Facial Gender Recognition; Perfect Face Ratios.
Mathematical Variable Detection in Document Images
by Bui Phong, Hoang Manh Thang, Le Thi Lan
Abstract: Mathematical expression detection in documents is a prerequisite step for developing a mathematical retrieval system, a topic that has attracted much research recently. In the detection process, one challenging issue is the detection of variables. The similar properties of variables and narrative text cause many detection errors in existing approaches. In this paper, a novel methodology for detecting variables in inline mathematical expressions is proposed. The merit of the method is that it can operate directly on the variable images without employing character recognition. The proposed method uses the projection profile features of images and the fine-tuning of different machine learning algorithms in the detection process. The achieved accuracy varies from 86.14% to 94% for the detection of variables in inline expressions in document images from various public benchmark datasets. The performance comparison with existing methods demonstrates the effectiveness of the proposed method.
Keywords: Document analysis; Mathematical expression extraction; Italic detection; Machine learning.
Taylor Rate-Distortion trade-off and Adaptive block search for HEVC Encoding
by Anitha Kumari R.D, Narendranath Udupa A
Abstract: Advances in High Efficiency Video Coding (HEVC) are being adopted to define the next generation of compression models, which offer efficient compression without affecting image quality. HEVC offers better performance than the existing compression models. This work develops an approach for video compression by proposing weighted entropy coding and an adaptive block search based Rate-Distortion (R-D) trade-off. A new R-D trade-off, named the Taylor R-D trade-off, is designed using the Taylor series. The adaptive block search algorithm is proposed for initiating the block search process of motion estimation in video coding by selecting the optimal block using the Hexagon Based Tree Search Algorithm (HBTSA), along with the Taylor R-D trade-off. Initially, the frames are extracted from the input video. Then, the video frames are divided into macroblocks to perform the adaptive block search. Further, the suitable blocks are selected and passed to the encoding process, which uses weighted Context-Adaptive Binary Arithmetic Coding (CABAC) with a weighted entropy function to preserve the video quality after compression. The results show that the proposed HBTSA method achieves improved PSNR and SSIM on the Football, Coast Guard, Garden and Tennis sequences, with values of 42.717 dB and 0.991, respectively.
Keywords: Video coding; HEVC; R-D trade-off; Taylor series; Adaptive block search.
Guidance Based Improved Depth Upsampling With Better Initial Estimate
by Chandra Shaker Balure, Ramesh Kini M
Abstract: Like optical images, depth images are also gaining popularity because of their use in many applications like robot navigation, augmented reality, 3DTV and more. For such applications to perform better, they need high-resolution (HR) depth images. The commercially available depth cameras do not satisfy the requirements of the above-mentioned applications, as they generate depth images that suffer from low spatial resolution, noise corruption and missing regions. Such images need to be super-resolved, denoised and inpainted before being fed as input to the application for better performance. Super-resolution (SR) is a class of techniques that take a sequence of low-resolution (LR) inputs or a single LR input to produce an HR output. Since SR is an ill-posed inverse problem, there exist multiple solutions to such an under-determined system. A good initial estimate is always a good regularizer for finding the optimal solution in the infinite solution space. We propose an initial estimate as part of our SR pipeline, especially for higher upsampling factor, say
Keywords: Super-Resolution; Depth Image; Initial Estimate; Interpolation; Cascade.
Automated System for Road Extraction & Traffic Volume Estimation for Traffic Jam Detection
by Jyoti Parsola, Durgaprasad Gangodkar, Ankush Mittal
Abstract: Efficient vehicle detection and traffic density estimation for traffic congestion is one of the essential tasks of traffic surveillance, and it has been solved to some extent. However, there is still a need for a better solution that effectively and efficiently estimates traffic congestion. In this paper, we propose an automated surveillance system for vehicle detection from a traffic scene. Moreover, the proposed system performs functions such as identification of the path followed by a vehicle, estimation of traffic volume, identification of the moving direction of a vehicle, traffic analysis and traffic jam reporting. These functions have not been discussed collectively by a single author; our proposed system performs all of them. Rather than extracting features of the roads or creating a model, our system directly extracts the road region from the road scene by motion segmentation of vehicles. Further, the path is plotted based on the movement of the vehicle. Vehicular density is computed with respect to the corresponding road of the moving vehicle. A traffic jam alert is generated based on the intensity of traffic density. Traffic density is categorized as heavy, medium or low based on the flow of the traffic. The performance of our proposed system is evaluated using various benchmark datasets captured in various road scenarios (urban, highway), which shows the ability of the proposed method to work in any road conditions and thus makes it suitable for deployment.
Keywords: Intelligent Transportation System; Traffic Density Estimation; Congestion Detection.
Comprehensive analysis of a diverse group of features and development of Vision-Based Two-Level Hand Detector under Practical Environment Conditions
by Songhita Misra, Rabul Laskar
Abstract: Developing a bare-hand detection system for practical environment conditions is a complex and challenging task. Factors such as change in appearance, uneven illumination and complex background add to the difficulty of detecting the target hand. The present study explores 13 color-texture features and integrates them with texture models to develop a robust two-level hand detector under the practical conditions mentioned above. Color-texture and texture models are assessed using multiple classification tools and employed in two subsequent levels such that the second level only classifies the optimal sub-windows classified in the first level. The analysis showed that, under the practical conditions, the proposed two-level detection system detects the hand with 53.4% higher accuracy than the baseline model, which is the integrated motion detection and skin filtering method. With five times lower time complexity than the baseline model, the proposed system can be used to detect hands in both static and dynamic gesture systems.
Keywords: Two-level hand detection system; Complex background; Positional variation; AdaBoost classifier; Color-texture features.
Design of Filter for Image De-noising using Discrete Wavelet Transform for ASIP
by Mood Venkanna, Rameswar Rao, Chandra Sekhar
Abstract: An Application Specific Instruction Set Processor (ASIP) is a processor customized for a user-specific application. Although significant research has been done on ASIPs, the technology remains promising rather than established, due to the lack of efficient methodologies for designing the processor configuration according to the application. An ASIP solution explores the trade-off between dedicated hardware design and software flexibility. It endeavours to fulfil the functionality of an algorithm with low power costs and less complexity. In this paper, an approach is considered to design a processor for image de-noising. The design of a suitable filter is an important task for transmission and real-time processing. Designing ASIPs requires a suitable design of the custom datapath while simultaneously modifying the instruction set, the decoder and the compiler. We present an ASIP based on a custom architecture design using the Discrete Wavelet Transform (DWT) as a filter. It starts with a general purpose datapath like MIPS. It customizes the datapath iteratively for better power utilization, usable area and performance. All the experiments have been synthesized using Xilinx FPGA and also verified on a Spartan board. The subjective evaluation of the filter is analysed through various figures. Further, it is implemented in HDL to support the customized processor.
Keywords: Image Filtering; Wavelet Transform; ASIP; Impulse noise; DSP; FPGA; VHDL.
A Novel Approach for Secured Multimodal Biometric Authentication based on Data Fusion Technique
by Gayatri Bokade, Rajendra Kanphade
Abstract: Upcoming biometric technology is focusing more on multiple biometric traits to authenticate the user for security, access control and universal identity. This is because even a biometric considered highly powerful suffers, when used alone, from spoof attacks, intra-class variation, noise, vulnerability etc. In the arena of biometrics, the integration of evidence offered by multiple biometrics is considered an effective mode of enhancing authentication accuracy and security. This research work proposes an authentication technique for a multimodal biometric scheme using three traits, i.e., face, ear and palmprint, at feature level fusion. This novel method utilizes the raw data fusion technique to create a unique pattern for each registered user. Even with the use of three different biometric traits, the template is created with extremely low dimension and by using a single algorithm. The proposed system provides security with reduced computational complexity and improved robustness.
Keywords: authentication; multimodal biometric; raw data fusion; Face; Ear; Palmprint; computational complexity; robustness.
ARP Cache Poisoning: Detection, Mitigation & Prevention Schemes
by Jayati Bhardwaj, Virendra Yadav, Munesh Chandra Trivedi, Anurag Sen
Abstract: Providing security to networks is of utmost importance for all kinds of users. The foundation of any communication network is its implemented protocols; hence, ensuring security at the protocol level is a point of concern. Major communication protocols like IP and ARP lack mechanisms for protection against malicious activities. ARP is a network communication protocol employed for mapping a network address to a MAC address at the data link layer of the IP suite. However, the absence of an authentication process in ARP allows attacks such as ARP cache poisoning, or ARP spoofing, to take place. Such an attack allows a malicious node to associate its MAC address with the IP address of a host, exposing the network to several severe attacks such as DoS, MITM, session hijacking and many more. With the increasing number of attacks, a large number of detection, prevention and mitigation schemes have been proposed. However, there is no universally accepted benchmark scheme that fully solves the problem. This paper presents a comprehensive review of these schemes along with their associated strengths and weaknesses. A comparative evaluation of the schemes is also included for further insight into the development of improved solutions to the problem. This evaluation leads to a summary of the requirements that a novel approach to the problem needs to meet.
Keywords: ARP Cache Poisoning; MAC address; Proxy ARP; Public Key Cryptography; Spoofing.
Incremental Approach for Multi-Modal Face Expression Recognition System using Deep Neural Networks
by Anand Handa, Rashi Agarwal, Narendra Kohli
Abstract: Facial Expression Recognition (FER) plays a vital role in building human-machine interaction systems. The ability to recognize facial expressions and emotions automatically and efficiently helps in building novel applications such as human-machine interaction systems, human-robot interaction, driving safety and health care. The face conveys a wide range of information about a person's identity, sex, age, mental state and emotional state. Despite significant work and improvement in this field, facial expression recognition is still one of the most challenging tasks. Convolutional Neural Networks (CNN) and Deep Convolutional Neural Networks (DCNN) have evolved as efficient tools for facial expression recognition models, but they differ significantly in terms of their network configuration and architecture. Existing facial expression recognition systems suffer from a variety of bottlenecks, such as a failure to generalize their algorithms over different databases. Hence, in this paper, we propose a model based on DCNN to overcome these challenges in emotion recognition and validate our results on a variety of well-known databases in three steps. Firstly, the proposed model focuses on the selection of an appropriate activation function depending on its accuracy and training loss over a database. Secondly, an incremental strategy is used in which deeper models are developed simultaneously from shallower networks to increase the accuracy with less training loss. Lastly, by an ensemble of CNN and DCNNs, the model achieves an accuracy of 74.15% for FER2013, 96.20% for CK+ and 98.25% for JAFFE databases, outperforming previous work.
Keywords: Activation functions; Deep Neural Network (DNN); Convolutional Neural Network (CNN).
HADEM-MACS: A Hybrid approach for detection and extraction of objects in movement by multimedia autonomous computer systems
by Elie FUTE T.
Abstract: Nowadays, multimedia information has become an inescapable medium for the validation of applications such as identification, localization and object tracking. These applications give rise to many processing methods that collect multimedia data (images, videos), continue with preprocessing in order to reduce noise, and finish with processing in order to extract objects, more precisely the shape of the objects within the domain of interest. The first stage consists of detecting moving objects in the scene. This detection relies on background modeling, for which a model based on a mixture of Gaussians is commonly used. However, this approach consumes considerable resources, mainly processing power and memory. We present in this paper a hybrid approach to the detection and extraction of moving objects by a multimedia wireless sensor network. It is based on an improved frame difference, an adapted mixture of Gaussians and a simplified shadow removal.
Keywords: Background subtraction; movement detection; mixture of Gaussian; neural network; multimedia.
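To make the frame-difference component named in the abstract above more concrete, here is a minimal sketch of plain two-frame differencing: a pixel is flagged as moving when its grey level changes by more than a threshold between consecutive frames. This is the basic textbook technique, not the improved variant the paper proposes; the function name, the threshold and the toy frames are assumptions.

```python
def frame_difference_mask(prev_frame, curr_frame, threshold):
    """Return a binary motion mask from two equally sized greyscale frames.

    Frames are 2D lists of intensities; a pixel is marked 1 when its
    absolute change between frames exceeds the threshold.
    """
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


# Hypothetical 3x3 frames, for illustration only.
prev_frame = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
curr_frame = [[10, 12, 10], [10, 90, 95], [10, 10, 10]]
motion = frame_difference_mask(prev_frame, curr_frame, threshold=20)
print(motion)
```

Plain differencing is cheap in both memory and computation, which is the usual reason for pairing it with a heavier background model such as a mixture of Gaussians.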
Perceptual Image Quality Assessment Based on Gradient Similarity and Ruderman Operator
by Zianou AHMED SEGHIR
Abstract: In this work, a new metric for image quality assessment is suggested, which provides more flexibility than previous measures by using the Ruderman operator, the visual region of interest and gradient similarity. Firstly, the luminance distortion between the reference and test images is determined. Secondly, the gradient similarity is computed by using the Canny filter and a proposed gradient mask. Thirdly, the test and reference images are transformed using the Ruderman operator. Fourthly, the visual region of interest is calculated by employing an entropy operator. Lastly, the dissimilarity between the reference and test images is obtained by combining all previous metrics: the luminance distortion measure, the gradient similarity measures, the Ruderman measure and the visual region of interest measure. Experimental comparison demonstrates the effectiveness of the proposed method.
Keywords: Ruderman operator; image quality assessment (IQA); gradient similarity.
Human Skin Ringworm Detection Using Wavelet and Curvelet Transforms: A Comparative Study
by Manas Saha, Mrinal Kanti Naskar, B. N. Chatterji
Abstract: The common human skin disease called ringworm is investigated in the light of computer vision. Two distinct methodologies are developed for its detection. The first methodology implements a three-level multi-wavelet decomposition of the skin images and subsequent evaluation of the approximation and detail subband energies, which act as the texture characterizing features. The second methodology uses the curvelet transform to segment the circular protrusions in the skin images, especially those with ringworm, followed by statistical texture investigation using the gray level co-occurrence matrix (GLCM). After feature extraction by both methodologies, a binary classifier, the support vector machine (SVM), recognizes the images as ringworm with a detection accuracy of around 87% and 80% for the first and second methodologies, respectively. In addition, performance parameters of the SVM classification that were not previously addressed, namely sensitivity, specificity, Positive Predictive Value (PPV) and Negative Predictive Value (NPV), are evaluated. Both methodologies are comprehensively demonstrated and compared to select the better one. The selected method is then compared with the available technique and commented upon.
Keywords: Multiresolution; Wavelet; Curvelet; Approximation subband; Detail subband; Energy signature.
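The four classification measures listed in the abstract above follow directly from the counts of a binary confusion matrix. The sketch below uses their standard definitions; the function name and the counts are hypothetical and do not reproduce the paper's results, and all denominators are assumed to be non-zero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard measures from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of diseased images detected
        "specificity": tn / (tn + fp),  # share of healthy images cleared
        "ppv": tp / (tp + fp),          # positive calls that are correct
        "npv": tn / (tn + fn),          # negative calls that are correct
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for 100 test images, for illustration only.
scores = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```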
A Wikipedia-based Semantic Tensor Space Model for Text Analytics
by Han-joon Kim, Jae-Young Chang
Abstract: This paper proposes a 3rd-order tensor space model that represents textual documents, which contains the concept space independently of the document and term spaces. In the vector space model (VSM), a document is represented as a vector in which each dimension corresponds to a term. In contrast, the model described here represents a document as a matrix. Most current text mining algorithms only take vectors as their input, but they suffer from term independence and loss of term senses issues. To overcome these problems, we incorporate the concept as a distinct space in the VSM. For this, it is necessary to produce the concept vector for each term that occurs in a given document, which is related to word sense disambiguation. As an external knowledge source for concept weighting, we employ the Wikipedia encyclopedia, which has been evaluated as world knowledge and used to improve many text-mining algorithms. Through experiments using two popular document corpora, we demonstrate the superiority of the model in terms of text clustering and text classification.
Keywords: tensor space model; vector space model; text mining; concepts; Wikipedia.
COLOUR THRESHOLDING BASED AUTOMATIC Ki67 COUNTING PROCEDURE FOR IMMUNOHISTOCHEMICAL STAINING IN MENINGIOMA
by FAHMI AKMAL DZULKIFLI, MOHD YUSOFF MASHOR, HASNAN JAAFAR
Abstract: Image processing is widely used by medical experts since it can help them by providing extra visualization for early detection and treatment. Nuclei or cell counting represents a critical part of histopathological analysis. Nuclei segmentation is the initial step in cell counting and is very challenging, especially in distinguishing between normal and abnormal cell nuclei. This is due to the variation in cell shape and size. Ki67 is a nuclear protein that is widely used among pathologists to measure the proliferation of tumour cells. Generally, pathologists use a manual counting technique for counting the Ki67 cells. However, the counting results have poor reliability and lack accuracy. The current study aimed to propose an automatic Ki67 cell counting procedure for meningioma images using the colour thresholding approach. The proposed method has been tested on 12 photomicrographs of meningiomas. The performance of the proposed method was compared to manually segmented images, which had been validated beforehand by a medical expert. The results showed that the proposed method was able to segment the immunostained positive and immunostained negative Ki67 cells with an average accuracy of more than 90%. For counting results, the proposed system produced good results in counting the Ki67 cells, with an average relative accuracy of 0.91 for positive Ki67 cells and 0.89 for negative cells. Furthermore, the average execution time of the proposed algorithm was 24 seconds per image.
Keywords: Automated Counting; Colour Thresholding; Image Segmentation; Immunohistochemical Staining; Ki67 Cell; Meningioma.
Blob Analysis of an Automatic Vision Guided System for Fruit Picking and Placing Robot
by Tresna Dewi, Zarqa Mulya, Pola Risma, Yurni Oktarina
Abstract: Agriculture has a strategic role in improving the economic development of a country. As the population grows, so does the demand to feed the nation. The agriculture strategy needs to be improved by adopting automation for better handling of the harvest. The automation can be accommodated by robotics applications, starting with a pick and place robot to move the product. This paper presents the blob analysis method as the visual cue for a pick and place robot handling fruit. Blob analysis is used to detect fruit based on color and shape, which is done by filtering objects and extracting blobs using morphological operators. The main controller of the robot is an Arduino Mega that moves the robot based on input from images processed using Python and OpenCV on a Raspberry Pi. The images are captured by a Pi Camera functioning as an "eye" for the robot. The experiment was conducted to prove the effectiveness of the proposed method, where the average time of picking and placing fruit is 6.69 s for fruit in Position 1, with a range of 332-334 in x and 255-266 in y coordinates, and 7.63 s for Position 2, with a range of 475-576 in x and 205-206 in y coordinates. The image plane considered in this study is a 600 x 480 pixel frame. The experiment shows that the proposed method is effective as an automatic vision guided system for a fruit picking and placing robot.
Keywords: Agriculture robot; Blob analysis; Pick and place robot; Visual cue.
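The core of blob analysis as described in the abstract above is grouping foreground pixels into connected regions and reading off each region's area and centroid, which a robot can then use as a pick target. The paper performs this with OpenCV; the sketch below is a dependency-free stand-in using a breadth-first flood fill over a binary mask. The function name, the 4-connectivity choice, the `min_area` filter and the toy mask are all assumptions.

```python
from collections import deque


def find_blobs(mask, min_area=1):
    """Label 4-connected foreground regions of a binary mask (2D list of 0/1).

    Returns one dict per blob with its pixel area and (x, y) centroid.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill one blob starting from this unvisited pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area:  # discard specks below the area filter
                cy = sum(p[0] for p in pixels) / len(pixels)
                cx = sum(p[1] for p in pixels) / len(pixels)
                blobs.append({"area": len(pixels), "centroid": (cx, cy)})
    return blobs


# Hypothetical mask, e.g. the result of colour thresholding, for illustration only.
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
print(find_blobs(mask, min_area=2))
```

Filtering by minimum area is one simple way to suppress isolated noise pixels before a centroid is passed on as a target coordinate.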
Automatic Defect Inspection System for Beer Bottles based on Deep Residual Learning
by Qiaokang Liang, Shao Xiang, Jianyong Long, Dan Zhang, Gianmarc Coppola, Wei Sun, Yaonan Wang
Abstract: Recyclable beer bottles have become increasingly popular in recent years due to their cost effectiveness. Prior to refill, they need to be scrubbed and sanitized, which requires quality inspection. Automatic detection of defects in recyclable beer bottles would reduce both the cost of the production process and the time spent on quality inspection. A novel approach is proposed for automatic detection of defects occurring on beer bottles by deep residual learning. This method extracts the characteristic information of beer bottle defects through the deep learning network and realizes the classification of defect characters. In this work, the recognition of 3 kinds of common defects (defective body, defective mouth, and defective bottom) is realized, and the promising results demonstrate that the proposed method is capable of inspecting defects of beer bottles with outstanding accuracy. In particular, a state-of-the-art Convolutional Neural Network (CNN) was applied to the detection of beer bottle defects, which improved the accuracy of beer bottle detection compared with traditional methods. Experimental results show that the new approach satisfies the requirements of defect detection and is able to improve production efficiency.
Keywords: Detection of defects; Deep learning; Convolutional Neural Network; Quality inspection.
An Optimal Automatic Brain Tumor Detection Using Fuzzy Co-Clustering Algorithm
by Heena Hooda, Om Prakash Verma
Abstract: This paper presents an automatic technique to detect brain tumors from real-time Magnetic Resonance Imaging (MRI) images. Traditionally, the tumor is mapped manually by the radiologist in the lab, a process that is very time consuming and prone to human error. The proposed strategy makes use of the fuzzy co-clustering for images (FCCI) algorithm for initial segmentation of MRI images. The parameters of the FCCI algorithm are optimized using the Particle Swarm Optimization technique. The intra-cranial mask is extracted from the MRI scan images by using the intensity difference as a measure to differentiate between the extra-cranial and intra-cranial regions. The segmented image along with the intra-cranial mask is used to detect and extract the brain tumor and to calculate the area of the tumor from MRI scan images of the brain. The performance of the algorithm is evaluated on the basis of match score, accuracy score, Dice score and Jaccard's similarity coefficient. The result of the proposed approach corresponds to the ground truth in tracking the total area of the tumor and is seen to outperform most of the techniques. The real-time database is taken from Rajiv Gandhi Cancer Institute & Research Centre, Delhi, India, and the results are validated by the radiologist.
Keywords: Brain MRI images; Tumor Detection; Clustering; Particle Swarm Optimization; Morphological operation; Fuzzy Co-Clustering Algorithm.
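Two of the evaluation measures named in the abstract above, the Dice score and Jaccard's similarity coefficient, compare a predicted tumor mask with a ground-truth mask by their overlap. The sketch below uses the standard definitions, Dice = 2|A∩B| / (|A| + |B|) and Jaccard = |A∩B| / |A∪B|; the function name, the toy masks and the convention of returning 1.0 for two empty masks are assumptions.

```python
def dice_and_jaccard(pred, truth):
    """Overlap scores for two equally sized binary masks (2D lists of 0/1)."""
    intersection = sum(
        p & t
        for pred_row, truth_row in zip(pred, truth)
        for p, t in zip(pred_row, truth_row)
    )
    size_pred = sum(map(sum, pred))
    size_truth = sum(map(sum, truth))
    union = size_pred + size_truth - intersection
    if union == 0:
        return 1.0, 1.0  # assumed convention: two empty masks agree fully
    dice = 2 * intersection / (size_pred + size_truth)
    jaccard = intersection / union
    return dice, jaccard


# Hypothetical masks, for illustration only.
predicted_mask = [[1, 1, 0], [0, 1, 0]]
ground_truth = [[1, 0, 0], [0, 1, 1]]
print(dice_and_jaccard(predicted_mask, ground_truth))
```

The two scores are monotonically related (Dice = 2J / (1 + J)), so they rank segmentations identically even though their values differ.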
Maximum Entropy Based Semi Supervised Learning for Automatic Detection and Recognition of Objects Using Deep Convnets.
by Vipul Sharma, Roohie Naaz Mir
Abstract: Object detection and localization is one of the major research areas in computer vision and is growing very rapidly. Currently, there is a plethora of pre-trained models for object detection, including YOLO, Mask RCNN, RCNN, Fast RCNN, Multi-box etc. However, it takes only a small amount of effort to detect most of the objects in an image or a video. In this paper, we propose a new framework for object detection called "Maximum Entropy Based Semi Supervised Learning for Automatic Detection and Recognition of Objects". The main objective of this paper is to recognize objects from a number of visual object classes in a realistic scene, and our biggest motivation is to detect the objects correctly and simultaneously. The major operations of our proposed approach are preprocessing, localization, segmentation and object detection. In preprocessing, three processes, namely noise reduction, intensity normalization and morphology, are considered. Then localization and object segmentation are performed using maximum entropy, in which the optimal threshold is detected, and in the end object detection is performed using a Deep ConvNet. The performance of the proposed framework is evaluated using MATLAB R2018b and compared with some previous state-of-the-art techniques in terms of localization error, detection and segmentation accuracy, and computation time.
Keywords: Maximum Entropy; Object Detection; Weakly Supervised Learning; Deep Convolutional Neural Networks; Segmentation and Localization.
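The abstract above does not spell out how the maximum-entropy threshold is found. One well-known formulation is Kapur's method, which picks the grey level that maximizes the sum of the entropies of the background and foreground distributions; the sketch below implements that formulation as an assumption, not as the authors' procedure. The function name and the toy histogram are hypothetical.

```python
import math


def max_entropy_threshold(hist):
    """Kapur-style threshold: pixels with level <= t form the background.

    hist is a list of non-negative pixel counts per grey level.
    """
    total = sum(hist)
    best_t, best_entropy = 0, float("-inf")
    below = 0
    for t in range(len(hist) - 1):
        below += hist[t]
        above = total - below
        if below == 0 or above == 0:
            continue  # one class is empty, so the split is not meaningful
        # Shannon entropy of each class, using within-class probabilities.
        h_back = -sum((h / below) * math.log(h / below)
                      for h in hist[: t + 1] if h > 0)
        h_fore = -sum((h / above) * math.log(h / above)
                      for h in hist[t + 1:] if h > 0)
        if h_back + h_fore > best_entropy:
            best_entropy, best_t = h_back + h_fore, t
    return best_t


# Hypothetical bimodal histogram with an empty gap at levels 2 and 3.
histogram = [5, 5, 0, 0, 5, 5]
print(max_entropy_threshold(histogram))
```

For a cleanly bimodal histogram like this one, the selected threshold separates the two modes; on real images the histogram would typically have 256 bins.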