Forthcoming and Online First Articles

International Journal of Data Analysis Techniques and Strategies (IJDATS)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Articles marked with this shopping trolley icon are available for purchase - click on the icon to send an email request to purchase.

Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all, without any restriction except those stated in their respective CC licenses.

Register for our alerting service, which notifies you by email when new issues are published online.

International Journal of Data Analysis Techniques and Strategies (9 papers in press)

Regular Issues

  • The Impact of Online Trading from a Personal and Technical Perspective on Trade Stocks in Emerging Markets
    by Abdulrhman M. Alshareef, Mohammed Khojah 
    Abstract: Stock markets are an attractive investment environment for new investors from different financial backgrounds. In emerging markets, the risk ratio is considered high; however, the profit margin is attractive. The task of obtaining future information and forecasting is considered an essential advantage for financial institutions. The premise is that the emerging markets did not have the chance to derive a trading policy from their own environment. Therefore, we investigate which objectives investors regard as a suitable policy to comply with. This study investigates the means objectives and the fundamental objectives of short-term investment. It focuses on the personal and technical perspectives of investors in emerging markets. The results reveal the relationship between means objectives and fundamental objectives. This contribution aids academics and decision-makers in finding the most relevant aspects that influence the investor’s decision to trade stocks in emerging markets regarding personal and technical perspectives.
    Keywords: stock trading; stock market; emerging markets; value theory; fundamental objectives; means objectives.
    DOI: 10.1504/IJDATS.2022.10051414
  • Metalearning using Structure-rich Pipeline Representations for Improved AutoML
    by Brandon Schoenfeld, Kevin Seppi, Christophe Giraud-Carrier 
    Abstract: Automatic machine learning (AutoML) systems have been shown to perform better when they learn from past experience. Examples include Auto-sklearn, which warm-starts the ML pipeline search using existing programs known to perform well on "similar" tasks, and AlphaD3M, which uses online reinforcement learning to search the ML pipeline space. These metalearning approaches, as well as many others, depend on simplifying assumptions about the pipeline search space and/or the pipeline representation. Here, we attempt to extend the applicability of AutoML by relaxing such simplifications. Using a sizable metadataset of 194 classification tasks and 4,592 pipelines, we show that using pipeline metadata, including the underlying DAG structure, leads to better estimates of pipeline performance and to more robust rankings of pipelines.
    Keywords: AutoML; Metalearning; Democratization of Data Analysis.

  • Insult Detection using a Partitional CNN-LSTM Model
    by Mohamed Maher Ben Ismail 
    Abstract: Recently, deep learning has been coupled with noticeable advances in Natural Language Processing related research. In this work, we propose a general framework to detect verbal offense in social network comments. We introduce a partitional CNN-LSTM architecture in order to automatically recognize verbal offense patterns in social network comments. Specifically, we use a partitional CNN along with an LSTM model to map the social network comments into two predefined classes. In particular, rather than considering a whole document/comment as input, as is done with a typical CNN, we partition the comments into parts in order to capture and weight the locally relevant information in each partition. The resulting local information is then sequentially exploited across partitions using LSTM for verbal offense detection. The combination of the partitional CNN and LSTM integrates the local information within comments and the long-distance correlation across comments. The proposed approach was assessed using a real dataset, and the obtained results showed that our solution outperforms existing relevant solutions.
    Keywords: Supervised learning; Deep learning; Social networks; Insult detection.

    by Sujatha R, Uma Maheswari B, Mansurali A 
    Abstract: Micro, small and medium enterprises (MSMEs) play a crucial role in the economic development of any country. Therefore, the survival of these MSMEs is imperative. The objective of this study is to build machine learning models to predict the survival of MSMEs and identify the factors that influence survival. The data for the study was extracted from the Fourth All India Census of MSMEs conducted by the Ministry of MSMEs, Government of India. Three machine learning algorithms, namely logistic regression, decision tree and random forest, are used to build models. The random forest algorithm provided the highest accuracy. The study also identified outstanding loan, market value, purchase value, owner's social category, nature of activity, bank account, cluster type, power source, quality, and organization type as the variables that significantly influence firm survival. MSMEs can monitor those factors and frame appropriate policies that would help them to survive and sustain.
    Keywords: MSMEs; Machine Learning; Model Building; Model Performance Measures; Logistic Regression; Decision Tree; Random Forest; Survival; India.

  • Opinion mining of online product reviews using a lexicon-based algorithm
    by Ignacio Martín-Borregón Musso, Marina Bagic Babac 
    Abstract: Worldwide social media is a rich resource of user-generated data, which can help organizations to formulate their business strategies, and affect the process of decision-making in product or service design and implementation. The focus of this paper is on the extraction and analysis of unstructured product reviews for training predictive models, which recognize a specific range of human affective states such as emotions, moods, opinions, or attitudes. Based on the textual and reactions analysis, the emotional reactions lexicon of English words is built from the product posts and comments, and a lexicon-based algorithm is used to predict user opinions on social media.
    Keywords: opinion mining; sentiment analysis; product reviews; social media.

  • Webber T-norm and its influence on QuickRules and VQRules fuzzy-rough rule induction algorithms
    by Andreja Naumoski, Georgina Mirceva, Kosta Mitreski 
    Abstract: The fuzzy-rough rule induction algorithms use fuzzy-rough set concepts such as t-norms, implicators and fuzzy tolerance relationship metrics to calculate the upper and lower approximations. In this direction, the paper examines the influence of the novel Webber t-norm on the model performance obtained with the QuickRules and VQRules algorithms over 19 datasets from different research disciplines. The AUC-ROC metric is used to assess model performance as well as the statistical significance compared to the control model with the highest rank. The obtained results revealed that the k-parameter of the Webber t-norm decreases the model's descriptive performance as its value increases, but this parameter had no influence on the model's predictive performance. In both cases, we were able to identify specific algorithm settings, mostly specific metrics for fuzzy tolerance relations, that produce models with high accuracy.
    Keywords: Webber t-norm; Vaguely quantifiers; Fuzzy tolerance relationship metrics; Fuzzy rough sets; Rule induction algorithms; Statistical Significance.

  • An Intelligent Timestamp Data Manipulation Methodology for Customer Level Resource Efficient Short-term Electric Load Forecasting
    by Ali Waqas, Muhammad Saleem, Abdul Khaliq, Amanullah Yasin 
    Abstract: Due to the ever-increasing demand for electric power, supply companies need to know the expected load consumption in order to perform better scheduling and planning for reliable generation and storage of electric energy. Daily electric load consumption forms time-series data, thus making prediction of electric load usage a time-series forecasting problem. Traditionally used statistical, knowledge-based and hybrid techniques for forecasting do not ensure a high level of accuracy. On the other hand, highly accurate techniques like Deep Learning are computationally expensive and additionally require non-temporal data to improve the forecast accuracy. In this research, we propose a significantly accurate but computationally efficient methodology using simple models from Machine Learning. We employ Stack Ensembling using two different data manipulations and compare their results with selected baseline predictors as well as existing literature. We only use timestamp information for feature extraction to keep this study independent of non-time features. We achieve a maximum improvement of 34.98% in terms of mean absolute percentage error (MAPE) over the chosen base predictor, ARIMA (Auto Regressive Integrated Moving Average). Besides these improved results, limitations of our work include a low degree of accuracy in estimating outliers in the electric load consumption, which we plan to improve in future work.
    Keywords: ARIMA; calendar features; ensemble learning; intelligent data manipulation; load prediction; machine learning; planning.

  • Depression Detection Using Semantic Representation based Semi-Supervised Deep Learning
    by Gaurav Kumar Gupta, Dilip Kumar Sharma 
    Abstract: Depression detection has become an arduous task in social media due to its complicated association with mental disorders. This work focuses on extracting the depressive features in the social network from the unstructured and structured data through the Semantic representation and Semi-supervised Deep learning model for Depression detection (SSDD). The proposed approach primarily performs hybrid feature analysis, unsupervised learning-based representation of depression-influencing features, and supervised learning-based detection of depressed users. Initially, the SSDD investigates the different demographic and content-based features from syntactic and semantic relations. Secondly, adopting the deep autoencoder as the unsupervised learning model leverages the extraction of the depression-indicative features representing the texts with the word embedding. Finally, it determines the depressive texts using the Bi-directional Long Short-Term Memory (Bi-LSTM) model and facilitates the detection of depressed social users by analyzing the profile features, detected depressive tweets, and hybrid knowledge. In the experiments, the proposed approach outperforms the existing depression detection model.
    Keywords: Twitter; Semi-supervised; Hybrid knowledge; Semantic; Negation; Depression-indicative; Deep Autoencoder; Bi-LSTM.

    by Shitharth S, Hariprasath Manoharan, Lakshmi Narayanan, Takkedu Malathi S., S. Vatchala, Kommu Gangadhara Rao 
    Abstract: In the urban environment, the optimisation of network enactment is shifting from the operation stage to the maintenance and monitoring stage. During this shift, a time series representation is needed to prevent the overexploitation problem that arises with a large number of natural resources. A set of historical data must be used to check the behaviour of current state operations at varying time periods using an intelligent optimiser. Thus, this study explores the implementation of time series analysis using artificial intelligence (AI), where accurate predictions are made for the entire urban environment, even with big edifices. The major difference between the proposed method and existing methods is that two different boundary regions are chosen with distinct point values, and the monitoring device is installed in only two directions. Since AI is involved in the entire process, the characteristics of forecasting the current state procedure are represented using modified evolutionary optimisation (MEO), which observes the entire biological nature of neighbouring environs. Additionally, a comparison analysis is made using MATLAB with five case studies, in which the proposed method proves to be considerably more effective, by about 70 percent, than existing models.
    Keywords: time series; urban environment; artificial intelligence; AI; forecast.
    DOI: 10.1504/IJDATS.2022.10053183