Title: Employment modelling through classification and regression trees

Authors: Anton Gerunov

Addresses: Faculty of Economics and Business Administration, Sofia University, 'St. Kliment Ohridski', 125 Tsarigradsko Shosse Blvd., 1115 Sofia, Bulgaria

Abstract: This paper uses a large social science dataset, the combined World Values Survey data for 1981-2014, to investigate what determines an individual's employment status. We propose modelling this by first reducing the dimensionality of the data at a small loss of information and then fitting a number of alternative machine-learning algorithms. A decision tree and a random forest model are studied in more detail, and variable importance is examined for insight into the determinants of employment status. Employment is explained by traditional demographic and work attitude variables but unemployment is not, which suggests that the latter is likely driven by other factors, including structural labour market characteristics and even randomness. The main contribution of this paper is to outline a new approach to big data-driven research in labour economics and to apply it to a dataset that has not previously been investigated in its entirety, yielding a more sophisticated understanding of the underlying process.

Keywords: labour market; unemployment; big data; employment modelling; classification; regression trees; CART; random forest; social sciences; employment status; demographics; work attitudes; labour economics.

DOI: 10.1504/IJDS.2016.081368

International Journal of Data Science, 2016 Vol.1 No.4, pp.316-329

Published online: 06 Jan 2017
