Title: Dimensionality reduction framework for blog mining and visualisation
Authors: Flora S. Tsai
Addresses: School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore
Abstract: The growing abundance of blogs and other new forms of social media has created a critical need for technologies that transform the digital realm of social media into a manageable form. Blog mining addresses the domain-specific problem of mining information from blog data. Although blogs share many similarities with web and text documents, existing data mining techniques need to be re-evaluated and adapted for the multidimensional representation of blog data, which exhibits dimensions not present in traditional documents. In this paper, a new approach to blog mining and visualisation based on dimensionality reduction techniques is presented. The author-topic model based on latent Dirichlet allocation was extended for analysing and visualising blog authors, links, and time. A framework based on dimensionality reduction is proposed to visualise the blog dimensions of content, tags, authors, links, and time. This framework has been successfully designed, implemented, and evaluated on real-world blog data.
Keywords: blog mining; dimensionality reduction; visualisation; multidimensional scaling; MDS; isometric feature mapping; Isomap; locally linear embedding; LLE; latent Dirichlet allocation; LDA; blogs; blogging; weblogs; data mining; blog data.
DOI: 10.1504/IJDMMM.2012.048108
International Journal of Data Mining, Modelling and Management, 2012 Vol.4 No.3, pp.267-285
Published online: 23 Aug 2014