Weighting schemes based on EM algorithm for LDA Online publication date: Fri, 17-Aug-2018
by Yaya Ju; Jianfeng Yan; Zhiqiang Liu; Lu Yang
International Journal of High Performance Systems Architecture (IJHPSA), Vol. 8, No. 1/2, 2018
Abstract: Latent Dirichlet allocation (LDA) is a popular probabilistic topic modelling method, which automatically finds latent topics from a corpus. LDA users often encounter two major problems: first, LDA treats each word equally, and common words tend to scatter across almost all topics without reason, thereby leading to bad topic interpretability, consistency, and overlap. Second, an appropriate way to distinguish low-dimensional topic features for better classification performance is lacking. To overcome these two shortcomings, we propose two novel weighting schemes: a word-weighted scheme, which is realised by introducing a weight factor during the iterative process, and a topic-weighted scheme, which is realised by combining the Jenson-Shannon (JS) distance and the entropy of the generated low-dimensional topic features as a weight coefficient, using expectation-maximisation (EM). Experimental results show that the word-weighted scheme can find better topics for improving the clustering performance effectively, and the topic-weighted scheme has a larger effect on text classification than traditional methods.
Online publication date: Fri, 17-Aug-2018
If you are not a subscriber and you just want to read the full contents of this article, buy online access here.Complimentary Subscribers, Editors or Members of the Editorial Board of the International Journal of High Performance Systems Architecture (IJHPSA):
Login with your Inderscience username and password:
Want to subscribe?
A subscription gives you complete access to all articles in the current issue, as well as to all articles in the previous three years (where applicable). See our Orders page to subscribe.
If you still need assistance, please email firstname.lastname@example.org