Authors: Chun-Hsiung Tseng; Yung-Hui Chen; Yan-Ru Jiang
Addresses: Department of Communications Engineering, Yuan Ze University, Taoyuan City, Taiwan; Department of Computer Information and Network Engineering, Lunghwa University of Science and Technology, Guishan District, Taoyuan City, 33306, Taiwan; Department of Information Management, Nanhua University, Taiwan
Abstract: Containing a huge amount of data, the web is undoubtedly a very good source of information. However, performing analysis on data fetched from the web is not an easy task. First, the web is designed to be document-centric rather than data-centric: the former refers to websites designed only for presenting documents, while the latter refers to websites designed for rendering datasets. As a result, reading data shown on web pages is comfortable, but collecting that data is difficult. Imagine repeating the copy-paste procedure for thousands of web pages. Second, the diversity of presentation styles across web pages makes data normalisation essential but difficult. Last but not least, data analysis itself demands strong statistical skills and may even require domain expertise. In this research, the researchers address these issues by designing a data analysis tool for the web.
Keywords: information extraction; web data mining; Big Data.
International Journal of Social and Humanistic Computing, 2017 Vol.2 No.3/4, pp.150 - 165
Available online: 19 Jun 2017