Title: A paragraph-inserted word salad filtering algorithm

Authors: Ok-Ran Jeong; Won Kim

Addresses: Department of Software Design and Management, Gachon University, 1342 SeongnamDaero, Sujeong-Gu, Seongnam-Si, Gyeonggi-Do, 461-701, Republic of Korea; Department of Software Design and Management, Gachon University, 1342 SeongnamDaero, Sujeong-Gu, Seongnam-Si, Gyeonggi-Do, 461-701, Republic of Korea

Abstract: Social spam is a type of spam that targets the members of social websites by sending or posting unwanted ads, or by baiting them into visiting particular websites. Word salad, in turn, is a type of social spam that baits people into visiting particular websites, such as blogs, personal profiles, and third-party applications built on social networking websites. A word salad is created by inserting words or paragraphs into a normal document, where the inserted words or paragraphs have no relevance to the document. Its purpose is to fool search engines into ranking the document highly. In this paper, we discuss an algorithm that filters (detects) paragraph-inserted word salads. The algorithm is based on Singular Value Decomposition (SVD) and achieves up to 81.3% accuracy in experiments.
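
The abstract does not say how SVD is applied, so the following is only a minimal illustrative sketch of one plausible SVD-based approach (in the style of latent semantic analysis), not the authors' algorithm. It assumes NumPy is available; the function names, the rank `k`, the `threshold` value, and the sample text are all hypothetical choices made for illustration. Each paragraph is projected into a low-rank space, and a paragraph whose average cosine similarity to the other paragraphs is low is flagged as a possible insertion.

```python
import re
import numpy as np

def term_paragraph_matrix(paragraphs):
    """Rows are terms, columns are paragraphs; entries are raw term counts."""
    tokenized = [re.findall(r"[a-z]+", p.lower()) for p in paragraphs]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(paragraphs)))
    for j, toks in enumerate(tokenized):
        for w in toks:
            A[index[w], j] += 1.0
    return A

def paragraph_scores(paragraphs, k=2):
    """Mean cosine similarity of each paragraph to the others in rank-k space."""
    A = term_paragraph_matrix(paragraphs)
    # Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    # Paragraph coordinates in the k-dimensional latent space (one row each).
    coords = (np.diag(s[:k]) @ Vt[:k]).T
    norms = np.linalg.norm(coords, axis=1, keepdims=True)
    unit = coords / np.where(norms == 0, 1.0, norms)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)  # ignore self-similarity
    return sim.sum(axis=1) / max(len(paragraphs) - 1, 1)

def flag_inserted(paragraphs, k=2, threshold=0.5):
    """Indices of paragraphs scoring below the (hypothetical) threshold."""
    scores = paragraph_scores(paragraphs, k)
    return [i for i, sc in enumerate(scores) if sc < threshold]

# Hypothetical sample: three on-topic paragraphs and one unrelated insertion.
sample = [
    "the svd method filters spam documents",
    "spam documents are filtered with the svd method",
    "the svd method detects spam documents",
    "cheap watches cheap pills buy watches buy pills now",
]
print(flag_inserted(sample))
```

A real filter would likely need term weighting, stop-word handling, and a threshold tuned on labeled data; none of those details are given in the abstract.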

Keywords: social spam; spam filtering; word salad filtering; social networking; singular value decomposition; SVD.

DOI: 10.1504/IJWGS.2012.046730

International Journal of Web and Grid Services, 2012 Vol.8 No.1, pp.56-71

Published online: 31 Dec 2014
