One of the biggest problems on social media is spammers who post adult content. Detecting and removing such content quickly is essential to keeping these platforms clean.
Researchers from Jamia Millia Islamia university have described how filtering spam content can improve the user experience of young people. Various machine learning models can be used to detect such content and classify it as spam. Of the models they tried, XGBoost had the highest accuracy, at 91%, and they adapted the algorithm for effective classification. False positives amounted to less than 10% of the positives. The features they analyzed were word entropy, linguistic diversity, and word embeddings.
Identification follows from the fact that regular users generally talk about a wide range of subjects in various settings, and publish and share content in a way that may be described as natural. In contrast, spammers, pornographic spammers in this case, typically adopt a fixed or even fully automated approach to their updates, with a relatively constrained vocabulary and a limited range of topics. The system can identify spam messages based on these and other traits.
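
To make the intuition concrete, here is a minimal sketch of how two of the features mentioned above, word entropy and linguistic diversity, might be computed for a single message. It is not the researchers' implementation: the whitespace tokenizer, the use of the type-token ratio as a stand-in for linguistic diversity, and the function names `word_entropy`, `type_token_ratio`, and `text_features` are all assumptions made for illustration. Word embeddings and the classifier itself are left out.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def word_entropy(tokens):
    # Shannon entropy (in bits) of the word frequency distribution.
    # Repetitive text concentrates probability on few words, so entropy is low.
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def type_token_ratio(tokens):
    # Assumption: type-token ratio (distinct words / total words) used as a
    # simple proxy for linguistic diversity.
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def text_features(text):
    # Bundle both hypothetical features into one dictionary per message.
    tokens = tokenize(text)
    return {
        "entropy": word_entropy(tokens),
        "diversity": type_token_ratio(tokens),
    }


if __name__ == "__main__":
    repetitive = "click here click here click here click here"
    varied = "the quick brown fox jumps over a lazy dog"
    print(text_features(repetitive))
    print(text_features(varied))
```

In this sketch, the repetitive message scores lower on both measures than the varied one, which is the kind of signal a classifier such as XGBoost could combine with other features. Real messages would need more careful tokenization and length normalization.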