LDA based approach for predicting friendship links in live journal social network
Abstract
The idea of socializing with people of different backgrounds and cultures appeals to many web users. Today, there are hundreds of social networking sites on the web, with millions of users connected by relationships such as "friend", "follow", and "fan", forming a huge graph structure. The amount of data associated with the users of these sites has created opportunities for interesting data mining problems, including friendship link prediction, interest prediction, and tag recommendation, among others. In this work, we consider the friendship link prediction problem and study a topic modeling approach to it. Topic models are among the most effective approaches to latent topic analysis and mining of text data. In particular, probabilistic topic models are based on the idea that documents can be seen as mixtures of topics and topics can be seen as mixtures of words. Latent Dirichlet Allocation (LDA) is one such probabilistic model; it is generative in nature and is used for collections of discrete data such as text corpora. For our link prediction problem, users in the dataset are treated as "documents" and their interests as the document contents. The topic probabilities obtained by modeling users and interests with LDA provide an explicit representation for each user. User pairs are treated as examples, and each pair is represented by a feature vector constructed from the topic probabilities obtained with LDA. This vector captures only the information contained in the interests expressed by the users. Another important source of information relevant to the link prediction task is the graph structure of the social network. Our assumption is that a user "A" might be a friend of user "B" if (a) users "A" and "B" have common or similar interests, (b) users "A" and "B" have some common friends. While the topic modeling technique captures the similarity between interests, we use the graph structure to find common friends.
In the past, the graph structure underlying the network has proven to be a reliable source of information for predicting friendship links. We compare predictions from feature sets constructed from the topic probabilities alone and from the link graph alone with predictions from a feature set constructed from both the topic probabilities and the link graph.
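The pipeline described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the work itself: the toy users, interests, and friend lists are invented, the inference method (collapsed Gibbs sampling), the hyperparameters, and the number of topics are arbitrary choices, and the pair representation (concatenating both users' topic probabilities and appending a common-friend count) is only one plausible way to build the three feature sets being compared. The helper names `fit_lda` and `pair_features` are hypothetical.

```python
import random

def fit_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (an assumed inference method).

    docs is a list of documents, each a list of integer word ids.
    Returns one topic-probability vector per document.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]            # counts per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # counts per (topic, word)
    topic_total = [0] * num_topics                          # total words per topic
    assign = []                                             # topic of each word occurrence

    # Random initial topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_d)

    # Resample each assignment conditioned on all the others.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed topic probabilities for each document (here, each user).
    return [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]

def pair_features(u, v, theta, friends):
    """Build topic-only, graph-only, and combined features for the pair (u, v)."""
    topic_part = theta[u] + theta[v]              # assumed: simple concatenation
    graph_part = [len(friends[u] & friends[v])]   # number of common friends
    return topic_part, graph_part, topic_part + graph_part

# Invented toy data: each user is a "document" whose words are interests.
interests = {
    "A": ["hiking", "camping", "photography"],
    "B": ["camping", "hiking", "kayaking"],
    "C": ["chess", "poker", "sudoku"],
}
friends = {"A": {"C", "D"}, "B": {"D", "E"}, "C": {"A"}}

vocab = sorted({w for ws in interests.values() for w in ws})
word_id = {w: i for i, w in enumerate(vocab)}
users = sorted(interests)
docs = [[word_id[w] for w in interests[u]] for u in users]

theta = dict(zip(users, fit_lda(docs, num_topics=2, vocab_size=len(vocab))))
topic_only, graph_only, combined = pair_features("A", "B", theta, friends)
```

Each of the three feature lists could then be passed to any standard classifier, trained on user pairs labeled as friends or non-friends, to compare how the interest-based and graph-based signals perform separately and together.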