Sifting Through Social Noise
Recent years have radically changed the way people socialize. In parts of the developed world with good broadband and cellular penetration, the average person now spends more time on online social networks than in face-to-face meetings with acquaintances outside their immediate family.
“We were socializing before social networks,” Prof. Sujay Sanghavi states. “But now we can automatically capture fine details of social interaction, such as when someone views a photo and how they share and interact with the image.”
The problem now is information overload.
“If 1,000 friends on Facebook all post something, not all of it is relevant to you. Being shown everything is pretty useless and degrades your social experience,” Prof. Sanghavi says. This is where his research team steps in: it uses fine-grained records of user interactions to select the people and information most relevant to any given person. Using a series of algorithms, Prof. Sanghavi and his team have explored three major ways to make sense of large-scale social networks.
The first problem is finding communities: sets of users who are more densely connected to one another than to the rest of the network.
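To make that definition concrete, a candidate community can be checked by comparing two densities: the fraction of possible pairs inside the set that are actually linked, and the fraction of possible inside-to-outside pairs that are linked. A community in the sense described should score much higher on the first. This is a minimal sketch assuming a simple undirected graph (no self-loops) given as an edge list; the function name and the two-triangle example are hypothetical.

```python
from itertools import combinations


def community_densities(edges, community, n_nodes):
    """Return (internal, external) edge densities for a candidate community.

    internal: fraction of possible pairs inside the community that are linked.
    external: fraction of possible (inside, outside) pairs that are linked.
    Assumes a simple undirected graph without self-loops.
    """
    inside = set(community)
    edge_set = {frozenset(e) for e in edges}
    internal_pairs = list(combinations(sorted(inside), 2))
    internal_links = sum(frozenset(p) in edge_set for p in internal_pairs)
    # An edge crosses the boundary when exactly one endpoint is inside.
    cut_links = sum(len(e & inside) == 1 for e in edge_set)
    n_in = len(inside)
    internal = internal_links / len(internal_pairs) if internal_pairs else 0.0
    external_pairs = n_in * (n_nodes - n_in)
    external = cut_links / external_pairs if external_pairs else 0.0
    return internal, external


# Hypothetical graph: two triangles {0,1,2} and {3,4,5} joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
internal, external = community_densities(edges, {0, 1, 2}, n_nodes=6)
print(internal, external)
```

Here every possible pair inside {0, 1, 2} is linked, while only one of the nine possible inside-to-outside pairs is, which is exactly the "denser inside than outside" pattern.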
Sociology recognizes that connections within a community are often more predictive of user tastes and future connections than connections outside it. But finding these communities in large graphs is both computationally and statistically challenging.
The WNCG team developed a new graph-clustering technique with widespread applications, including community detection, user profiling, link prediction and collaborative filtering. By posing graph clustering as a convex optimization problem, Sanghavi and his team were able both to reach globally optimal solutions with better statistical properties and to provide an algorithm that scales and parallelizes easily.
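The article does not spell out the team's convex formulation. One widely used convex relaxation in this spirit, assumed here purely for illustration, treats the adjacency matrix A as a low-rank "cluster" matrix L (all ones within each community) plus a sparse matrix S of spurious or missing edges, and recovers them by minimizing ‖L‖_* + λ‖S‖_1 subject to L + S = A (principal component pursuit, solved below with a basic ADMM loop). Because the problem is convex, the loop approaches a global optimum rather than getting stuck in a local one. The function names, the λ = 1/√n default, the step-size heuristic and the planted example are all assumptions, not details of the team's work.

```python
import numpy as np


def singular_value_threshold(X, tau):
    """Proximal operator of the nuclear norm: shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def soft_threshold(X, tau):
    """Proximal operator of the l1 norm: shrink every entry toward zero by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def low_rank_plus_sparse(A, lam=None, tol=1e-7, max_iter=5000):
    """Solve min ||L||_* + lam * ||S||_1 subject to L + S = A with ADMM."""
    n1, n2 = A.shape
    lam = 1.0 / np.sqrt(max(n1, n2)) if lam is None else lam
    mu = n1 * n2 / (4.0 * np.abs(A).sum())  # common step-size heuristic
    S = np.zeros_like(A)
    Y = np.zeros_like(A)  # Lagrange multiplier for the constraint L + S = A
    norm_A = np.linalg.norm(A)
    for _ in range(max_iter):
        L = singular_value_threshold(A - S + Y / mu, 1.0 / mu)
        S = soft_threshold(A - L + Y / mu, lam / mu)
        residual = A - L - S
        Y += mu * residual
        if np.linalg.norm(residual) <= tol * norm_A:
            break
    return L, S


def clusters_from_low_rank(L, threshold=0.5):
    """Greedily group nodes whose recovered same-cluster score exceeds threshold."""
    n = L.shape[0]
    labels = np.full(n, -1)
    next_label = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        members = np.flatnonzero(L[i] > threshold)
        members = members[labels[members] < 0]  # skip already-labelled nodes
        labels[members] = next_label
        labels[i] = next_label
        next_label += 1
    return labels


# Hypothetical planted graph: two 20-node communities (self-loops included so
# each block is all ones), plus a few spurious and a few missing edges.
n = 40
truth = np.repeat([0, 1], n // 2)
A = (truth[:, None] == truth[None, :]).astype(float)
for i, j in [(0, 20), (5, 25), (10, 30), (15, 35)]:  # spurious cross edges
    A[i, j] = A[j, i] = 1.0
for i, j in [(1, 2), (22, 23)]:  # missing in-community edges
    A[i, j] = A[j, i] = 0.0

L, S = low_rank_plus_sparse(A)
labels = clusters_from_low_rank(L)
print(labels)
```

This toy version computes a full SVD on every iteration, which is fine for a 40-node graph but does not reflect the scalable, parallelizable algorithm the team describes.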
The second problem Prof. Sanghavi’s team works on is inferring the social graph from user behavior. The social graph itself is valuable but often proprietary and inaccessible, whereas records of individual user actions may be easier to obtain. For example, a network provider will have access to user activity levels, but not to the social graph. The research of Sanghavi’s team makes it possible to learn a social graph from data on user behavior, even when it is not clear whom those users are reacting to.
The core idea of this research is to study cascades: actions such as sharing, liking and buying that spread through the network because people follow the lead of relevant friends. Sanghavi’s team developed new ways to find the graph over which cascades spread when the only information available is who participated in a cascade, not why they participated.
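One simple way to see how participation data alone can reveal edges, assuming discrete time steps and an independent-cascade-style model in which each newly active user gets one chance to influence each friend at the next step: for every ordered pair (u, v), count how often v activates exactly one step after u, among the cascades where v was still inactive when u activated. This counting heuristic is an illustrative assumption, not the team's estimator, and it is easily confounded when several potential influencers activate at the same step. The cascade data, threshold and function names are hypothetical.

```python
from collections import defaultdict


def edge_scores(cascades):
    """Score each ordered pair (u, v) by the fraction of u's activations that
    were followed by v activating exactly one step later.

    Each cascade maps node -> activation time step; absent nodes never activated.
    Only cascades where v was still inactive when u activated count as exposures.
    """
    nodes = set().union(*cascades)
    exposures = defaultdict(int)
    hits = defaultdict(int)
    for times in cascades:
        for u, t_u in times.items():
            for v in nodes:
                if v == u:
                    continue
                t_v = times.get(v)
                if t_v is not None and t_v <= t_u:
                    continue  # v was already active, so u could not have influenced it
                exposures[(u, v)] += 1
                if t_v == t_u + 1:
                    hits[(u, v)] += 1
    return {pair: hits[pair] / exposures[pair] for pair in exposures}


def infer_edges(cascades, threshold=0.5):
    """Keep the pairs whose score reaches the (hypothetical) threshold."""
    return {pair for pair, score in edge_scores(cascades).items() if score >= threshold}


# Hypothetical cascades consistent with influence edges a -> b and b -> c.
cascades = [
    {"a": 0, "b": 1, "c": 2},
    {"a": 0, "b": 1},
    {"b": 0, "c": 1},
    {"c": 0},
]
scores = edge_scores(cascades)
print(sorted(infer_edges(cascades)))
```

Note that the estimator never needs to know *why* b activated; it only uses who activated and when, which is the setting the research addresses.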
One example of such a cascade is users switching cellphone providers because their friends have switched, a common occurrence outside the USA. Inferring the underlying network of such influences is highly valuable.
The research is not limited to social media sites such as Facebook or Twitter; it also reaches into e-commerce, security and peer-to-peer networks. Determining and predicting how information spreads at large scale makes it possible to identify and target influential nodes, which is useful for viral marketing and for predicting top sellers on social buying sites such as Groupon. The research also has mathematical roots in models of disease propagation among humans, as well as in models of propagation over social networks.
The third major challenge facing social networks today, according to WNCG Profs. Sujay Sanghavi and Sanjay Shakkottai, is automatically assigning topics to content. Social networks now hold a vast corpus of uncategorized images and videos, both public and private. Identifying topics such as “outdoors,” “kids” or “water” would make the information we see and search for more meaningful.
Current methods for automatic topic modeling, as employed by search engines and marketing strategists, are built around text content such as news articles.
“Topic modeling for news articles is easier because you have access to words, which tell you topics. It’s much harder with pictures, because we don’t have words,” Prof. Sanghavi states.
According to Prof. Sanghavi, computer vision by itself is not powerful enough to sort these visual elements into specific topics.
“Standard image processing will extract items and those items will look the same for any two people, whether it’s me or President Obama. We will look very much the same, as long as we are both people in a certain background. With image processing it’s hard to get features indicative of underlying semantic topics,” Prof. Sanghavi says.
Profs. Sanghavi and Shakkottai are working on a solution that sidesteps the need for words, instead using people’s reactions to and interactions with pictures and videos to infer their topics. For example, if someone is interested in sports and they share a football video that is later shared and commented on primarily by other sports fans, this new system of topic modeling can infer that the video is about sports.
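In the simplified case where users' interests are already known, the sports example amounts to a vote: label a video with the most common interest among the people who engaged with it. This sketch assumes that simplified setting only; in the problem the team actually tackles, neither user interests nor content topics are known in advance. The user names and the `infer_content_topic` helper are hypothetical.

```python
from collections import Counter


def infer_content_topic(engaged_users, user_interest):
    """Return the most common known interest among users who engaged, or None."""
    votes = Counter(user_interest[u] for u in engaged_users if u in user_interest)
    if not votes:
        return None  # nobody with a known interest engaged with this content
    return votes.most_common(1)[0][0]


# Hypothetical users with known interests; "eve" has no known interest.
user_interest = {"ana": "sports", "ben": "sports", "cai": "sports", "dee": "kids"}
engaged = ["ana", "ben", "dee", "cai", "eve"]  # who shared or commented on a video
print(infer_content_topic(engaged, user_interest))
```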
The solution, according to Sanghavi and Shakkottai, is two-fold. The pair developed an algorithm that determines both what topics a user is interested in and what topics a piece of content covers, even though neither answer is known at the outset.
“The idea is that we don’t try to get indicative features directly from the content,” Prof. Sanghavi states. “Instead, we use the fact that many people see these pictures and react to them. At a high level, the idea is to use the user as a word; her interest in a video is mathematically just as indicative of underlying semantics as the presence of a word in a news article.”
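Treating "the user as a word" means a video-by-user engagement matrix can play the role that a document-by-word count matrix plays in text topic modeling. The article does not say which factorization the team uses; as a stand-in, this sketch applies nonnegative matrix factorization (NMF) with the classic Lee-Seung multiplicative updates, which jointly produces per-video topic weights W and per-user topic weights H, loosely mirroring the two-fold goal of learning content topics and user interests together. The engagement matrix, its sizes and the function name are hypothetical.

```python
import numpy as np


def nmf(V, k, n_iter=1000, seed=0, eps=1e-9):
    """Approximate nonnegative V (videos x users) as W @ H with W, H nonnegative.

    Uses Lee-Seung multiplicative updates for squared reconstruction error.
    Row i of W holds video i's topic weights; column j of H holds user j's.
    """
    rng = np.random.default_rng(seed)
    n_videos, n_users = V.shape
    W = rng.random((n_videos, k)) + eps
    H = rng.random((k, n_users)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical engagement data: 1 means the user shared or commented on the video.
# Users 0-3 engage with videos 0-2; users 4-7 engage with videos 3-5.
V = np.zeros((6, 8))
V[:3, :4] = 1.0
V[3:, 4:] = 1.0

W, H = nmf(V, k=2)
video_topics = W.argmax(axis=1)  # dominant topic for each video
user_topics = H.argmax(axis=0)   # dominant interest for each user
print("video topics:", video_topics)
print("user topics: ", user_topics)
```

No pixels or words are used: the videos fall into two topics purely because they are watched by two different groups of people, and each user's inferred interest comes out of the same factorization.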
WNCG’s research has the potential to revolutionize the current search engine system. Instead of searching by keywords matched against word-count profiles, users could search for more general topics. The research team’s output could be a key component of social search: finding content generated by a user’s friends and acquaintances.