Towards a Theory for Network Clustering
Community detection in networks is a key exploratory tool with applications ranging from identifying protein complexes in protein-protein interaction networks to scaling network computation by divide and conquer-type algorithms. Many clustering algorithms make heuristic choices which work well in practice, but their theoretical underpinnings are not well understood. For example, it is commonly believed that normalizing the network adjacency matrix improves the performance of Spectral Clustering. In the first part of this talk I will show that under a broad parameter regime, this is indeed true.
In the second part of the talk, I will focus on another heuristic which determines the number of communities in a network from the knee of the eigen-spectrum of the adjacency matrix. I will present a hypothesis testing framework that formalizes this heuristic and can be used for automatically determining the number of clusters in a network. Our analyses are in the context of a Stochastic Blockmodel, which is a widely used model for generating labeled networks.