Detecting communities and performing statistical inferences on networks through renewal non-backtracking random walks

Date

2019-08-01,

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

In many sciences---for example Sociology, Biology, and Computer Science---units under study often belong to communities, and units within the same community behave similarly. A way to identify communities may be through how units interact with each other; units within a community may be more likely to interact with each other than units across different communities. This is equivalent to viewing units under study as a network, where nodes are units and edges are drawn between two units if they interact with each other. Hence, a critical problem in these sciences is how to identify communities given a mathematical network. Since members within the same community are more likely to interact with each other, it may follow that cycles may be more prevalent within communities than across communities. Thus, the detection of these communities may be aided through the use of measures of the local ``richness'' of cyclic structures. In this dissertation, we develop the renewal non-backtracking random walk (RNBRW)--- a variant of a random walk in which the walk is prohibited from returning back to a node in exactly two steps and terminates and restarts once it completes a loop---as a way of quantifying this cyclic structure. Specifically, we propose using the retracing probability of an edge---the likelihood that the edge completes a cycle in a RNBRW---as a way of quantifying cyclic structure. Intuitively, edges with larger retracing probabilities should be more important to the formation of cycles, and hence, to the detection of communities.
We show that retracing probabilities can be estimated efficiently through repeated iterations of RNBRW. Additionally, since RNBRW runs can be performed in parallel, accurate estimation can be obtained even when the network contains millions of nodes. We give simulation results that suggest pre-weighting edges through RNBRW can improve the performance of popular community detection algorithms substantially. This improvement is most significant when the network is sparse. Moreover, the performance of community detection algorithms with this weighting are competitive to other scalable methods that do not allow for weighting of edges. We also develop a goodness-of-fit test to help determine whether communities exist within a network. We begin with a network on n nodes that follows an unknown random graph model. We test the null hypothesis that the network is a realization of an Erdös-Renyi graph---a random graph in which each edge is equally likely to be formed, and hence, contains no inherent community structure. Rejecting this null implies that the network comes from a distribution with inherent community structure (for example, a planted partition model). To perform our test, we form an n×n matrix where entries are the retracing probabilities (estimated through RNBRW) of the corresponding edges of the network. We use as our test statistic a scaled version of the largest eigenvalue of this matrix. We perform a simulation study to compare the Type I Error probability and power of our method to that of other spectral approaches for network inference. We conclude by describing connections between RNBRW and the maximum expected cyclic overlap problem and giving theoretical results of RNBRW under the stochastic block model.

Description

Keywords

Non-backtracking random walk, Community detection, Network, Statistical inference, Scalable

Graduation Month

August

Degree

Doctor of Philosophy

Department

Department of Statistics

Major Professor

Michael J. Higgins

Date

2019

Type

Dissertation

Citation