Modeling and projection of respondent driven network samples



Journal Title

Journal ISSN

Volume Title



The term network has become part of our everyday vocabulary. The more popular are perhaps the social ones, but the concept also includes business partnerships, literature citations, biological networks, among others. Formally, networks are defined as sets of items and their connections. Often modeled as the mathematic object known as a graph, networks have been studied extensively for several years, and research is widely available. In statistics, a variety of modeling techniques and statistical terms have been developed to analyze them and predict individual behaviors. Specifically, certain statistics like degree distribution, clustering coefficient, and so on are considered important indicators in traditional social network studies. However, while conventional network models assume that the whole network population is known, complete information is not always available. Thus, different sampling methods are often required when the population data is inaccessible. Less time has been dedicated to studying the accuracy of these sampling methods to produce a representative sample. As such, the aim of this report is to identify the capacity of sampling techniques to reflect the features of the original network. In particular, we study Anti-cluster Respondent Driven Sampling (AC-RDS). We also explore whether standard modeling techniques paired with sample data could estimate statistics often used in the study of social networks. Respondent Driven Sampling (RDS) is a chain referral approach to study rare and/or hidden populations. Originating from the link-tracing design, RDS has been further developed into a series of methods utilized in social network studies, such as locating target populations or estimating the number and proportion of needle-sharing among drug addicts. However, RDS does not always perform as well as expected. When the social network contains tight communities (or clusters) with few connections between them, traditional RDS tends to oversample one community, introducing bias. AC-RDS is a special Markov chain process that collects samples across communities, capturing the whole network. With special referral requests, the initial seeds are more likely to refer to the individuals that are outside their communities. In this report, we fitted the Exponential Random Graph Model (ERGM) and a Stochastic Block Model (SBM) to an empirical study of the Facebook friendship network of 1034 participants. Then, given our goal of identifying techniques that will produce a representative sample, we decided to compare two version of AC-RDSs, in addition to traditional RDS, with Simple Random Sampling (SRS). We compared the methods by drawing 100 network samples using each sampling technique, then fitting an SBM to each sample network we used the results to project the network into one of population size. We calculated essential network statistics, such as degree distribution, of each sampling method and then compared the result to the original network observed statistics.



Networks, Respondent Driven Sampling, Nonparametric Bayesian, Sampling Methods, Stochastic Blockmodel

Graduation Month



Master of Science


Department of Statistics

Major Professor

Perla E. Reyes Cuellar