Stochastic blockmodel
Stochastic blockmodels are a kind of statistical model for networks based on partitioning the nodes into clusters and describing the distribution of links between any two clusters.
Visone supports the sampling of networks from some kinds of stochastic blockmodels. The user interface makes the sampler available in two ways:
- In the network creation dialog accessible via the file menu, where the stochastic blockmodel can be specified by supplying a matrix of (manually entered) edge weights.
- In the transformation tab under the name "SBM sampling", where the blockmodel is specified by the currently active network and its attributes.
Sampling in the network creation dialog
In the network creation dialog, the following settings are offered to specify the sampling from a stochastic blockmodel:
- Cluster count sets the number of clusters in the blockmodel.
- Model sets the underlying type of statistical model/distribution used for sampling the edges in the block
- Directed sets whether the blockmodel is used to sample a directed or undirected network.
- Loops sets whether the sampled network may contain loops or not.
- Cluster sizes specifies the number of vertices for each individual cluster. If the cluster size for a specific cluster is not set explicitly, then the default cluster size is used instead.
- Block weights specifies the weights used to parameterize the underlying statistical model/distribution. The interpretation of this weight depends on the specific statistical model (see the list of models below). If the weight for a specific block is not explicitly set, then the default block weight is used instead.
After pressing "apply", a new network generated from the specified blockmodel is made available in a new tab.
Sampling in the transformation tab
In the transformation tab, the stochastic blockmodel is derived from the currently active network. Each node in the network describes a cluster, and each edge describes a block. The edge direction determines the direction of the edges sampled from it. Note that the currently active network can have multiple edges between two nodes describing clusters; in this case, the sampling from these edges proceeds independently.
Besides the network itself, the following settings are offered to specify the sampline:
- Model determines the underlying type of statistical model/distribution used for sampling the edges in the newly generated network.
- Cluster sizes is an attribute on the nodes in the active network and determines how many nodes are generated for each cluster.
- Cluster labels is an attribute on the nodes in the active network and is used to create a node attribute cluster in the newly generated network to label the new nodes according to their cluster membership. If no node attribute is specified, then sequential labels are used instead.
- Edge labels is an attribute on the edges in the active network and is used to create an edge attribute block in the newly generated network to label the new edges according to which block they were generated from. If no edge attribute is specified, then sequential labels are used instead.
- Block weights are weights on the blocks/edges in the active network to parameterize the statistical model. The interpretation of this weight depends on the specific statistical model (see the list of models below).
- Loops sets whether self-loops are allowed in the generated network. Loops can be allowed or forbidden for all blocks, or a Boolean edge attribute can be used to set this for each block individually.
After pressing "transform", a new network generated from the specified blockmodel is made available in a new tab.
Statistical models
Assume we are sampling the edges in a block from cluster to with block weight . Currently, visone supports sampling of the edges in the blocks from the following distributions:
- Bernoulli: Edges between different pairs of vertices are drawn independently. The block weight specifies the probability that a specific edge exists or not, i.e., each edge is sampled with probability .
- Geometric: Edges between different pairs of vertices are drawn independently. The block weight specifies the expected number of parallel edges between any two nodes in the clusters. Parallel edges are sampled from a Geometric distribution: parallel edges from a vertex in to a vertex in have probability .
- Poisson: Edges between different pairs of vertices are drawn independently. The block weight specifies the expected number of parallel edges between any two nodes in the clusters. Parallel edges are sampled from a Poisson distribution: parallel edges from a vertex in to a vertex in have probability .
- Configuration model (or random sequence model): The block weight specifies the number of edges within the block. The edges are drawn in line with the random sequence model introduced in Bollobás, B., and Frieze, A.M. (1985). On matchings and Hamiltonian cycles in random graphs. Annals of Discrete Mathematics 28:23–46. and Chvátal, V. (1991). Almost all graphs with 1.44n edges are 3-colorable. Random Structures & Algorithms 2(1):11–28.
The model works as follows: A sequence of ordered pairs of vertices is drawn. For each pair, an edge is created, where the first vertex defines the source and the second the target of the edge.
Assume that matrix describes the number of parallel edges from vertices in to vertices in in the block and the number of edges in the block is . The statistical model translates into the following probabilities (note that in all cases, the probability scales with factor for an edge with multiplicity ):- If the block is off the diagonal or directed with loops allowed, then the probability is .
- If the block is directed, on the diagonal and loops are forbidden, then the probability is .
- If the block is undirected, on the diagonal and loops are allowed, then the probability is for some ordering on the vertices. Note that this means that loops are half as likely as non-loops.
- If the block is undirected, on the diagonal and loops are forbidden, then the probability is for some strict ordering on the vertices.
- Configuration model (no multi-edges): This is like the multigraph model above, but with the difference that multi-edges are not allowed. This means that it is equivalent to the Erdős–Rényi model if loops are not permitted.
Time complexity
A network is generated in time linear in the size of the sampled network.