Community Detection
Goal: find densely linked clusters.
Strength of Weak Ties
If the number of hops between two nodes is large, the connection is likely to be a weak tie.
Edge betweenness: number of shortest paths that contain that edge. If there are multiple shortest paths, take the fraction of shortest paths that contain the edge.
Girvan-Newman
Find the link with the highest betweenness factor and remove it.
By repeating, the network can be decomposed in a hierarchical manner.
Computing Betweenness
Pick starting node
This can be done by summing the values of the node’s parents. Set the root node,
Store information about links between edges on the same level (hops from
a ----
/ | \ \
/ | \ \
b1 - c1 d1 e1 level 1 (1 hop from a)
\ / / \ /
\/ / \/
f2 g1 h2 level 2
\ /\ /
\ / \ /
i3 j3 level 3
\ /
\/
k6 level 4
Then, compute betweenness by working up the tree using BFS following these rules:
- Each leaf node has a node flow of
- Each node gives splits its node flow among the edges to its parents - the edge betweenness:
- The betweenness value is proportional to the number of shortest paths the parent has
- e.g. If node
has a flow of and it has two parents, and with node flows of and respectively, and
- All other nodes have a node flow of
plus the sum of edge betweenness from the edges to its children
In the above example:
-
has a node flow of -
and have the same node weight, so ’s node flow is divided equally: -
has a node flow of - The ratio of
to is , so and
- The ratio of
Hence, the resultant graph is:
a ----
2/2| \4 2\
/ | \ \
2b - 2c 4d 2e
1\ 1/ 2/ \1 /1
\/ / \/
2f 2g 2h
1\ .5/\.5 /1
\ / \ /
1.5i 1.5j
\ /
0.5\/0.5
1k
This must be repeated, picking every node as the root node. The betweenness value for each edge can be summed and then divided by 2 (as each shortest path discovered twice from each end).
In order to improve performance, a random subset of nodes can be used as the root node to get an approximation.
A hierarchical network decomposition can be made by using the order in which the graph was divided.
Network Communities
Community detection requires not just computing betweenness, but also determining when to stop dividing clusters; otherwise, each cluster would be left with a single node.
Network communities are sets of tightly connected nodes.
Modularity
If a network is partitioned into groups,
But how is the number of expected edges found?
Null Model
Given a real
If
That is, the sum of degrees of all nodes is twice the number of edges. Hence, in the entire graph,
$$
\sum_{u \in N}{k_u} = 2m
$$
The expected number of edges in the multigraph
And the expected degree - number of edges per node, for the entire graph is
Given a partitioning
Where
A positive value means the number of edges within the groups exceeds the expected number.
If
Modularity is initially 0; with only one community, no edges have been removed and the expected number of edges is derived from this case. In nice cases, there will be an inflection point at which maximum modularity is reached.