The news at this end is that
Frank Bauer and I just submitted a new preprint on arXiv:
F. Bauer and J.T. Lizier, "Identifying influential spreaders and efficiently estimating the number of infections in epidemic models: a path counting approach", MPI MIS Preprint 1/2012,
arXiv:1203.0502, 2012.
We introduce a new method to efficiently approximate the number of infections resulting from a given initially-infected node in a network of susceptible individuals, based on counting the number of possible infection paths of various lengths to each other node in the network. We analytically study the properties of our method systematically, in particular demonstrating different forms for SIS and SIR disease spreading (e.g. under the SIR model our method counts self-avoiding walks). In comparison to existing methods to infer the spreading efficiency of different nodes in the network (based on degree, k-shell decomposition analysis and different centrality measures), our method directly considers the spreading process, and as such is unique in providing estimation of actual numbers of infections. Crucially, in simulating infections on various real-world networks with the SIR model, we show that our walks-based method improves the inference of effectiveness of nodes over a wide range of infection rates compared to existing methods. We also analyse the trade-off between estimate accuracy and computational cost of our method, showing that the better accuracy here can still be obtained at a comparable computational cost to other methods.
The problem that we are addressing here is two-fold, and applicable not just to disease/epidemic spreading but also other information spreading models (e.g. rumour spreading):
- How to efficiently estimate the number of infections resulting from a given initially-infected node in a network of susceptible individuals?
- What network structural properties local to the initially-infected node are most useful for prediction of its spreading efficiency?
The most obvious and direct way to estimate the number of infections is to simply run simulations of the infection model - the big issue with this however, is that it takes a lot of computational time. For example, for an SIR (susceptible-infected-removed) model we run on a network of 1133 nodes and 5451 undirected edges, 1000 simulated initial infections per node for around 20 values of infection rate β took 50 minutes to simulate. And of course, this runtime does not scale well with the size of the network. So this gives us problem 1.
As such, there has been a lot of work recently trying to find local network properties of nodes that are useful in predicting the relative spreading efficiency of different initially-infected nodes in a network. This attempts to address problem 1, but additionally gives very useful insight into how local network structure can promote or inhibit disease spreading (i.e. problem 2). The properties other authors have investigated range from simply examining out-degree, to
k-shell analysis, to various measures of node centrality in the network (e.g.
eigenvector centrality). And the good news is that you get a surprisingly accurate insight into the relative spreading efficiency of the various nodes. Such work has attracted a lot of attention, for example being published in
Nature Physics.
However, we observed two issues with these approaches:
- They only infer the relative spreading efficiency of initially infected nodes; i.e. they do not provide an estimate of the actual numbers of infections resulting from each node. These actual infection numbers could be very important in many applications.
- While the existing inference measures do a good job, they do not actually directly examine the mechanics of disease spreading. As such, there is still room for improvement. As an example of potential improvement areas: none of the above-mentioned measures change their relative inference with the rate of infection β.
So, we sat down and thought hard about the local network properties that best relate to the mechanics of disease spreading. From our perspective, disease spreads on a network in the manner of a walk. The disease can only reach node B from node A on a walk from A to B, where every node on that walk is also infected. Our basic idea is that the count of the number of walks from the initially infected node to other susceptible nodes (an approach known as
path counting) should be a local network structural property that gives good insight into disease spreading.
We developed the idea further, working out the mathematics to turn these path counts into estimates of infection numbers, as a function of spreading rate β. Interestingly, different types of walks are involved for different disease spreading models; e.g. for SIR spreading one is only interested in
self-avoiding walks (since no node can be infected twice), whereas for SIS (susceptible/infected/susceptible) spreading one is interested in any type of walk. Our estimates are not perfect, and we identify where and how known errors will occur. However, the estimates have several very useful properties compared to other approaches:
- Our technique provides estimates of actual numbers of infections from a given initially-infected node, which is more useful than inferred relative spreading efficiency alone.
- Importantly, testing against simulations on social network structures reveals that the relative spreading efficiencies inferred by our technique are more accurate than those of the aforementioned previously published techniques. This is because our technique directly considers the mechanics of disease spreading.
- And our technique has comparable computationally efficiency with the aforementioned inference techniques - taking ~3 seconds for the network mentioned above whose full simulations took 50 minutes. (There is a trade-off in our technique between computational efficiency and higher accuracy, by considering long path lengths, however the better accuracy referred to above is achievable at the path lengths computable in 3 seconds.)
As such, we show that
path counts provide the following
unique combination of features: they are the most relevant local network structural feature to infer relative disease spreading efficiency, provide estimates of actual infection numbers, and are computationally efficient.
Comments/suggestions welcome, of course ...