Saturday, March 17, 2012

Workshop on Non-linear Interdependence Measures in Neuroscience

I'm pleased to announce the NeFF-Workshop on Non-linear and model-free Interdependence Measures in Neuroscience and TRENTOOL course which will be held at Goethe University Frankfurt, Germany on April 26-27, 2012 (hosted by the MEG Unit of the Brain Imaging Centre, Frankfurt).

The synopsis from the workshop announcement is as follows:
Understanding complex systems composed of many interacting units, such as neural networks, means understanding their directed and causal interactions. If the units in question interact in a nonlinear way, as it can be assumed in neural networks, we are faced with the problem that the analysis of interactions must be blind to the type of interaction if we want to cover all possible interactions in the network, as we may not know the type of nonlinear interaction a priori. Prematurely limiting our search to specific models, nonlinearities or, even worse, linear interactions may block the road to discovery. Novel model-free techniques for the quantification of directed interactions from information theory offer a promising alternative to more traditional methods in the field of interaction analyses, but also come with their own specific challenges. This symposium brings together the most active researchers in the field to discuss the state of the art, future prospects and challenges on the way to an model-free, information theoretic assessment of neuronal directed interactions.
I'm happy to be co-organising this workshop with Michael Wibral (head of the MEG Unit, Brain Imaging Center, Goethe University Frankfurt) and Raul Vicente  (Frankfurt Institute for Advanced Studies).

We've got several speakers lined up to talk about their work in this field, particularly using information-theoretic tools including the transfer entropy. The speakers include some collaborators of mine (e.g. Mikhail Prokopenko, Paul Williams), many others I'm looking forward to meeting (e.g. Stefano Panzeri, Luca Faes), and the organisers of course :).

Plus there will also be a workshop on Michael and Raul's Transfer Entropy toolbox (TRENTOOL), which is designed to provide effective network analysis on neuro data sets in Matlab. I'm looking forward to playing around with this more myself, I've already got a project and some data in mind.

We're hoping to get lots of participants (though space is limited) - full details on how to register are available at the workshop website -

I hope to see you there!

Sunday, March 4, 2012

Identifying influential spreaders and efficiently estimating infection numbers in epidemic models: a walk counting approach

Updated 26 October - our paper has been published in Europhysics Letters 99 68007 (2012) doi:10.1209/0295-5075/99/68007 - we've also updated the preprint on arXiv with the revised material.

Update 3 December - MPI made a press release about the paper, and New Scientist (German edition) published an article about it (in German).


The news at this end is that Frank Bauer and I just submitted a new preprint on arXiv:

F. Bauer and J.T. Lizier, "Identifying influential spreaders and efficiently estimating infection numbers in epidemic models: a walk counting approach", MPI MIS Preprint 1/2012, arXiv:1203.0502, 2012.

We introduce a new method to efficiently approximate the number of infections resulting from a given initially-infected node in a network of susceptible individuals. Our approach is based on counting the number of possible infection walks of various lengths to each other node in the network. We analytically study the properties of our method, in particular demonstrating different forms for SIS and SIR disease spreading (e.g. under the SIR model our method counts self-avoiding walks). In comparison to existing methods to infer the spreading efficiency of different nodes in the network (based on degree, k-shell decomposition analysis and different centrality measures), our method directly considers the spreading process and, as such, is unique in providing estimation of actual numbers of infections. Crucially, in simulating infections on various real-world networks with the SIR model, we show that our walks-based method improves the inference of effectiveness of nodes over a wide range of infection rates compared to existing methods. We also analyse the trade-off between estimate accuracy and computational cost, showing that the better accuracy here can still be obtained at a comparable computational cost to other methods.

Epidemic spreading in biological, social, and technological networks has recently attracted much attention. The structure of such networks is generally complex and heterogeneous, so a key question in this domain is: "Given a first infected individual of the network (patient zero) - how likely is it that a substantial part of the network will become infected?" It is, of course, of particular interest to identify the most influential spreaders. This knowledge could, for instance, be used to prioritise vaccinations.

The most obvious and direct way to address this question is to estimate the number of infections by running simulations of the infection model. There are several well-known infection models which can be used to simulate diseases with different properties. These include the SIR (susceptible-infected-removed) model for diseases where a subject may only be infected once (due to either recovery with full immunity or death), and the SIS (susceptible-infected-susceptible) model for diseases where infected subjects recover and become susceptible to reinfection. To run a simulation, one must consider the network structure connecting susceptible individuals, and the infection rate or probability β that an infected individual will infect a given neighbour.

One problem with running simulations however, is that it takes a lot of computational time to obtain appropriate accuracy. For example, for an SIR model we run on a network of 27519 nodes and 116181 undirected edges, 10000 simulated initial infections per node for around 20 values of infection rate β took 2000 hours to simulate. Of course, this runtime does not scale well with the size of the network, while for real-world problems it is generally the large networks that we are genuinely interested in.

As such, it would be useful to find a more efficient way than full simulation to estimate infection numbers, and/or to be able to use local network properties of nodes to understand their spreading efficiencies.
So the problem that we are addressing here is two-fold:
  1. How to efficiently estimate the number of infections resulting from a given initially-infected node in a network of susceptible individuals?
  2. What network structural properties which are local to the initially-infected node are most useful for predicting how well disease will spread from it?
In fact, there has been a lot of work recently trying to find local network properties of nodes that are useful in predicting the relative spreading influence of different initially-infected nodes in a network. This attempts to address problem 1, but additionally gives very useful insight into how local network structure can promote or inhibit disease spreading (i.e. problem 2). The properties other authors have investigated range from simply examining out-degree, to k-shell analysis, to various measures of node centrality in the network (e.g. eigenvector centrality). And the good news is that you get a surprisingly accurate insight into the relative spreading efficiency of the various nodes with these very simple measures. Such work has attracted a lot of attention, for example being published in Nature Physics.

However, we observed two issues with these approaches:
  1. They only infer the relative spreading efficiency of initially infected nodes; i.e. they do not provide an estimate of the actual numbers of infections resulting from each node. These actual infection numbers could be very important in many applications.
  2. While the existing inference measures do a good job, they do not actually directly consider the mechanics of disease spreading. As such, there is still room for improvement. As an example of potential improvement areas: none of the above-mentioned measures change their relative inference with the rate of infection β.
So, we sat down and thought hard about the local network properties that best relate to the mechanics of disease spreading. We focussed on the fact that disease spreads on a network in the manner of a walk. The disease can only reach node B from node A on a walk from A to B, where every node on that walk is also infected. Our basic premise is that the count of the number of walks from the initially infected node to other susceptible nodes (an approach known as walk counting) should be a local network structural property that gives good insight into disease spreading. The idea had been previously raised in the literature, but not properly examined.

We developed the idea further, working out the mathematics to turn these walk counts into estimates of infection numbers, as a function of infection rate β. Interestingly, different types of walks are involved for different disease spreading models; e.g. for SIR spreading one is only interested in self-avoiding walks (since no node can be infected twice), whereas for SIS spreading one is interested in any type of walk. Our estimates are not perfect, and we identify where and how known errors will occur. However, the estimates have several very important and useful properties compared to other approaches:
  1. Our technique provides estimates of actual numbers of infections from a given initially-infected node, which is more useful than inferred relative spreading efficiency alone. It's useful to know who the most influential spreaders are, but you also want to know the extent to which they will spread the disease.
  2. Importantly, testing with simulations on various social network structures reveals that our technique infers more accurate relative spreading efficiencies than those of the aforementioned previously published techniques over a wide range of infection rates β (up to the lower super-critical spreading regime). This is because our technique directly considers the mechanics of disease spreading.
  3. And our technique has excellent computational efficiency. Note that there is a trade-off in our technique between computational efficiency and higher accuracy: by considering only short infection walks, our algorithm runs faster, but better accuracy is generally obtained by considering longer walks. Our short-walk estimates can be made in approximately the same run-time as the aforementioned techniques, but with greater accuracy in inferring relative spreading efficiency. Our longer-walk estimates produce better accuracy again, and do so with orders of magnitude less runtime than simulations of the disease spreading process which obtain the same accuracy.
As such, we show that our walk counting approach provides the following unique combination of features: they are the most relevant local network structural feature to infer relative disease spreading efficiency, provide estimates of actual infection numbers, and are computationally efficient.

Comments/suggestions welcome, of course ...