Is the Delta variant really more than twice as transmissible as the original strain of the virus?

August 31, 2021

Summary

• The Delta variant, which was first detected in India in October and has recently been spreading very quickly in many regions of the world, is widely believed to be more than twice as transmissible as the original strain of the virus. This belief has generated widespread anxiety and led public health agencies in several countries to revise some of their recommendations.
• In this post, I start by explaining what people mean when they say that a variant is more transmissible than another, which leads me to make a distinction between a transmissibility advantage and a transmission advantage. While this distinction is rarely made explicitly, it is absolutely crucial for interpreting the evidence correctly, as the rest of the post shows.
• I then present the evidence used to support the claim that Delta is more than twice as transmissible as the other variants and argue that, while it clearly shows that Delta had a substantial transmission advantage during its initial expansion in many places, it doesn’t show that Delta has a transmissibility advantage, let alone that its transmission advantage during that initial expansion accurately measures whatever transmissibility advantage it might have.
• In fact, by looking at French data beyond Delta’s initial expansion, I show that, as it became the dominant strain in France, Delta’s transmission advantage collapsed rapidly. This is exactly what already happened a few months ago with Alpha, and it is hard to square with the hypothesis that Delta is more than twice as transmissible as the original strain of the virus. I also show that Delta’s transmission advantage varies wildly across regions, which suggests that factors other than whatever transmissibility advantage it might have explain why it initially had such a large transmission advantage.
• I explain why, if they keep using the same methods, epidemiologists will eventually conclude that Omega, or whatever they call the next variant of concern, has a basic reproduction number of 125, at which point one hopes they will recognize that those methods are unreliable. Unfortunately, while the literature on Delta’s transmissibility advantage is full of caveats showing that its authors understand why the evidence must be interpreted carefully, most epidemiologists naively plug the estimates from that literature into the models they use to make projections, which has recently led to some spectacular failures.
• Finally, I propose a theory that can explain why Delta’s transmission advantage was initially very high before collapsing, just as Alpha’s did before it. This theory crucially rests on the assumption that, unlike what most epidemiological models used during the pandemic assume, the population is highly structured. The effect of complex population structure on transmission has far-reaching implications beyond the debate about Delta’s transmissibility advantage, which I will explore in a forthcoming blog post where I will present modeling work I have done on this question.

The Delta variant of SARS-CoV-2, first detected in India last October, has recently been spreading rapidly in many countries and is now the dominant variant of the virus in most of them. According to the American Society for Microbiology, it’s more than twice as transmissible as the original strain of the virus, while the CDC claimed it was as transmissible as chickenpox. As a result, the agency recently published new guidelines on masking, recommending that even vaccinated people wear masks indoors in communities with high transmission of the virus. While there is compelling evidence that vaccines work well against Delta, such as this study based on data from England, many are concerned that Delta’s high transmissibility means that vaccination will not be sufficient to contain the virus and that non-pharmaceutical interventions such as masking or even lockdowns and curfews will be necessary again. But I think the consensus on Delta’s transmissibility is deeply flawed, and I will explain why in this post. I will start by explaining how epidemiologists have reached the conclusion that Delta is more than twice as transmissible as the original strain of the virus and why the inference they’re making could easily be misleading. In doing so, I will clarify some conceptual issues that I think are important for interpreting the evidence correctly but haven’t received enough attention in this debate. [Footnote 1: I include myself in this criticism, for I didn’t spend enough time on those conceptual issues in my post about Alpha’s transmissibility advantage and, as a result, my analysis wasn’t as clear as it could and should have been.] I will then present evidence showing that Delta is in fact not as transmissible as epidemiologists and public health officials claim.
Finally, I will argue that, by not taking into account that evidence and assuming that Delta is more than twice as transmissible as the original strain in the models they used to make projections, epidemiologists are providing misleading guidance to decision-makers that might lead them to implement suboptimal policies.

Why people think Delta is more than twice as transmissible as the original strain

The main reason why people claim that Delta is more transmissible than other strains of the virus is that it’s growing faster than them. As we shall see, there are other lines of evidence suggesting that Delta might be more transmissible, but none of them would amount to anything if Delta weren’t growing faster than other variants. It’s clear that Delta is growing faster than other variants because, in most places, the proportion of cases caused by Delta has been rapidly increasing, and in many of them the prevalence of Delta has reached almost 100%. This is very similar to what happened a few months ago with Alpha, the so-called British variant, which emerged in England at the end of 2020 and had become the dominant variant in most of the world a few months later. However, the mere fact that a variant is growing faster than the others doesn’t mean that people infected by that variant infect more people on average than people infected by another variant, let alone that they would still do so in a counterfactual scenario where the people who were actually infected by one variant had been infected by the other, which, as we have seen above, is not the same thing. [Footnote 3: Suppose again that Delta is spreading in networks where relatively few people have immunity against the virus. In that case, we’d expect people infected by Delta to infect more people on average than people infected by another variant, but we wouldn’t necessarily expect this to still be the case if the people who were actually infected by another variant had been infected by Delta and the people who were actually infected by Delta had been infected by another variant.] Moreover, the growth rate of a variant is determined not only by its reproduction number, but also by its generation interval, which is the time between the moment someone is infected and the moment they infect someone else.
Keeping the reproduction number constant, the shorter that interval is, the faster the epidemic grows: if the generation interval is shorter, infected people still infect the same number of people on average, but they infect them sooner, so incidence grows faster. Thus, from the fact that a variant is growing faster than another, it doesn’t follow that it has a higher reproduction number. It could also be that it has a shorter generation interval. Even if we knew that Delta has the same generation interval as other variants, the fact that it’s growing faster would just mean that it has a higher reproduction number, which, as I explained above, doesn’t imply that it’s more transmissible.
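To make the role of the generation interval concrete, here is a minimal sketch that assumes a gamma-distributed generation interval (an assumption, not something established in this post). Under that assumption, the standard relation from Wallinga and Lipsitch gives R = (1 + r·θ)^k, where k is the shape, θ the scale (mean = kθ) and r the daily growth rate. The means of 5.5 and 4 days and the shape of 2 are hypothetical values chosen only for illustration.

```python
# Sketch assuming a gamma-distributed generation interval; the means and shape
# below are hypothetical values chosen for illustration, not estimates.


def growth_rate(R: float, mean_gi: float, shape: float) -> float:
    """Daily growth rate r implied by reproduction number R.

    With a gamma generation interval (shape k, scale theta = mean / k),
    R = (1 + r * theta) ** k, so r = (R ** (1 / k) - 1) / theta.
    """
    theta = mean_gi / shape
    return (R ** (1.0 / shape) - 1.0) / theta


def reproduction_number(r: float, mean_gi: float, shape: float) -> float:
    """Inverse of growth_rate under the same gamma assumption."""
    theta = mean_gi / shape
    return (1.0 + r * theta) ** shape


R = 1.5        # same reproduction number for both hypothetical variants
SHAPE = 2.0    # hypothetical gamma shape parameter

r_long = growth_rate(R, mean_gi=5.5, shape=SHAPE)   # longer mean interval (days)
r_short = growth_rate(R, mean_gi=4.0, shape=SHAPE)  # shorter mean interval (days)

print(f"mean interval 5.5 days: r = {r_long:.3f} per day")
print(f"mean interval 4.0 days: r = {r_short:.3f} per day")
```

With the same R of 1.5, the variant with the shorter interval grows at about 0.112 per day versus about 0.082 per day, so a faster-growing variant need not have a higher reproduction number.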

In practice, as I explained a few months ago in my post on Alpha, we don’t know what the generation interval is for any variant, let alone whether it’s the same for Delta as for the other variants, because it’s extremely difficult to estimate. This paper based on data from Singapore found that the serial interval, which is often used as a proxy for the generation interval, was the same for Delta as for the original strain of the virus, but it has serious limitations. [Footnote 4: First, as is generally the case with studies trying to estimate the generation interval, it’s based on a very small sample. In particular, the study’s estimate of the serial interval for Delta is based on just 28 household transmission pairs, so it’s very imprecise. Moreover, the conclusion was reached by comparing data on household transmission pairs in April 2021 (when most people were infected by Delta) with data on household transmission pairs in April 2020 (when people were infected by the original strain of the virus), but a lot has changed in how people deal with the virus since then and this could have affected the generation interval. Furthermore, if the relationship between the serial interval, the generation interval and what this paper calls the infectiousness profile is not the same for Delta as for the other variants (as many people seem to think), the serial interval might not be as good a proxy for the generation interval for Delta as for the original strain, making the comparison misleading. Finally, even if Delta’s generation interval really is the same as that of the original strain of the virus in Singapore, it doesn’t follow that it is the same as the generation interval of the other variants currently infecting people in the various countries where Delta is circulating.]
Moreover, several other papers found that Delta had a shorter generation interval than the previously established strains, so if they are correct, then assuming that Delta has the same generation interval as the other variants will bias the estimates of its transmission advantage upward. [Footnote 5: It’s not just the mean of the generation interval distribution that could differ between variants. For instance, if people infected by a variant remain infectious for a longer period of time, it might result in a longer-tailed generation interval distribution.] For instance, this study based on data from China found that transmission pairs associated with Delta had a generation interval of 2.9 days, which is much shorter than estimates based on data collected in China at the beginning of the pandemic, before Delta emerged. Another study, based on South Korean data and with a large sample of transmission pairs, found that the serial interval, which again is closely related to the generation interval, declined from ~4 days during the period when the prevalence of Delta was less than 50% to ~2.5 days in the subsequent period, suggesting that Delta has a shorter generation interval than the previously established variants. However, those studies also have serious limitations (sometimes the same as the previously mentioned study from Singapore), so the truth is that we don’t really know how Delta’s generation interval compares to that of the previously established variants.
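The direction of that bias can be checked with the same gamma relation. The sketch below uses made-up numbers: by construction both variants have the same R, but Delta’s mean generation interval is shorter, so it grows faster. An analyst who converts the observed growth rates into reproduction numbers while assuming Delta shares the resident variants’ longer interval will then infer a transmission advantage that doesn’t exist. All parameter values are hypothetical and the gamma assumption is ours, not the post’s.

```python
# Sketch with hypothetical numbers: both variants have the same R, but Delta's
# mean generation interval is shorter. Gamma-distributed intervals assumed.

SHAPE = 2.0          # hypothetical gamma shape parameter
R_TRUE = 1.2         # hypothetical: identical R for Delta and resident variants
GI_RESIDENT = 5.5    # hypothetical mean generation interval (days), resident variants
GI_DELTA = 4.0       # hypothetical shorter mean generation interval (days), Delta


def growth_rate(R, mean_gi, shape):
    """r implied by R for a gamma generation interval: R = (1 + r * theta) ** shape."""
    theta = mean_gi / shape
    return (R ** (1.0 / shape) - 1.0) / theta


def reproduction_number(r, mean_gi, shape):
    """R implied by growth rate r for a gamma generation interval."""
    theta = mean_gi / shape
    return (1.0 + r * theta) ** shape


# What the data would show: Delta grows faster only because of its shorter interval.
r_resident = growth_rate(R_TRUE, GI_RESIDENT, SHAPE)
r_delta = growth_rate(R_TRUE, GI_DELTA, SHAPE)

# Analyst's inference, wrongly assuming both variants share the resident interval.
R_resident_inferred = reproduction_number(r_resident, GI_RESIDENT, SHAPE)
R_delta_inferred = reproduction_number(r_delta, GI_RESIDENT, SHAPE)
apparent_advantage = R_delta_inferred / R_resident_inferred - 1.0

print(f"true advantage: 0%, apparent advantage: {apparent_advantage:.1%}")
```

With these numbers Delta appears roughly 7% more transmissible even though it isn’t. In this sketch the bias is upward because both variants are growing; if R were below 1, the same assumption would push the estimate the other way.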

Looking at the data beyond the initial expansion of the lineage shows that Delta is not as transmissible as claimed

Santé publique France, the French public health agency, regularly publishes data on the prevalence of some mutations in the samples collected from people who tested positive for SARS-CoV-2. One of those mutations, L452R, is present in Delta but not in the other variants currently circulating in France, so we can use those data to estimate the prevalence of Delta over time. As I explained above in my review of the literature on Delta’s transmissibility advantage, one approach to estimating that advantage is to look at the relationship between the reproduction number of the whole epidemic and the prevalence of the lineage. The idea is that, if Delta is more transmissible than the other strains, a greater prevalence of Delta somewhere should be associated with a greater reproduction number for the epidemic in that place. However, finding such a relationship would hardly be conclusive evidence that Delta is more transmissible, and failing to find it would not be conclusive evidence that it’s not. Delta could have a large transmissibility advantage but spread in contexts that slow transmission down relative to the contexts in which the other variants are spreading; it could also have no transmissibility advantage, or even be less transmissible than the other variants, but spread in contexts that facilitate transmission relative to those in which the other variants are spreading. You can try to take that possibility into account by controlling for various factors that might confound the relationship between the prevalence of Delta and the epidemic’s reproduction number, but the problem is that we don’t really know what those factors are. Moreover, even if we did, the data we’d need to control for them probably wouldn’t be available. Nevertheless, I will start by using this approach to estimate Delta’s transmissibility advantage in France, before moving to a more direct approach in the rest of this section.

As a preliminary step, we can plot the epidemic’s reproduction number against the prevalence of Delta at the national level:

[Figure: epidemic reproduction number vs. prevalence of Delta, France, national level]

As you can see, when you draw a line of best linear fit, there is a clear upward trend. However, the reproduction number has recently fallen dramatically as the prevalence of Delta approached 100%, so the relationship is clearly not linear. Vaccination could be part of the reason why the reproduction number has recently fallen, but it can’t be the whole or even the main reason, because the vaccination rate didn’t suddenly increase between week 29 and week 30. In any case, with only 11 observations, this type of analysis isn’t going to produce interesting results at the national level, so if we’re going to do this, we should use data at a finer geographic scale to get more observations.

I will therefore perform the analysis at the department level, which increases the number of observations and makes it easier to detect any effect Delta may have on the reproduction number. [Footnote 12: In order to avoid spurious results due to measurement error, which is a real concern at the department level, I’m excluding observations with fewer than 30 positive samples tested for the presence of L452R.] Let’s start by plotting the epidemic’s reproduction number against the prevalence of Delta across French metropolitan departments during the past few weeks:

[Figure: epidemic reproduction number vs. prevalence of Delta, French metropolitan departments, past few weeks]

As you can see, it looks very similar to the plot at the national level, with an upward trend if you draw a line of best linear fit but a relationship that is clearly not linear. Moreover, there is a considerable amount of variation, even if you hold the prevalence of Delta constant.

Not only is the relationship between the prevalence of Delta and the epidemic’s reproduction number not linear, but there seems to be a clear period effect. In other words, part of that relationship appears to result from the fact that both the prevalence of Delta and the epidemic’s reproduction number have increased over time (until recently, when the latter started to fall rapidly), while the relationship seems much weaker within any given period. This is pretty clear if you plot that relationship separately for each period:

[Figure: epidemic reproduction number vs. prevalence of Delta across departments, one panel per period]

As you can see, when each period is considered separately, there is usually no discernible relationship between the prevalence of Delta and the epidemic’s reproduction number, sometimes a weak positive relationship and sometimes a weak negative one, even though for some periods there is a considerable amount of variation in the prevalence of Delta. Of course, factors besides the prevalence of Delta could affect the epidemic’s reproduction number, but if this variant’s transmissibility advantage is really as large as people claim, it’s still weird that we can’t see it in the figure above.

In order to estimate Delta’s transmissibility advantage by looking at the association between Delta’s prevalence and the epidemic’s reproduction number at the department level, I have fitted by MCMC different versions of the following model:

$R_{i,t} = (1 + \alpha p_{i,t}) \exp(\log R_i + \Delta_t)$

where $p_{i,t}$ is the prevalence of Delta in department $i$ at time $t$, $R_i$ is a department-level intercept and $\Delta_t$ is a national time-varying component. [Footnote 13: The code for this analysis and all the graphs in this post can be found in this repository on GitHub.] The exponential function in that equation ensures that the baseline term $\exp(\log R_i + \Delta_t)$ is always positive, so $R_{i,t}$ is positive whenever $1 + \alpha p_{i,t} > 0$, and it implies that covariates inside it have a multiplicative effect on $R_{i,t}$. $R_{i,t}$ is assumed to follow a Student t-distribution, and observations with fewer than 30 positive samples tested for the presence of L452R were excluded from the data. The parameter of interest is $\alpha$, Delta’s transmissibility advantage, for which I used a normal prior with a mean of 0 and a standard deviation of 1. This model is very similar to some of the models used in Abbott and Funk (2021), a paper I mentioned above that used this approach to estimate Delta’s transmissibility advantage with English data. The main difference is that I used a normal prior for $\alpha$ to allow for the possibility that it might be negative, whereas they used a lognormal prior that effectively constrains it to be positive. [Footnote 14: They also included Google mobility indicators as covariates inside the exponential, but I don’t think such a model really makes sense theoretically, so I did not.]
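The model above can be written down directly. What follows is a minimal sketch, not the code used for this analysis: the names (`Observation`, `expected_R`, `log_posterior_alpha`) are hypothetical, and the Student t scale and degrees of freedom are left as free inputs because their values aren’t given here. It evaluates the log posterior of $\alpha$ with the other parameters held fixed, which is the kind of quantity an MCMC sampler evaluates repeatedly; a real fit would sample all parameters jointly.

```python
import math
from dataclasses import dataclass

# Hypothetical names throughout; a sketch of the model, not the code behind the post.

MIN_TESTED = 30  # exclude observations with fewer positive samples tested for L452R


@dataclass
class Observation:
    dept: str       # department i
    week: int       # period t
    R_obs: float    # estimated reproduction number R_{i,t}
    p_delta: float  # prevalence of L452R (proxy for Delta), in [0, 1]
    n_tested: int   # positive samples tested for L452R


def expected_R(alpha: float, log_R_dept: float, delta_week: float, p: float) -> float:
    """Model mean: (1 + alpha * p_{i,t}) * exp(log R_i + Delta_t)."""
    return (1.0 + alpha * p) * math.exp(log_R_dept + delta_week)


def student_t_logpdf(x: float, loc: float, scale: float, df: float) -> float:
    """Log density of a location-scale Student t distribution."""
    z = (x - loc) / scale
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - (df + 1) / 2 * math.log1p(z * z / df))


def normal_logpdf(x: float, mean: float, sd: float) -> float:
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_posterior_alpha(alpha, observations, log_R, delta, sigma, df):
    """Unnormalized log posterior of alpha with all other parameters held fixed.

    log_R maps department -> log R_i, delta maps week -> Delta_t; sigma and df
    are the Student t scale and degrees of freedom (hypothetical inputs here).
    """
    lp = normal_logpdf(alpha, 0.0, 1.0)  # Normal(0, 1) prior lets alpha be negative
    for obs in observations:
        if obs.n_tested < MIN_TESTED:
            continue  # too few samples: prevalence estimate too noisy
        mu = expected_R(alpha, log_R[obs.dept], delta[obs.week], obs.p_delta)
        lp += student_t_logpdf(obs.R_obs, mu, sigma, df)
    return lp
```

Passing the same value of `log_R` for every department, or zeros for `delta`, recovers the simpler versions of the model.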

Don’t worry if you don’t understand what this means, you just need to understand the gist of it and it’s actually pretty simple. In the basic model, we just look at the relationship between the prevalence of Delta and the epidemic’s reproduction number. Since the virus might spread more easily in some departments than others due to factors that have nothing to do with the prevalence of Delta and remain pretty stable, such as population density and the proportion of the population who live in overcrowded housing, we add a department-level intercept in another version of the model that should absorb the effect of those factors by allowing each department to have a different baseline for transmission that is modified by the prevalence of Delta. Since the epidemic’s reproduction number might change quasi-uniformly across departments for reasons that also have nothing to do with the prevalence of Delta, such as the rise in the prevalence of immunity due to vaccination (which is pretty uniform across departments although not completely so), we add a national time-varying component in another version of the model. It’s not obvious which version of the model is the best, so I’m trying a version in which the prevalence of Delta is the only independent variable, one in which there is also a department-level intercept, one in which there is a national time-varying component and finally one in which there is both a department-level intercept and a national time-varying component in addition to the prevalence of Delta.
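The four versions differ only in which baseline terms of the equation above are switched on. Here is a minimal sketch, with hypothetical names and made-up parameter values, of how they nest; in the actual analysis each term is estimated by MCMC rather than supplied by hand.

```python
import math

# Which optional terms each version of the model switches on:
# (department-level intercept, national time-varying component).
MODEL_VERSIONS = {
    "prevalence only": (False, False),
    "department intercept": (True, False),
    "national time component": (False, True),
    "both": (True, True),
}


def expected_R(version: str, alpha: float, p: float, log_R_common: float,
               log_R_dept: float, delta_week: float) -> float:
    """Model mean for one version of (1 + alpha * p) * exp(log R_i + Delta_t).

    Versions without a department intercept use one shared baseline, log_R_common,
    for every department; versions without a time component set Delta_t = 0.
    """
    use_dept, use_time = MODEL_VERSIONS[version]
    baseline = log_R_dept if use_dept else log_R_common
    time_effect = delta_week if use_time else 0.0
    return (1.0 + alpha * p) * math.exp(baseline + time_effect)


# Hypothetical parameter values, just to show how the versions differ.
for name in MODEL_VERSIONS:
    value = expected_R(name, alpha=0.3, p=0.5, log_R_common=0.0,
                       log_R_dept=math.log(1.2), delta_week=math.log(0.9))
    print(f"{name:>24}: {value:.3f}")
```

With these inputs the four versions give 1.150, 1.380, 1.035 and 1.242: the same prevalence effect is applied to a different baseline in each case.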

The results of this analysis for each version of the model are summarized in this graph:

[Figure: estimates of Delta’s transmissibility advantage $\alpha$ for each version of the model]

The estimates of Delta’s transmissibility advantage range from 26% to 44% depending on the model, but they are often very imprecise. This is pretty similar to what Abbott and Funk found for England, but lower than most of the estimates that are usually cited. This approach yields estimates of Delta’s transmissibility advantage, but whether you believe those estimates are reliable depends on how good you think the model used to derive each of them is, and in this case we have no reason to trust them and many reasons to be very suspicious of them. I controlled for a few obvious things, but there are still countless ways in which the relationship between the prevalence of Delta and the epidemic’s reproduction number could be confounded, so I think it would be insane to assume those estimates accurately reflect Delta’s transmissibility advantage. [Footnote 15: This should become clear in the last section, when I present my theory of what is really going on.]

One way to see how careful we should be in interpreting those results is to perform the same analysis on each period separately. [Footnote 16: For this analysis, I’m using a model that includes covariates for the complete vaccination rate and Google mobility indicators, but there is no time-varying component or department-level intercept.] Here is a graph that summarizes the results of this analysis:

[Figure: estimates of Delta’s transmissibility advantage for each period analyzed separately]

As you can see, the estimates are, unsurprisingly, less precise than when all the data are used, and they’re also extremely unstable. In fact, during the period for which Delta’s transmissibility advantage is most precisely estimated (week 29 to 30), the model finds that Delta is 38% less transmissible than the other variants. To be clear, I don’t actually believe that Delta is less transmissible; I just point that out to emphasize that estimates of a variant’s relative transmissibility obtained with this kind of method, or indeed with any other method people use to estimate it, should be taken with a grain of salt the size of Jupiter. The truth is that we don’t have enough background knowledge about the data-generating process to be confident that we can estimate relative transmissibility with that kind of approach.

We can also perform this analysis at the department level, and this shows the same pattern:

[Figure: Delta’s transmission advantage by department over time, with trend line]

As you can see, although there is a lot of dispersion across departments, a downward trend is clearly visible. [Footnote 19: I do not show observations where the transmission advantage is less than -50% or more than 300% on the graph, because otherwise that makes it hard to read, but they were used to compute the trend line and the vast majority of observations are still visible.] The fact that estimates of Delta’s transmission advantage vary so much across departments is actually noteworthy. Some of that is no doubt measurement error, which is more of a concern at the department level because in some departments there are weeks during which few positive samples were tested for the presence of L452R, but most of that variation is probably real. [Footnote 20: I have excluded from the data observations with fewer than 30 positive samples tested for the presence of L452R, which should limit the extent to which measurement error is creating spurious dispersion.] This suggests that factors besides the intrinsic characteristics of the different variants play a huge role in explaining how fast they grow relative to each other, which in turn highlights the danger of interpreting estimates of Delta’s transmission advantage in a particular context as reflecting a transmissibility advantage. Unfortunately, public health experts around the world don’t seem to understand that, because it’s exactly what they’re doing. This also casts further doubt on the previous analysis based on the relationship between the prevalence of Delta and the epidemic’s reproduction number: if there are unknown factors that have a very large impact on transmission (clearly dominating whatever impact Delta might have), then we can be pretty sure that we won’t be able to estimate Delta’s transmissibility advantage accurately in that way, even if we control for the handful of factors we do know about.
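One common way to turn weekly prevalence data into a transmission advantage, not necessarily the one used for the graph above, is to convert the weekly change in the log-odds of Delta’s prevalence into a ratio of reproduction numbers under a fixed generation interval. Here is a minimal sketch: the function names and the 5.5-day generation interval are hypothetical choices, and the 30-sample threshold mirrors the exclusion rule described in this post.

```python
import math
from typing import Optional

GENERATION_INTERVAL_DAYS = 5.5  # hypothetical fixed generation interval
MIN_TESTED = 30                 # exclusion threshold used in the post


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def weekly_transmission_advantage(p_prev: float, p_next: float,
                                  n_prev: int, n_next: int,
                                  gi_days: float = GENERATION_INTERVAL_DAYS) -> Optional[float]:
    """Delta's relative transmission advantage between two consecutive weeks.

    Assumes a fixed generation interval g, so R_delta / R_other = exp((r_delta - r_other) * g),
    and uses the fact that the log-odds of Delta's prevalence changes by
    (r_delta - r_other) per day. Returns None when either week has too few
    tested samples or a prevalence of exactly 0 or 1.
    """
    if min(n_prev, n_next) < MIN_TESTED:
        return None
    if not (0.0 < p_prev < 1.0 and 0.0 < p_next < 1.0):
        return None
    daily_growth_difference = (logit(p_next) - logit(p_prev)) / 7.0
    return math.exp(daily_growth_difference * gi_days) - 1.0


# Hypothetical department: Delta goes from 20% to 50% of tested samples in a week.
print(weekly_transmission_advantage(0.20, 0.50, n_prev=120, n_next=150))
```

Because the result is exponential in the assumed generation interval and in the change in log-odds, modest noise in weekly prevalence estimates can produce large swings in the computed advantage, which is one reason department-level estimates are so dispersed.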

What should we make of all this?

The problem is that not only are estimates of Delta’s transmission advantage during the initial expansion of the lineage highly variable depending on the context and the method used, but, as we have seen in the case of France, that advantage sometimes almost completely disappears. Moreover, as I explained previously, exactly the same thing happened with Alpha a few months ago. Yet the epidemiologists who make projections for the French government currently assume that Delta is between 60% and 120% more transmissible than Alpha and that Alpha was 59% more transmissible than the original strain [Footnote 22: Not 60% but 59%; they are scientists after all, so precision is important.], so that Delta comes out as 2.5 to 3.5 times as transmissible as the original strain, even though in the most recent data Alpha and Delta only had a transmission advantage over the previously established variants of between 0% and 20%, depending on the method used to estimate it. If we used the transmission advantage they had in the latest data to estimate their transmissibility advantage over the previously established variants, instead of using the transmission advantage measured during their initial expansion, we’d reach the conclusion that Delta is at most 44% more transmissible than the original strain and has a basic reproduction number somewhere between 2.5 and 3.6, if we assume that it was 2.5 for the original strain of the virus. Now, a basic reproduction number of 3.6 is pretty high, but the implications are completely different than if the basic reproduction number were between 6 and 9, as public health experts around the world assume, because the exponential nature of the epidemic process means that multiplying the basic reproduction number by 2 makes things much worse than twice as bad.
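The compounding in the paragraph above is easy to check. This short sketch uses only the figures quoted in the text; the basic reproduction number of 2.5 for the original strain is the text’s own assumption, and `compound` is a hypothetical helper name.

```python
import math

R0_ORIGINAL = 2.5  # assumption stated in the text


def compound(*advantages: float) -> float:
    """Combine successive relative advantages (0.59 means +59%) into one multiplier."""
    return math.prod(1.0 + a for a in advantages)


# Assumptions attributed in the text to the projections for the French government:
# Alpha +59% over the original strain, then Delta +60% to +120% over Alpha.
assumed_low = compound(0.59, 0.60)
assumed_high = compound(0.59, 1.20)

# Using instead the transmission advantages seen in the latest data (0% to 20% each).
latest_low = compound(0.0, 0.0)
latest_high = compound(0.20, 0.20)

for label, lo, hi in [("initial-expansion estimates", assumed_low, assumed_high),
                      ("latest-data estimates", latest_low, latest_high)]:
    print(f"{label}: Delta is {lo:.2f}x to {hi:.2f}x the original strain, "
          f"R0 between {R0_ORIGINAL * lo:.1f} and {R0_ORIGINAL * hi:.1f}")
```

This reproduces the ranges above: a multiplier of about 2.54 to 3.50 (basic reproduction number of roughly 6.4 to 8.7) under the initial-expansion estimates, versus at most 1.44 (2.5 to 3.6) using the latest data.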

While I have focused on French data, this wide variability of Delta’s transmission advantage across contexts is not limited to France. Even between-country variation seems to be huge, with estimates ranging from ~25% to ~180% across countries in the study based on GISAID sequences I mentioned in my review of the literature, and those are pooled estimates over several weeks of data, so there would no doubt be even more variation if we estimated Delta’s transmission advantage in each country during each period separately. Indeed, if you look at English data, you can also see a lot of variation across regions and periods. For instance, here is a graph from the paper by Ferguson I mentioned above, which shows how Delta’s transmission advantage has changed over time in England:

[Figure: Delta’s transmission advantage over time by region of England, from the paper by Ferguson]

As you can see, although Delta’s transmission advantage is imprecisely estimated, it’s clear that it has varied a lot in England over time and across regions.

The estimates in this graph were obtained from full-sequencing data, but the paper has another graph that shows estimates of the transmission advantage for both Delta (B.1.617.2) and Alpha (B.1.1.7) based on S-gene target failure data [Footnote 23: Alpha has a deletion of 2 amino acids in the spike gene that Delta doesn’t have, which causes S-gene target failure in some PCR tests, so, assuming that other variants don’t have this deletion and that most variants without it are Delta, we can use the proportion of positive test samples with S-gene target failure as a proxy for the prevalence of Alpha and Delta. This is less reliable than full sequencing, but it’s cheaper, so more positive test samples can be tested, which means that we have more S-gene target failure data than full-sequencing data.], which also shows a lot of variability in both cases:

[Figure: transmission advantage of Delta (B.1.617.2) and Alpha (B.1.1.7) over time by region of England, based on S-gene target failure data]

Again, for both Alpha and Delta, we see a lot of variation over time and across regions. I have absolutely no doubt that we’d see the same thing everywhere if we looked.

In the discussion section of the paper, Ferguson notes this variability, but he quickly dismisses it with some ad hoc reasoning of the sort that is customary in the literature about the COVID-19 pandemic:

Comparing B.1617.2 emergence with that of B.1.1.7 is informative. Firstly, incidence levels are still over 10-fold lower than they were in November 2020. Secondly, while R estimates for S-postive [sic] and S-negative mirrored each other during B.1.1.7’s emergence, a similar relationship is less clear during the recent emergence of B.1.617.2 (Figure 7). This is likely to be caused by a number of factors: (a) lower case incidence, giving greater uncertainty in estimating R, (b) the effects of imported B.1.617.2 cases biasing R estimates until the last two weeks, and (c) circulation of B.1.617.2 still being more focussed [sic] in different communities than that of B.1.1.7. We would also note that the transmission advantage B.1.1.7 exhibited over prior lineages following its emergence varied over time (Figure 8), likely as a result of the intensification of social distancing and reimposition of lockdown in January 2021. Thus while the evidence is now strong for B.1.617.2 having at a minimum of a 50% transmission advantage over B.1.1.7, it perhaps remains too early to derive more precise estimates.

Why he thinks that his speculation establishes that Delta’s transmissibility advantage over Alpha is no less than 50% is a mystery that may never be solved, but what is certain is that it does not, and that his confidence in that conclusion is extremely misplaced. [Footnote 24: In this paragraph, Ferguson talks about a “transmission advantage” and not a “transmissibility advantage”, but he uses both expressions interchangeably throughout the paper and makes the claim that Delta’s “transmissibility advantage” over Alpha is at least 50% elsewhere in the paper, including in the abstract. While he never explicitly makes the distinction between those 2 concepts, it’s clear from some of the caveats he makes that he at least has a vague understanding of it, though I also think he doesn’t fully appreciate the import of that distinction.]

Unfortunately, he didn’t realize that, so he used this estimate to make projections about the epidemic in England and this led him to make this prediction:

Prof Ferguson, who sits on the Scientific Advisory Group for Emergencies (Sage), told the BBC’s Andrew Marr Show it was “almost certain” that the UK would reach 100,000 cases and 1,000 hospital admissions per day as almost all legal restrictions on social contact end in England and school holidays begin.

He said maintaining that level could be described as “success”.

“The real question is do we get to double that, or even higher?” he said, though adding that it was “much less certain” to predict.

As it turned out, the 7-day average of incidence peaked at only ~42,000 cases literally the day after he made that statement, which goes to show that he would have been well advised to pay more attention to the distinction between transmission and transmissibility. [Footnote 25: To be clear, I’m not claiming his likely overestimate of Delta’s transmissibility advantage is the only reason why his projections were completely off (I don’t even think it was the main reason), but it clearly didn’t help.] After falling for a while, incidence has been slowly increasing again in August, but it looks as though it has started to fall again, though it’s still unclear at this point. It doesn’t seem to have bothered him much, though, since a few days later he said that he was “happy to be wrong in the right direction”. If that’s the case, he must live in a permanent state of perfect bliss.

In my post on Alpha’s transmissibility advantage, I mentioned Wes Pegden’s theory that Alpha could have a different susceptibility profile than the other variants, so that upon being introduced it could initially spread more easily because a greater proportion of the population was susceptible to it than to the other variants at that point. While this theory would explain the data, I have since come up with another theory that would also explain the data and doesn’t require assuming that variants have different susceptibility profiles. In order to understand it, however, I must say a few things about the role of population structure in transmission. The models that epidemiologists use to make projections or study the effects of non-pharmaceutical interventions assume that, at least within the same age group and the same region, everyone is equally likely to infect everyone else if they are infected. However, in the real world, transmission takes place in a highly structured population. If you are infected, the probability that you will directly infect any given person is effectively zero for most people, because you’ll never have any contact with them. Of course, there are many people you will never have any contact with but whom you could nevertheless infect indirectly by starting a chain of infections that eventually reaches them, but for most people the probability that this will happen is infinitesimal, whereas for people in your network it’s much higher. But what does this have to do with Delta’s transmissibility advantage, I hear you say? I’m about to tell you.

What I think is happening is that, from time to time, a new variant emerges that is somewhat more transmissible than the previously established strains, but not 50% more transmissible or anything like that. Most of the time, it doesn't spread much beyond the network where it emerged, either because, like most infected individuals, the person in whom the variant first appeared doesn't infect anyone or because the network is not well-connected to other networks where the prevalence of immunity is low. But sometimes it does, because the variant happened to emerge in a network in which a lot of people don't have immunity and that is well-connected to other networks where a large proportion of the population is also susceptible, or because it somehow found its way to such a network even though the network in which it originally emerged didn't have those characteristics. This results in a large outbreak and, because a lot of people are infected, a relatively large number of them will travel to other regions and other countries. Most of the time, they won't start large outbreaks over there, because they'll hit networks where the prevalence of immunity is high or that aren't well-connected to other networks where the prevalence of immunity is low. But the larger the original outbreak, the larger the probability that one of the people who were part of it will start another large outbreak elsewhere. When that happens, this new outbreak will likely result in even more outbreaks elsewhere, for exactly the same reason.

People ask how a variant that is not more transmissible than the previously established strains, or at least not a lot more, can quickly take over everywhere, and they argue that it can't be chance, because otherwise it wouldn't happen everywhere. But according to my theory, it's not just chance, or at least not in the sense they have in mind. To be sure, chance plays a large role at the beginning of the process. It's just dumb luck that explains why a somewhat more transmissible variant emerges and manages to start a chain of infections in a network where the prevalence of immunity is low and that is well-connected to other networks where that is also the case. But once such a variant has caused a large outbreak somewhere, it's no longer chance that explains why it creates large outbreaks elsewhere; it's just the law of large numbers. Well, I guess that is also chance, but in a very different sense than at the beginning of the process. Large outbreaks beget other large outbreaks elsewhere simply by virtue of being large: even if, upon being introduced in a network, the variant has the same probability of starting a large outbreak as any other, it's more likely to eventually be introduced in a network where the conditions for another large outbreak are right, just because more infected people means a greater probability that some of them will travel and infect such networks. Of course, a transmissibility advantage definitely helps and I think it's likely that Delta is somewhat more transmissible than previously established strains, but once you get the ball rolling it's a self-reinforcing process for purely mathematical reasons.[27]
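The "numbers on its side" argument, together with the dispersion point in note 27, can be illustrated with a standard branching-process calculation. This is a minimal sketch under assumptions made only for illustration: each introduction starts an independent chain in which the number of secondary cases per case follows a negative binomial distribution with mean `r` and dispersion `k` (the "dispersion factor"), and any chain that doesn't go extinct is treated as a large outbreak. The function names and the values of `r`, `k` and the number of introductions are hypothetical.

```python
def offspring_pgf(s: float, r: float, k: float) -> float:
    """Probability generating function of a negative binomial offspring
    distribution with mean r and dispersion k (smaller k = more superspreading)."""
    return (1.0 + (r / k) * (1.0 - s)) ** (-k)


def extinction_probability(r: float, k: float, tol: float = 1e-13,
                           max_iter: int = 1_000_000) -> float:
    """Probability that a chain started by a single case dies out.

    This is the smallest root of s = G(s) on [0, 1]; iterating s <- G(s)
    from s = 0 converges to it monotonically.
    """
    s = 0.0
    for _ in range(max_iter):
        s_next = offspring_pgf(s, r, k)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s


def prob_any_large_outbreak(takeoff: float, introductions: int) -> float:
    """Probability that at least one of several independent introductions,
    each taking off with probability `takeoff`, starts a large outbreak."""
    return 1.0 - (1.0 - takeoff) ** introductions


if __name__ == "__main__":
    r = 1.5  # hypothetical reproduction number in the receiving networks
    for k in (0.1, 0.5, 10.0):
        takeoff = 1.0 - extinction_probability(r, k)
        print(f"k={k:>4}: per introduction {takeoff:.3f}, "
              f"after 5 introductions {prob_any_large_outbreak(takeoff, 5):.3f}, "
              f"after 50 introductions {prob_any_large_outbreak(takeoff, 50):.3f}")
```

Under these assumptions, a single introduction rarely takes off when `k` is small, but the probability that at least one of many introductions takes off climbs quickly with the number of introductions, and a higher `k` (less superspreading) raises the per-introduction probability.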

Now think about what happens when a person infected by that variant in the original outbreak, or in one of the outbreaks that were more or less directly caused by it, travels to a network where the prevalence of immunity is low and that is well-connected to other networks where that is also the case. Since the prevalence of immunity in that network and in the networks that are well-connected to it is relatively low, the variant spreads more easily there than the previously established strains, which have been circulating in other networks for longer and have therefore depleted the pool of susceptible individuals in those networks by infecting them. Thus, if you measure the variant's transmission advantage over the previously established strains during its initial expansion, you will find that it's large. However, as the variant rips through that network and reduces the pool of susceptible people in it, this advantage will fall. But by the time this happens, the variant will already have become the dominant strain in the population. This is going to happen even if the variant is not particularly more transmissible than the previously established strains, though it obviously helps if it's somewhat more transmissible. Nor does it happen because the variant directly competes with the previously established strains, which is a very implausible mechanism for the pattern of falling transmission advantage we found in the data: the proportion of the population that is infectious at any given time is very small, and therefore so is the probability that the same people will be exposed to several variants. In a forthcoming blog post where I will present some modeling work I have done on how complex population structure can affect transmission, I will show with simulations how this phenomenon can happen.
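A toy version of this mechanism can be simulated with two completely isolated networks. This is a minimal sketch, not the modeling work announced above: it assumes a deterministic SIR model, two networks of equal size, the same R0 and infectious period for both variants (so there is no transmissibility advantage at all), and hypothetical immunity levels of 60% in network A, where the old variant circulates, and 10% in network B, where the new one lands. The transmission advantage is measured as the ratio of effective reproduction numbers minus 1; all names and parameter values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    day: float
    new_share: float         # new variant's share of incidence
    transmission_adv: float  # R_eff(new) / R_eff(old) - 1


def simulate_two_networks(r0: float = 2.5, infectious_days: float = 5.0,
                          immune_a: float = 0.6, immune_b: float = 0.1,
                          seed_old: float = 1e-3, seed_new: float = 1e-5,
                          days: int = 90,
                          steps_per_day: int = 20) -> list[Snapshot]:
    """Euler-integrated SIR in two isolated, equal-sized networks (all
    quantities are fractions of a network). The old variant circulates only
    in network A, the new one only in network B, and both have the SAME r0."""
    gamma = 1.0 / infectious_days
    beta = r0 * gamma
    dt = 1.0 / steps_per_day
    s_a, i_a = 1.0 - immune_a - seed_old, seed_old
    s_b, i_b = 1.0 - immune_b - seed_new, seed_new
    snapshots = []
    for step in range(days * steps_per_day + 1):
        inc_a = beta * s_a * i_a  # old-variant incidence rate
        inc_b = beta * s_b * i_b  # new-variant incidence rate
        if step % steps_per_day == 0:
            snapshots.append(Snapshot(
                day=step / steps_per_day,
                new_share=inc_b / (inc_a + inc_b),
                # Same r0 for both variants, so the ratio of effective
                # reproduction numbers is the ratio of susceptible fractions.
                transmission_adv=s_b / s_a - 1.0,
            ))
        s_a, i_a = s_a - inc_a * dt, i_a + (inc_a - gamma * i_a) * dt
        s_b, i_b = s_b - inc_b * dt, i_b + (inc_b - gamma * i_b) * dt
    return snapshots


if __name__ == "__main__":
    for snap in simulate_two_networks()[::10]:
        print(f"day {snap.day:4.0f}: new-variant share {snap.new_share:6.1%}, "
              f"transmission advantage {snap.transmission_adv:+7.1%}")
```

With these made-up parameters, the new variant starts with a transmission advantage of more than 100% despite having the same R0, still has it when it becomes dominant, and only loses it afterwards as network B runs out of susceptible people. Because the toy has no further networks for the new variant to reach, it eventually burns out, which is one of its many simplifications.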

Many people have argued that what happened with Alpha and Delta couldn't simply be the result of a founder effect, so it's important to explain how my theory differs from that explanation. In this context, a founder effect is what happens when a variant is introduced in a population and eventually becomes the dominant variant in that population through dumb luck, because it happens to start chains of infections to which most infections can ultimately be traced back, whereas the other variants fizzle out. Certainly, if a variant causes a large outbreak somewhere and becomes the dominant strain in that place, it would not be surprising if that outbreak led to other large outbreaks elsewhere, even if this variant were no more transmissible than the other strains of the virus and owed its success to dumb luck alone, because, as I explained above, large outbreaks beget more large outbreaks elsewhere for purely mathematical reasons. However, if the population is mixing homogeneously or something close enough, this variant will not become the dominant strain in other places unless incidence in those places is very low, because it's overwhelmingly unlikely that a few introductions of that variant in a place where a much larger number of people are already infected by other variants will start chains of infections leading to more infections than the far more numerous chains of infections associated with the variants that were already circulating when those introductions took place. Thus, so the argument goes, this can't explain why Alpha and Delta rapidly took over in so many countries, because, at least in the case of Alpha, the emerging variant took over even in places where incidence was relatively high when it was introduced, such as France at the end of 2020.
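The homogeneous-mixing argument summarized above, and the claim in note 29, can be checked with a textbook two-strain SIR model. This is a minimal sketch assuming deterministic dynamics, a single shared susceptible pool and hypothetical parameters (the function name and `new_advantage` parameter are illustrative): with identical transmissibility, both strains grow or shrink at exactly the same per-capita rate, so a few introductions remain a tiny, constant share of infections, and only a genuine transmissibility advantage lets the newcomer take over.

```python
def new_variant_share_homogeneous(new_advantage: float = 0.0, r0: float = 2.5,
                                  infectious_days: float = 5.0,
                                  immune: float = 0.6, seed_old: float = 1e-3,
                                  seed_new: float = 1e-5, days: int = 90,
                                  steps_per_day: int = 20) -> list[float]:
    """Two-strain SIR in ONE homogeneously mixing population sharing a single
    susceptible pool. Returns the new variant's share of infectious people,
    once a day. `new_advantage` is its relative transmissibility advantage."""
    gamma = 1.0 / infectious_days
    beta_old = r0 * gamma
    beta_new = beta_old * (1.0 + new_advantage)
    dt = 1.0 / steps_per_day
    s = 1.0 - immune - seed_old - seed_new
    i_old, i_new = seed_old, seed_new
    shares = []
    for step in range(days * steps_per_day + 1):
        if step % steps_per_day == 0:
            shares.append(i_new / (i_old + i_new))
        new_old = beta_old * s * i_old * dt  # infections caused by old strain
        new_new = beta_new * s * i_new * dt  # infections caused by new strain
        s -= new_old + new_new
        i_old += new_old - gamma * i_old * dt
        i_new += new_new - gamma * i_new * dt
    return shares


if __name__ == "__main__":
    same = new_variant_share_homogeneous()
    boosted = new_variant_share_homogeneous(new_advantage=0.5)
    print(f"equal transmissibility: share {same[0]:.2%} -> {same[-1]:.2%}")
    print(f"50% advantage:          share {boosted[0]:.2%} -> {boosted[-1]:.2%}")
```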

However, this reasoning crucially hinges on the assumption that the population is mixing homogeneously or something close enough. In a population with the kind of structure my theory proposes, a new variant can absolutely become the dominant strain even if a lot of people are infected by other variants when it's introduced, because it's spreading in different networks than the previously established strains. While incidence may initially be high for the other variants, it's going to start falling before incidence for the newcomer does, because the other variants are presumably circulating in networks where they have been spreading for a while and have therefore reduced the pool of susceptible individuals, whereas the newcomer is spreading in networks that had been relatively spared so far and will therefore spread faster even if it's no more transmissible than the other variants. As I explained above, as long as the variant has caused a large outbreak somewhere else, it's likely that it will eventually be introduced in such networks, just because it has numbers on its side. Of course, if incidence was already high when the variant was introduced in that place, the other variants also had numbers on their side. But if the population is structured in the way my theory assumes, namely in networks that are quasi-isolated from each other as far as the probability of infection is concerned (if you're familiar with graph theory, you can think of them as similar to connected components), this won't help them, because they won't be able to get out of the networks where they're currently circulating and gain access to networks where they could spread more easily because the prevalence of immunity is lower. Again, I will explain this in more detail in a forthcoming blog post, where I will present modeling work I have done to study the effect this kind of population structure would have on transmission, explain why I think it's not implausible and show how it could explain several phenomena that everyone has found very puzzling since the beginning of the pandemic.

Conclusion

The goal of this essay is not to convince you that Delta isn't more transmissible than the previously established strains. To be clear, I think it probably is (SARS-CoV-2 is subject to natural selection and it's not particularly surprising that more transmissible variants should emerge over time), but I also think its transmissibility advantage has almost certainly been vastly overestimated. The view that Delta is more than twice as transmissible as the original strain is superficially plausible because it comes with a compelling narrative. People point out not only that the expansion of Delta has been associated with large waves in countries that had already been severely affected by the pandemic, but also that many countries that had so far been relatively spared, such as Australia and New Zealand but also Japan and several other East Asian countries, have recently experienced unusually large waves associated with Delta. However, while this makes for a powerful narrative that has led many people to uncritically accept the claim that Delta is so transmissible that it's almost a completely different virus, it doesn't really prove anything.

In the case of Australia and New Zealand, it's not particularly surprising that, after more than a year and a half of declaring a lockdown as soon as a few cases are detected or, in the case of some Australian states, going through several lockdowns, cracks are starting to show that allow the virus to spread. Besides, for the moment, the waves that New Zealand and even Australia are experiencing are still pretty small compared to what Europe and the US have experienced since the beginning of the pandemic. As I will explain in a forthcoming post, once you take seriously how population structure can affect transmission, it's also not particularly surprising that countries where the prevalence of immunity is very high have nevertheless experienced large waves recently. It's true that several East Asian countries that had so far been relatively spared have just gone through, or in some cases are still going through, larger waves associated with Delta than anything they had experienced before, but when you look at the data more closely, this is hardly good evidence that Delta is more than twice as transmissible as the original strain.

Taiwan recently experienced its largest wave so far, but almost none of the cases were actually caused by Delta and the wave has been receding since the end of May, despite the importation of several Delta cases since then. Other East Asian countries that had so far been relatively spared have experienced larger-than-usual waves associated with Delta, but they remain very small compared to what Europe, the US and most regions of the world have experienced since the beginning of the pandemic, and in most cases they have already receded. This is the case in Thailand, Laos, Cambodia and Myanmar, although in the latter it seems that incidence has started to increase again. Vietnam is also experiencing a large wave that is still growing at the moment, which could bode ill for its neighbors if this goes on and people travel from Vietnam to those countries, but while it's larger than any wave the country has experienced so far, it remains much smaller than what we have seen in the rest of the world during the pandemic. The same thing is true of Japan and South Korea, where it seems that incidence has just started to fall. Despite the expansion of Delta, it seems that whatever allowed these countries to be relatively spared by the pandemic is still working for the most part. Even if that weren't the case, plenty of countries that were relatively spared at first later experienced very large waves that were not associated with any super-transmissible variant, such as the countries of Central and Eastern Europe during the fall of 2020. The waves associated with the expansion of Delta make for a compelling narrative, but they're hardly evidence that it's more than twice as transmissible as the original strain.

The evidence that has been adduced for this specific claim is no more convincing. As we have seen, the estimates of Delta's transmissibility advantage that everybody uncritically cites could easily be misleading, and there are good reasons to think that Delta's transmissibility advantage is actually much lower. If I'm right, this is hardly something to be shrugged off on the grounds that even I say Delta is probably more transmissible than the previously established strains, because, as I have noted above, in practice a virus with a basic reproduction number somewhere between 3 and 4 is a very different beast than one with a basic reproduction number between 6 and 9. Unfortunately, I'm not very hopeful that I will be able to convince many people, because the official story about Delta makes for a compelling narrative and the post hoc ergo propter hoc fallacy is a powerful drug, but if I have managed to instill even some doubt in you, I will consider my effort successful. In a way, what I'm saying shouldn't be particularly controversial: it's just a special case of a more general point about the need to be careful in interpreting the results of observational studies when the data-generating process is not well understood. In that respect, it's funny how many people correctly warn against taking the results of observational studies on treatments like hydroxychloroquine and ivermectin at face value, but show no caution whatsoever in accepting the results of studies on Delta's transmissibility advantage, even though the potential for confounding is at least as great.
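To make the gap between those two ranges of basic reproduction numbers concrete, the classic homogeneous-mixing threshold says transmission must be cut by a fraction 1 - 1/R0 (through immunity, reduced contacts or both) to push the reproduction number below 1. This is a minimal sketch using that textbook formula, which assumes exactly the kind of homogeneous mixing questioned above, so the numbers are illustrative only; the function name is hypothetical.

```python
def critical_reduction(r0: float) -> float:
    """Fraction by which transmission must fall (through immunity or reduced
    contacts) to push R below 1 under homogeneous mixing: 1 - 1/R0, or 0 if
    R0 is already at most 1."""
    return max(0.0, 1.0 - 1.0 / r0)


if __name__ == "__main__":
    for r0 in (3.0, 4.0, 6.0, 9.0):
        print(f"R0 = {r0:.0f}: transmission must fall by {critical_reduction(r0):.0%}")
```

Under this formula, going from an R0 of 3 to 4 to one of 6 to 9 moves the required reduction from 67-75% to 83-89%; put differently, the share of transmission that can be allowed to persist drops from 25-33% to 11-17%.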

• 1
I include myself in this criticism, for I didn’t spend enough time on those conceptual issues in my post about Alpha’s transmissibility advantage and, as a result, my analysis wasn’t as clear as it could and should have been.
• 2
As far as I can tell, this distinction is never made explicitly in the literature, but as I note below the papers on Delta’s transmissibility advantage include caveats that show epidemiologists are at least vaguely aware of it.
• 3
Suppose again that Delta is spreading in networks where relatively few people have immunity against the virus. In that case, we’d expect people infected by Delta to infect more people on average than people infected by another variant, but we wouldn’t necessarily expect that, if the people who were actually infected by another variant had been infected by Delta and the people who were actually infected by Delta had been infected by another variant, this would still be the case.
• 4
First, as is generally the case with studies trying to estimate the generation interval, it's based on a very small sample. In particular, the study's estimate of the generation interval for Delta is based on just 28 household transmission pairs, so it's very imprecise. Moreover, the conclusion was reached by comparing data on household transmission pairs in April 2021 (when most people were infected by Delta) with data on household transmission pairs in April 2020 (when people were infected by the original strain of the virus), but a lot has changed in how people deal with the virus since then and this could have affected the generation interval. Furthermore, if the relationship between the serial interval, the generation interval and what this paper calls the infectiousness profile is not the same for Delta as for the other variants (as many people seem to think), the extent to which the serial interval is a good proxy for the generation interval might not be the same for Delta as for the original strain, making the comparison misleading. Finally, even if Delta's generation interval really is the same as the generation interval for the original strain of the virus in Singapore, it doesn't mean that Delta's generation interval is the same as the generation interval for the other variants currently infecting people in the various countries where Delta is circulating.
• 5
It’s not just the mean of the generation interval distribution that could differ between variants. For instance, if people infected by a variant remain infectious for a longer period of time, it might result in a longer-tailed generation interval distribution.
• 6
As I noted previously, the particular estimate of the transmissibility advantage they obtain is pretty sensitive to the assumption they make about the generation interval distribution, but as long as Delta is growing faster than the other variants it’s always positive.
• 7
Those are the means of the posterior draws of the transmissibility advantage, which are not given in the paper, but can be recovered from the code uploaded by the authors on GitHub. The means of the posterior draws of the transmissibility advantage range from 14% to 68% when the model is fitted on each week separately. I’m using the latest update of the code, which is why the results I give are different from those in the original version of the report.
• 8
In my survey of the literature, I only reported point estimates, but if you look at confidence intervals Delta’s transmissibility advantage could be anywhere between -10% and 117%. As I just noted, this actually understates the real uncertainty, because with different but equally reasonable assumptions and methods different estimates would have been obtained.
• 9
• 10
Of course, it stands to reason that if there is more viral RNA in nasopharyngeal swabs from people infected by Delta, these people shed more virus, but the relationship between how much viral RNA there is in nasopharyngeal swabs and how much virus people shed may not be linear. It could also be that, while they do shed more virus around the time of symptom onset and before that (because the virus is replicating faster before the immune response kicks in), this isn't true for the rest of the infectious period, so over the whole period the difference is less dramatic than the amount of viral RNA found in nasopharyngeal swabs around the time of symptom onset would suggest.
• 11
The group who published this paper has a key role in advising the government about the pandemic and their projections are used to inform policy.
• 12
In order to avoid spurious results due to measurement error, which is a real concern at the department level, I'm excluding observations with fewer than 30 positive samples tested for the presence of L452R.
• 13
The code for this analysis and all the graphs in this post can be found in this repository on GitHub.
• 14
They also included Google mobility indicators as covariates inside the exponential, but I don’t think such a model really makes sense theoretically, so I did not.
• 15
This should become clear in the last section, when I present my theory of what is really going on.
• 16
For this analysis, I’m using a model that includes covariates for the complete vaccination rate and Google mobility indicators, but there is no time-varying component or department-level intercept.
• 17
See my post on Alpha’s transmissibility advantage for where that estimate comes from. As I explained in that post, there is much uncertainty about the actual generation interval, but the finding that Delta’s transmission advantage has fallen a lot in France since the lineage’s initial expansion is true no matter what assumption we make about the generation interval distribution.
• 18
Unlike the French study I mentioned above, I didn't bother computing confidence intervals: since, to my knowledge, the positive samples tested for the presence of L452R are not randomly selected, those confidence intervals would have been meaningless, although people like to compute them even in that case because they look "scientific". Similarly, while I'm showing a trend line to stress the point I'm making, it shouldn't be taken very seriously, since there is no reason to expect a linear relationship between Delta's transmission advantage and its prevalence.
• 19
I do not show observations where the transmission advantage is less than -50% or more than 300% on the graph, because that would make it hard to read, but they were used to compute the trend line and the vast majority of observations are still visible.
• 20
I have excluded from the data observations with fewer than 30 positive samples tested for the presence of L452R, which should limit the extent to which measurement error is creating spurious dispersion.
• 21
As far as I can tell, this range comes from a document prepared for the British government on June 2 by the Scientific Pandemic Influenza Group on Modelling, Operational sub-group (SPI-M-O), which, after summarizing just 3 of the studies that I reviewed above, noted: "These estimates of the growth advantage of delta compared to alpha range from approximately 25% to 100%, but they appear to be clustering around 40% to 60%. Higher estimates, however, still cannot be ruled out." While they use the expression "growth advantage" instead of "transmissibility advantage", this distinction is never made anywhere in the document and, if the authors deliberately used that expression rather than "transmissibility advantage" to convey that one couldn't assume Delta's growth advantage would be constant across contexts, this was lost on everyone I know who cited that document.
• 22
Not 60% but 59%, they are scientists after all, so precision is important.
• 23
Alpha has a deletion of 2 amino acids in the spike gene that Delta doesn't have, which causes S-gene target failure in some PCR tests. Assuming that other variants don't have this deletion and that most positive samples without it are Delta, we can therefore use the proportion of positive test samples with S-gene target failure as a proxy for the relative prevalence of Alpha and Delta. This is less reliable than full sequencing but, on the other hand, it's cheaper, so more positive test samples can be tested, which means that we have more data on S-gene target failure than full-sequencing data.
• 24
In this paragraph, Ferguson talks about a “transmission advantage” and not a “transmissibility advantage”, but he uses both expressions interchangeably throughout the paper and makes the claim that Delta’s “transmissibility advantage” over Alpha is at least 50% elsewhere in the paper, including in the abstract. While he never explicitly makes the distinction between those 2 concepts, it’s clear from some of the caveats he makes that he at least has a vague understanding of it, though I also think he doesn’t fully appreciate the import of that distinction.
• 25
To be clear, I’m not claiming his likely overestimate of Delta’s transmissibility advantage is the only reason why his projections were completely off (I don’t even think it was the main reason), but it clearly didn’t help. After falling for a while, incidence has been slowly increasing again in August, but it looks as though it has started to fall again, though it’s still unclear at this point.
• 26
The fall in Delta's transmission advantage could also be a compositional effect due to the fact that non-Delta variants in France are not the same now as they were a few weeks ago, but neither the data on positive samples tested for the presence of mutations nor the full-sequencing data on GISAID show any change in the composition of non-Delta variants in France during the past few weeks that could have this effect, so that's not it. The only noticeable change in the composition of non-Delta variants during this period seems to be that Gamma increased in prevalence relative to Alpha, but this change took place before Delta's transmission advantage collapsed and, since Gamma is not considered significantly more transmissible than Alpha, it couldn't have explained it anyway.
• 27
Another factor that might play a role here is that, if a variant is more transmissible, then depending on the underlying biological reason for that it might also have a higher dispersion factor. The dispersion factor measures, inversely, the extent to which a small proportion of the infected are responsible for a large share of infections: the smaller it is, the more concentrated transmission is. For instance, if the reason why a variant is more transmissible is that people infected by it shed more virus (as many people think in the case of Delta), then you would expect that variant to miss fewer opportunities to infect people due to insufficient viral inoculum. When the dispersion factor of a virus is small, as seems to be the case with SARS-CoV-2, not only do a few super-spreaders drive the majority of transmission, but most people who are infected don't infect anyone. Thus, if a variant is not only more transmissible but also has a higher dispersion factor, then once it has caused a large outbreak and people who are part of that outbreak travel to other places, the probability that they will start a chain of infections over there and ultimately another large outbreak instead of being epidemic dead-ends is also higher. That being said, to my knowledge, the only study that looked at Delta's dispersion factor, albeit indirectly, concluded that it was similar to that of the previously established variants.
• 28
By contrast, Alpha was first identified in England in September 2020, but its share of the sequences from the UK uploaded on GISAID started increasing rapidly in November, so it took about 2 months against 5 months for Delta. Moreover, genomic surveillance is much better in England than in India, so if Delta was only identified in October 2020 it means that it almost certainly emerged much earlier in India and therefore this comparison actually understates how much longer it took for Delta to start taking over.
• 29
In fact, in a deterministic epidemiological model with homogeneous population mixing, this is literally impossible. Even if you model the stochastic character of transmission, while not strictly speaking impossible, the probability is astronomically low in any reasonable model.
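Several of the notes above (5, 6 and 17) turn on how an assumed generation interval distribution converts observed growth rates into a transmission advantage. The following is a minimal sketch of one standard conversion, Wallinga and Lipsitch's R = 1/M(-r) with a gamma-distributed generation interval, used here as an assumption for illustration rather than as the exact computation behind the estimates in this post; the growth rates, generation interval values and function names are hypothetical. It shows the point made in those notes: the size of the estimated advantage depends on the assumed generation interval, but as long as one variant grows faster than the other its sign does not.

```python
def reproduction_number(growth_rate: float, gi_mean: float, gi_sd: float) -> float:
    """R implied by an exponential growth rate (per day) when the generation
    interval is gamma-distributed with the given mean and SD (days), using
    R = 1 / M(-r), where M is the interval's moment generating function."""
    shape = (gi_mean / gi_sd) ** 2
    scale = gi_sd ** 2 / gi_mean
    if 1.0 + growth_rate * scale <= 0.0:
        raise ValueError("growth rate too negative for this generation interval")
    return (1.0 + growth_rate * scale) ** shape


def transmission_advantage(r_new: float, r_old: float,
                           gi_mean: float, gi_sd: float) -> float:
    """R_new / R_old - 1, assuming BOTH variants share the same generation
    interval distribution (an assumption notes 4 and 5 call into question)."""
    return (reproduction_number(r_new, gi_mean, gi_sd)
            / reproduction_number(r_old, gi_mean, gi_sd)) - 1.0


if __name__ == "__main__":
    # Hypothetical daily growth rates, not estimates from the data in this post.
    r_delta, r_other = 0.06, -0.03
    for gi_mean in (4.0, 5.0, 6.5):
        adv = transmission_advantage(r_delta, r_other, gi_mean, gi_sd=2.5)
        print(f"GI mean {gi_mean} days: transmission advantage {adv:+.0%}")
```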