Ever since the coronavirus reached the U.S., officials and citizens alike have gauged the severity of the spread by tracking one measure in particular: How many new cases are confirmed through testing each day. However, it has been clear all along that this number is an understatement because of testing shortfalls.
Now a research team at Columbia University has built a mathematical model that gives a much more complete — and scary — picture of how much virus is circulating in our communities.
It estimates how many people are never counted because they never get tested. And it answers a second question that is arguably even more crucial — but that until now has not been reliably estimated: On any given day, what is the total number of people who are actively infectious? This includes those who may have been infected on previous days but are still shedding virus and capable of spreading disease.
The model's conclusion: On any given day, the actual number of active cases — people who are newly infected or still infectious — is likely 10 times that day's official number of reported cases.
The model has not yet been published or peer-reviewed, but lead researcher Jeffrey Shaman, an infectious disease specialist at Columbia University, shared the data exclusively with NPR. Here are more of the startling takeaways.
Missed cases remain a massive problem
To come up with their bottom line estimate, the researchers' first step was to estimate, for each day of the outbreak so far, how many people actually became infectious. Then they compared that with the number who got tested and counted as a confirmed case.
This discrepancy alone was huge: Shaman estimates that over the entirety of the pandemic, five times more people have been infected than were reported.
"The numbers amplify greatly," says Shaman. "When we look at confirmed cases, we're really only seeing the tip of the iceberg."
The rate of testing in the U.S. has improved over time. Shaman's model finds that at the very start of the pandemic, only 1 in 10 cases was being reported. By early May, that share had risen to 1 in 6. By September, it was up to 1 in 5.
Shaman estimates that, on average over the past three months, the official tally has been counting only 1 in 4 infections. In other words, says Shaman, to get a rough sense of the actual number of new cases per day, you should multiply the daily reported number by four.
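The multiplication Shaman describes can be sketched in a few lines of Python. This is only an illustration of the rule of thumb, not part of the Columbia model: the function name and the daily count of 50,000 reported cases are made up for the example, while the "1 in N" ratios are the ones quoted above.

```python
def estimated_true_infections(reported_cases, one_in_n_reported):
    """Scale a reported count up, assuming only 1 in N infections is counted."""
    if one_in_n_reported < 1:
        raise ValueError("one_in_n_reported must be at least 1")
    return reported_cases * one_in_n_reported


# "1 in N" reporting ratios quoted in the article for different periods.
ratios = {
    "start of the pandemic": 10,
    "early May": 6,
    "September": 5,
    "past three months (average)": 4,
}

# Hypothetical daily reported count, chosen only for illustration.
hypothetical_reported = 50_000

for period, n in ratios.items():
    estimate = estimated_true_infections(hypothetical_reported, n)
    print(f"{period}: about {estimate:,} actual new infections")
```

The same reported number implies a smaller hidden total as testing catches a larger share of infections.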
It gets worse — once you consider current active infections
Even estimating the true number of daily new infections fails to provide the full picture of how risky it may be to mingle with people in your community right now.
Shaman's estimated figures for how many people became infectious each day only tell you who is a new case. But people stay contagious for "three or four days on average," says Shaman.
So to fully appreciate the threat level on any given day, you would also want to count the people whose infection started earlier and who are still shedding the virus.
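One rough way to picture that count is a rolling sum: add up everyone who became infectious within the last few days. The sketch below assumes, for simplicity, that every person stays contagious for exactly four days (Shaman says three or four on average), and the daily figures are invented. The article does not say the researchers calculate it this way.

```python
def active_infectious(new_infections, infectious_days):
    """For each day, sum the new infections from the last `infectious_days` days.

    Simplifying assumption: everyone is contagious for exactly that many days,
    starting on the day they are counted as a new infection.
    """
    active = []
    for day in range(len(new_infections)):
        start = max(0, day - infectious_days + 1)
        active.append(sum(new_infections[start:day + 1]))
    return active


# Invented daily new-infection counts, for illustration only.
daily_new = [100, 120, 150, 130, 110]

print(active_infectious(daily_new, infectious_days=4))
# prints [100, 220, 370, 500, 510]
```

Even in this toy series, the number of contagious people on a given day is several times that day's new infections.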
"There are a lot of people walking around with this virus who never know they have it," says Shaman. "Even the people who are ultimately swabbed and confirmed, they were contagious before they even had their symptoms."
So this is the next step in Shaman's model: He estimates that the number of people actively shedding virus on any given day is about 10 times the number of daily new reported cases.
How many people does this add up to? Well, on the worst day for reported new cases so far — Jan. 2 — 91 out of every 100,000 people in the U.S. tested positive. But Shaman estimates that on that day, 998 per 100,000 people were actually actively shedding the virus.
The peak was even worse in many jurisdictions. In Los Angeles County, says Shaman, at the height of the winter surge, 3% of the county's population was contagious, or roughly 3,000 per 100,000.
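The arithmetic behind those per-100,000 figures is simple to check. The short Python sketch below uses only the numbers quoted above; the helper name is made up for the example.

```python
def percent_to_per_100k(percent):
    """Convert a percentage of the population into a rate per 100,000 people."""
    return percent * 1_000


# National figures quoted for Jan. 2, both per 100,000 people.
reported_per_100k = 91
shedding_per_100k = 998

ratio = shedding_per_100k / reported_per_100k
print(f"Actively shedding vs. newly reported: {ratio:.1f} times")
# prints "Actively shedding vs. newly reported: 11.0 times"

# Los Angeles County at the height of the winter surge: 3% contagious.
print(percent_to_per_100k(3))
# prints 3000
```

The Jan. 2 ratio comes out close to 11, in line with the model's "about 10 times" rule of thumb.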
Transmission has slowed down considerably since then across the United States. But it is still well above the summer highs. And Shaman estimates that as of last Saturday, 1.25 million people nationwide were actively shedding virus.
"That's a very, very high level," says Shaman. "That still means there are a lot of people out there who are actively infected, who are passing it on, and who could expose people at risk."
Why this means we can't rush to open up
The findings lend urgency to the rush to vaccinate Americans, says Shaman. And they suggest that Americans will need to keep up a high degree of physical distancing and masking until many more people are vaccinated.
"If we let up now, given how much infection is out there, we're going to make it so that many, many more people are going to get the virus before they ever have the chance to get the vaccine," says Shaman.
Ashish Jha, a public health researcher and dean of Brown University's School of Public Health, says he considers the new model "really important," although he cannot assess its methodology since it's not public.
"What people really care about is not: 'How many people in my town or my state became a case yesterday?' " says Jha. "It's: 'When I'm out and about, how many people around me are infectious? How many people around me are potentially spreading the virus?' This is the first [work] I have seen that really tries to get at that."
One-third of the U.S. population has already been infected
The sustained periods of high transmission in the U.S. also mean that by now, quite a large share of the U.S. population has been infected beyond what the tallies of reported cases would indicate. Nationwide, Shaman estimates that about 120 million people have now been infected, just over a third of the U.S. population.
The model also provides estimates for each state.
There's a fair amount of variation: In North Dakota and New York, for instance, Shaman estimates about half of the population has now been infected. "They may even be approaching herd immunity there," he says.
But Shaman also cautions that it's possible the immunity gained through infection — especially from mild or asymptomatic cases — might wane before enough people get vaccinated to tamp down outbreaks. It's also not known what degree of protection prior infection will confer against some of the new variants that were recently detected in the United Kingdom, South Africa and Brazil — and which many scientists assume will become increasingly common in the U.S.
Also, in many states, the share of infected people is much lower. And the U.S. overall — with the estimated one-third infected — is nowhere near the 70 to 85% level that scientists estimate must be immune before the pandemic may begin to wind down here.
Shaman's conclusion: "I don't think we should psychologically be thinking about any sort of move into a post-pandemic phase and a real reopening until the summer."
"The important thing," he adds, "is not to get overly exuberant right now and think that we're done with this thing."
How this model compares with previous estimates
Shaman is not the first to attempt to estimate how many infections are missed by testing. This part of his analysis — though not the modeling of total active infections — echoes earlier research.
Shaman found that early in the pandemic, testing was capturing only one out of 10 actual new infections, which is in line with estimates by researchers with the U.S. Centers for Disease Control and Prevention. Those estimates include several studies that extrapolated from blood sampling that looked for antibodies to the coronavirus, which are evidence of prior infection. They suggested that the number of actual infections was 10 times higher than reported.
In another study by researchers with the CDC, a model similar to, but more rudimentary than, the one Shaman's team used found that actual infections were likely eight times higher than reported during the first seven months of the pandemic.
So how did Shaman's team come up with their estimates? They started with two pieces of known information: The first was the number of people who have tested positive each day since the start of the pandemic. The second was a set of anonymized cellphone location data, provided by the company SafeGraph, that told them, for each day, how much people were intermingling by moving outside of their homes, including, says Shaman, "to places of interest like grocery stores and restaurants."
The team then fed this data into a computer program that essentially tried to find the best possible values for the variables the team did not know, such as how many cases were being missed each day and how long people were remaining infectious.
The program effectively ran multiple simulations to see, for each day of the pandemic, which combination of answers allowed it to correctly predict how many reported cases were produced in the days afterward. In a nutshell, says Shaman, "it searches for the optimal solution that best fits the observable data."
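The general idea of searching for the parameter values that best reproduce observed case counts can be shown with a toy example in Python. This is not the Columbia team's program, whose algorithm the article does not describe. Everything here is an assumption made for illustration: the simple outbreak formula, the made-up mobility numbers, the exhaustive grid search, and the shortcut of treating the starting number of infections and the transmission rate as known.

```python
from itertools import product


def simulate_reported(mobility, seed_infections, transmission,
                      infectious_days, report_fraction):
    """Toy outbreak: new infections scale with mobility and the number contagious."""
    new_infections = [seed_infections]
    for day in range(1, len(mobility)):
        start = max(0, day - infectious_days)
        active = sum(new_infections[start:day])  # people still contagious
        new_infections.append(transmission * mobility[day] * active)
    # Only a fraction of infections is ever tested and reported.
    return [report_fraction * n for n in new_infections]


def squared_error(predicted, observed):
    """Total squared gap between simulated and observed reported cases."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def grid_search(mobility, observed, seed_infections, transmission,
                day_options, fraction_options):
    """Try every combination of unknowns and keep the best-fitting one."""
    best_params, best_error = None, float("inf")
    for days, fraction in product(day_options, fraction_options):
        predicted = simulate_reported(mobility, seed_infections, transmission,
                                      days, fraction)
        error = squared_error(predicted, observed)
        if error < best_error:
            best_params, best_error = (days, fraction), error
    return best_params, best_error


# Made-up mobility index for 14 days (1.0 means normal movement).
mobility = [1.0, 1.0, 0.9, 0.8, 0.7, 0.6, 0.6, 0.7, 0.8, 0.9, 1.0, 1.0, 0.9, 0.8]

# Synthetic "observed" data, generated with hidden values the search must recover.
observed = simulate_reported(mobility, seed_infections=1000, transmission=0.4,
                             infectious_days=4, report_fraction=0.25)

best_params, best_error = grid_search(
    mobility, observed, seed_infections=1000, transmission=0.4,
    day_options=[2, 3, 4, 5, 6],
    fraction_options=[0.1, 0.15, 0.2, 0.25, 0.3],
)
print(best_params)
# prints (4, 0.25)
```

Because the synthetic data were generated with a four-day infectious period and 1 in 4 infections reported, the search lands on exactly those values. Real case data are noisy, so a real fit would only approximate the observations.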
NPR's Sydney Lupkin contributed to this report.