WAGs and Convenience Samples

Published April 15, 2020

To my knowledge, we do not yet have meaningful samples of the general population with regard to the prevalence of coronavirus, the distribution of its outcomes, and the build-up of immunity. What we have are “convenience samples.” In general, convenience samples are samples that are easy to obtain. In the present case, these are samples of people who were tested for the disease because they showed symptoms, or because they were exposed or thought they were exposed, or because they were nervous. These groups cannot be presumed representative of the general population, so an inference from the results of their tests to the general population is something between a WAG (wild-ass guess) and the exercise of professional judgment.

Many of the surveys we have nowadays are based on convenience samples. Consider public opinion polls. Response rates to telephone-based surveys have been falling precipitously for reasons such as caller ID and rising rates of explicit refusal. If easy-to-contact people are representative of the general population with respect to the questions of interest, a low response rate doesn’t introduce harmful bias into the sample. But how would you know this?

With regard to demographic groups whose response rates are lower than average, survey-takers can use “poststratification weighting” to make the sample better mimic the general population. While not a guarantee (because of the possibility of “within-cell bias”), this technique has shown itself to be useful.
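To make the mechanics concrete, here is a minimal Python sketch of poststratification on a single demographic variable. The cell names, sample counts, cell results, and population shares are all made-up illustrative numbers, not figures from any poll, and real surveys typically weight on several variables at once.

```python
# Hypothetical two-cell example; every number below is illustrative only.
cell_means = {"under_50": 0.20, "50_and_over": 0.50}         # share answering "yes" per cell
sample_counts = {"under_50": 100, "50_and_over": 900}        # respondents per cell (skewed old)
population_shares = {"under_50": 0.40, "50_and_over": 0.60}  # assumed known population shares


def raw_mean(means, counts):
    """Unweighted estimate: every respondent counts equally."""
    total = sum(counts.values())
    return sum(means[c] * counts[c] for c in counts) / total


def poststratified_mean(means, shares):
    """Weighted estimate: each cell counts in proportion to its population share."""
    return sum(means[c] * shares[c] for c in shares)


raw = raw_mean(cell_means, sample_counts)
weighted = poststratified_mean(cell_means, population_shares)
print(f"raw estimate:            {raw:.2f}")
print(f"poststratified estimate: {weighted:.2f}")
```

The weighting corrects for the sample containing too few under-50 respondents. It cannot correct for the under-50 people who did respond being unlike those who did not, which is the within-cell bias problem.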

Armed with poststratification weighting, some poll-takers have taken to the internet to gather convenience samples. Large and very unrepresentative samples are collected, and then poststratified, like so many fruits put into a blender, with the uniform paste that comes out supposedly reflective of the general population of fruit. The surprise election of Donald Trump in 2016 and the surprise vote in the United Kingdom earlier that same year to leave the European Union testify to the limitations of polling.

Which brings us to the issue at hand: what to make of the convenience sample of coronavirus test-takers? Through yesterday, 3.1 million Americans had been tested, of whom 600,000 proved positive, and 26,000 had died, either having tested positive or having been deemed positive without a test. But we know that many people with mild symptoms or no symptoms aren’t tested. Sorry to be morbid, but the number 26,000 is pretty reliable. On the other hand, the number 600,000 is not reliable. It is an undercount of the prevalence of the disease in the general population.

According to the aggregate results from the past two weeks (a total sample size of 3,000) of a tracking poll being conducted by two Democratic polling organizations, 3.5 percent of registered voters report that they were symptomatic and were tested, of whom 42 percent proved positive (equal to 1.5 percent of the reference population). Again according to the survey, 5 percent had symptoms but weren’t tested. Assuming they would have shown the same positive rate had they been tested, this implies that about 3.5 percent of the American people have the disease and are symptomatic.
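The arithmetic behind that 3.5 percent can be sketched in a few lines of Python. This reads the untested symptomatic group as 5 percent of respondents, which is an assumption, though it is the reading under which the stated total works out. The variable names are mine, not the poll's.

```python
# Figures as reported in the text; the 5 percent reading is an assumption.
tested_symptomatic = 0.035    # share of respondents who were symptomatic and tested
positive_rate = 0.42          # share of those tested who proved positive
untested_symptomatic = 0.05   # ASSUMPTION: symptomatic but untested, read as 5 percent

# Positives actually observed among the tested group.
confirmed_positive = tested_symptomatic * positive_rate

# Positives inferred among the untested group, assuming the same positive rate.
inferred_positive = untested_symptomatic * positive_rate

symptomatic_prevalence = confirmed_positive + inferred_positive

print(f"confirmed positive:     {confirmed_positive:.1%}")
print(f"inferred positive:      {inferred_positive:.1%}")
print(f"symptomatic prevalence: {symptomatic_prevalence:.1%}")
```

The confirmed share rounds to 1.5 percent and the total to roughly 3.5 percent, but more than half of that total rests on the assumption that the untested would have tested positive at the same rate as the tested.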

Link to poll:


Now we come to an unknown: how many people have the disease and are asymptomatic? At present, we know of only a relatively small number of people who tested positive at one time and now test negative. These people are precious because their blood probably contains antibodies, and plasma extracted from their blood can be used as a treatment. Perhaps the number of asymptomatic people equals the number of people who have symptoms, or another 3.5 percent. Or perhaps the number is smaller or larger. We just don’t know.

If the ratio of symptomatic to asymptomatic is 1-to-1, then 7 percent of the American people have the disease and, furthermore, 3.5 percent of the American people had the disease two weeks ago and have now recovered from a disease they didn’t know they had. This implies that 10.5 percent of the American people have or had the disease, or about 34 million people. If this 1-to-1 ratio is correct, then the implied mortality rate from coronavirus (26,000 deaths among 34 million infections, or about 0.08 percent) is not much different from that of the ordinary flu.
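Because the whole inference hangs on the unknown ratio, it may help to see how the implied mortality rate moves as the ratio changes. The sketch below is only that, a sketch. The population figure of 328 million is my assumption (the text gives only the 34 million result), and I assume the recovered-and-unaware group is the same size as the currently asymptomatic group, which matches the 1-to-1 case described above.

```python
US_POPULATION = 328_000_000   # ASSUMPTION: approximate; not stated in the text
DEATHS = 26_000               # from the text
SYMPTOMATIC_SHARE = 0.035     # from the text


def implied_mortality(asymptomatic_per_symptomatic=1.0):
    """Return (ever-infected share, ever-infected count, deaths per infection)."""
    asymptomatic = SYMPTOMATIC_SHARE * asymptomatic_per_symptomatic
    # ASSUMPTION: the recovered-and-unaware group matches the asymptomatic share.
    recovered_unaware = asymptomatic
    share = SYMPTOMATIC_SHARE + asymptomatic + recovered_unaware
    count = share * US_POPULATION
    return share, count, DEATHS / count


for ratio in (0.5, 1.0, 2.0):
    share, count, rate = implied_mortality(ratio)
    print(f"ratio {ratio}: {share:.1%} ever infected, "
          f"{count / 1e6:.0f} million people, mortality {rate:.3%}")
```

With a ratio of 1.0 this reproduces the 10.5 percent and roughly 34 million figures. Halving or doubling the ratio moves the implied mortality rate substantially, which is the sense in which the result is a WAG.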

Given the virulence of the disease, the panic, and horror stories such as sick people trapped on cruise ships, it seems unbelievable that the mortality rate of coronavirus could be so low. But there are so many assumptions that the inference can easily be dismissed as a WAG. Plus, how reliable are public opinion polls? This poll seems to overstate the number of Americans who have been tested, and the percent of people tested who are positive. We really need scientifically valid surveys, to include serology tests, to determine prevalence, outcomes and immunity. Several clinics are undertaking such surveys at this very time.
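The apparent overstatement can be checked roughly against the official counts cited earlier. In this sketch the population of 328 million is again my assumption, and the comparison is loose: the poll covers registered voters who were symptomatic and tested, while the official count covers everyone tested.

```python
US_POPULATION = 328_000_000   # ASSUMPTION: approximate; not stated in the text

# Official counts cited in the text.
tested = 3_100_000
positive = 600_000

official_tested_share = tested / US_POPULATION
official_positive_rate = positive / tested

# Poll figures cited in the text.
poll_tested_share = 0.035
poll_positive_rate = 0.42

print(f"tested share:  official {official_tested_share:.1%} vs poll {poll_tested_share:.1%}")
print(f"positive rate: official {official_positive_rate:.1%} vs poll {poll_positive_rate:.1%}")
```

On these figures the poll's tested share is several times the official one, and its positive rate is about double.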

There are some other results from the poll discussed above that might be informative. It appears Republicans have a partial immunity to coronavirus when compared to Democrats. Only 7 percent of Republicans report symptoms, while 11 percent of Democrats do. And more Democrats are afraid, angry, and upset because of the epidemic, while more Republicans are hopeful.