Notice anything fishy about the counts and percentages in column A of Table 2 in this article published in JAMA, the Journal of the American Medical Association? Despite 122 more people saying they are aware of websites that rate and review dentists (1,398) than websites that rate and review hospitals (1,276), the percentage presented for dentists (60%) is lower than the percentage presented for hospitals (61%). The key to this discrepancy is provided in footnote a to the table: "All percentages are weighted to approximate the US population and are calculated on a per-question basis excluding those who were eligible for each question but did not respond".
Let me use a hypothetical example to illustrate why this is an issue. Say we asked the general public whether or not they were aware of each of the Kardashian/Jenner siblings. For each sibling, the respondent can answer “Yes”, “No” or skip the question (as they could, presumably, for awareness of websites that rate and review physicians etc. in the JAMA article). Here is a table comparing the percentages based on all respondents with those based only on those answering each question item. I've assumed that the less familiar the sibling, the more likely the respondent is to skip the question rather than give a negative response.
Q: Are you aware of each of the following Kardashian/Jenner siblings?
As you can see, calculating percentages on a denominator (or base) that excludes non-responses (i.e. all answering) inflates every percentage where there are non-responses, and especially those with large numbers of non-responses. Awareness of Kim Kardashian is 87% among all respondents; this rises by five percentage points to 92% among all answering. Where more respondents skipped the question item - as for awareness of Kim's stepbrother Brandon - the discrepancy is even more marked: 20% of all respondents are aware of him, compared with 31% of all answering.
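To make the arithmetic concrete, here is a minimal Python sketch of the two denominator choices. The counts below are invented, chosen only so that the rounded percentages roughly match the figures quoted above; they are not taken from the table.

```python
def pct(numerator, denominator):
    """Percentage, rounded to the nearest whole number."""
    return round(100 * numerator / denominator)

# Hypothetical counts of aware / not aware / skipped for two question items
siblings = {
    "Kim":     {"aware": 870, "not_aware": 75,  "skipped": 55},
    "Brandon": {"aware": 200, "not_aware": 445, "skipped": 355},
}

for name, c in siblings.items():
    everyone  = c["aware"] + c["not_aware"] + c["skipped"]  # all respondents
    answering = c["aware"] + c["not_aware"]                 # skips excluded
    print(f"{name}: {pct(c['aware'], everyone)}% of all, "
          f"{pct(c['aware'], answering)}% of all answering")
```

Run as written, this prints 87% versus 92% for "Kim" and 20% versus 31% for "Brandon": the same awareness count looks substantially higher once the skips are dropped from the base.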
Why the survey team allowed respondents to skip eligible question items in an online survey is puzzling. This oversight in the survey design could and should have been addressed by recoding the missing cases as “Not answered” or, perhaps, “Don't know” and including them in the denominator for the percentage calculation.
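For completeness, a sketch of that recoding fix, again with made-up data: treating a skipped item as an explicit “Not answered” category keeps those respondents in the base, so each percentage is calculated over everyone who was eligible for the question.

```python
from collections import Counter

# Hypothetical raw responses to one question item; None marks a skipped item.
responses = ["Yes", "No", None, "Yes", None, "No", "Yes", None]

# Recode missing cases instead of dropping them from the base.
recoded = ["Not answered" if r is None else r for r in responses]

counts = Counter(recoded)
base = len(recoded)  # every eligible respondent stays in the denominator

for category, n in counts.items():
    print(f"{category}: {100 * n / base:.0f}% of {base} eligible respondents")
```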