Lies, damned lies, and rankings: the problem with Bloomberg’s COVID resilience ranking
This article is originally published at https://cartesianfaith.com
Every ranking creates winners and losers. In the case of Bloomberg’s Covid Resilience Ranking, the Philippines is a loser: dead last and called the worst place to be during the pandemic. A damning judgment that the country’s vaccine czar, Carlito Galvez, Jr., says isn’t fair due to a biased scoring methodology. Is Bloomberg’s ranking biased, or is this just a sore loser making excuses?
The illusion of objectivity
When faced with a complex world, scores and rankings offer an irresistible simplification of seemingly intractable questions. Bloomberg’s Covid Resilience Ranking attempts to distill how well a country is handling the pandemic with the “least amount of social and economic upheaval” into a single number. This score consolidates at least 12 different factors that span three broader categories: re-opening progress, COVID status, and quality of life. For the most part, Western nations rank higher, with a few notable exceptions like the U.A.E. Out of 53 countries included in the ranking, the Philippines is last with a score of 40.5.
On the surface this resilience score is objective, since each factor is quantitative and some underlying mathematical formula is used to define the score. Unfortunately, the use of mathematics and algorithms doesn’t guarantee objectivity. Every scoring system hides bias within its nooks and crannies. Some common sources of bias include:
- the choice of data, which often manifests as selection bias,
- the choice of factors, which is part of the design process, and
- the choice of weights, which is also part of the design of the score.
In terms of data, what is included is just as important as what is omitted. Galvez argues that only 53 countries are included when there are 203 nations globally. This sort of selection bias doesn’t necessarily affect a country’s score, but it does affect its rank: with more countries included, the Philippines could escape the ignominy of being the worst country to be in for COVID.
The choice of factors in a score is inherently biased, since it is a design process. Someone must decide which factors to include and why. Even in predictive models, the choice of factors can be biased. Someone who is p-hacking results or has fallen prey to confirmation bias will choose specific features to get the result they want, leading to a biased model.
Galvez argues that the resilience ranking overly emphasizes economics and underweights health outcomes. If additional health factors were added, such as oxygen supply, the Philippines could have a higher score and potentially move out of last place. But crying foul over bias while attempting to assert your own seems disingenuous at best.
In addition to the choice of factors, the weights used for each factor are also biased. In scoring models, weights quantify our perception of importance. A simple average, or equally weighted score, implies every factor has equal importance. Other scores use different weights that prioritize some factors over others. For example, job candidates might be evaluated on a number of criteria, including GPA. GPA may not be super important and may therefore be underweighted relative to other factors, such as years of experience.
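To make the effect of weights concrete, here is a minimal sketch with two made-up job candidates. The candidates, their normalized scores, and both weight vectors are assumptions chosen for illustration only.

```r
# Hypothetical candidates scored on two criteria, each already scaled to 0-1.
candidates <- data.frame(
  gpa        = c(0.95, 0.50),
  experience = c(0.45, 0.80),
  row.names  = c("A", "B")
)

# Equal weights: every criterion counts the same.
equal <- apply(candidates, 1, weighted.mean, w = c(0.5, 0.5))

# Tilted weights: experience counts four times as much as GPA.
tilted <- apply(candidates, 1, weighted.mean, w = c(0.2, 0.8))

# One row per weighting scheme, one column per candidate.
round(rbind(equal, tilted), 3)
```

With equal weights candidate A comes out ahead, while the tilted weights put candidate B first. The data are identical in both cases; only the designer’s view of what matters has changed.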
A financial news provider like Bloomberg may arguably overweight economic factors over other factors. Whether or not that’s fair is in the eye of the beholder. Either way, we don’t really know, since Bloomberg only publishes the factors included and not their weights or methodology.
Using regression to reverse engineer a score
The central argument in Galvez’s rebuttal is that a different set of factors would be fairer and that the Philippines wouldn’t be last if that were the case. If we knew Bloomberg’s approach to their COVID resilience ranking, we could easily test whether this argument is valid or not.
Despite an opaque methodology, we can reconstruct Bloomberg’s scoring formula with a linear regression. Many scores are simply a weighted sum (or weighted average) of a set of factors:
score = w_1 x_1 + w_2 x_2 + … + w_n x_n + C
Notice that this formulation is the same as a linear regression, so the fit should be extremely good and have small residuals. In R, this looks like
m1 <- lm(score ~ ., data = df)
Indeed, the model has an R² of 99.99% with all p-values less than 2e-16.
The model gives us the weights for each of the 12 factors plus an intercept term, which offsets the score so that it falls between 1 and 100.
Coefficients:
      (Intercept)            pct.vax  lockdown.severity    flight.capacity
        3.484e+01          8.792e+00         -1.328e-01          8.837e+00
travel.routes.vax          X1m.cases       X3m.fatality       total.deaths
        3.201e-02         -4.240e-03         -1.083e+02         -1.362e-03
  positivity.rate           mobility       gdp.forecast   universal.health
       -3.704e+01          9.985e+00          5.916e+01          1.453e-01
              hdi
        1.966e+01
According to the model, the top weights are community mobility, percent vaccinated (people covered by vaccines), and flight capacity. Coefficients for health status have negative values, meaning the greater the raw number, the lower the score, which is to be expected.
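One caveat when comparing weights read off a regression: a raw coefficient depends on the units of its factor. The sketch below uses synthetic data (the variable names and numbers are made up, not Bloomberg’s) to show the same factor expressed as a fraction and as a percentage. The coefficient changes by a factor of 100, while the coefficient multiplied by the factor’s standard deviation, one rough way to put factors on a common scale, stays the same.

```r
set.seed(1)
n <- 53

frac  <- runif(n)        # a factor expressed as a fraction (0 to 1)
pct   <- frac * 100      # the same factor expressed as a percentage
other <- rnorm(n)        # a second, unrelated factor

# Synthetic score: a weighted sum plus a little noise.
score <- 50 + 10 * frac + 2 * other + rnorm(n, sd = 0.1)

fit_frac <- lm(score ~ frac + other)
fit_pct  <- lm(score ~ pct + other)

# Raw coefficients differ by a factor of 100 purely because of units.
coef_frac <- coef(fit_frac)[["frac"]]
coef_pct  <- coef(fit_pct)[["pct"]]

# Change in score per one standard deviation of the factor is unit-free.
impact_frac <- coef_frac * sd(frac)
impact_pct  <- coef_pct * sd(pct)

c(coef_frac = coef_frac, coef_pct = coef_pct,
  impact_frac = impact_frac, impact_pct = impact_pct)
```

This matters when ranking factors by importance: a large raw coefficient on a factor that barely varies can move the score less than a small coefficient on a factor with a wide range.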
In a standard analysis we would try to reduce the number of features in the model based on weight of evidence or variable importance. In this case we are reverse engineering a score with a fixed set of known variables. So rather than identify the variables driving a response, we want to know what the score looks like if we drop some features or change their weights, given a baseline.
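The reverse-engineering idea can be sanity-checked on synthetic data. In this minimal sketch the factor names, the hidden weights, and the rounding to one decimal are all assumptions. It builds a score as a weighted sum, publishes only the rounded score, and then recovers the weights with a regression.

```r
set.seed(42)
n <- 53

# Three made-up factors, each scaled to 0-1.
factors <- data.frame(
  vax      = runif(n),
  mobility = runif(n),
  deaths   = runif(n)
)

# Hidden weights and constant, known only to the score's designer.
true_w <- c(vax = 20, mobility = 15, deaths = -25)
hidden <- 40 + as.numeric(as.matrix(factors[names(true_w)]) %*% true_w)

# Only a rounded score is published.
factors$score <- round(hidden, 1)

# Regress the published score on the published factors.
fit <- lm(score ~ ., data = factors)

recovered <- coef(fit)[names(true_w)]
round(rbind(true = true_w, recovered = recovered), 2)
summary(fit)$r.squared
```

Because the score really is a weighted sum, the only residual is rounding error, so the fit is nearly perfect and the recovered weights land very close to the hidden ones. A noticeably worse fit would suggest the score is not a plain weighted sum of the published factors.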
Complaining doesn’t make it better
Given our model for COVID resilience, let’s revisit Galvez’s claims. The first rebuttal was that only 53 out of roughly 200 countries were included, so the Philippines shouldn’t be considered worst. Fair point, but in terms of the resilience score, the Philippines is well below the median of 65.3 and the top scorers are at least 75% better. So arguing “we’re not last” is lipstick on a pig.
In terms of overemphasis on economic factors, let’s remember Bloomberg operates in the financial industry. It would be out of character if they didn’t emphasize financial factors. That said, we can remove GDP growth forecast as a factor and see whether it affects the ranking:
> score <- get_score(m1, df, 'gdp.forecast')
> score[order(score, decreasing=TRUE)]
        U.A.E.        Denmark        Finland          Spain         Norway
      73.28225       72.33159       72.28232       71.32348       70.92384
   Switzerland    Netherlands         France         Canada         Sweden
      69.36579       68.85232       67.79690       67.75746       67.61915
       Germany   Saudi Arabia          Japan        Ireland    South Korea
      67.58230       67.13115       66.98451       66.35466       66.30705
       Austria       Portugal          Chile Czech Republic        Belgium
      65.55060       65.08572       65.04787       64.84809       63.26805
        Turkey          Italy           U.S.         Greece           U.K.
      62.30962       62.18414       61.96232       61.76198       61.45785
      Colombia Mainland China      Australia         Israel    New Zealand
      60.79171       60.32333       59.86179       59.29019       58.25961
        Poland     Bangladesh       Pakistan           Iraq   South Africa
      58.00616       56.70804       56.33100       55.09957       54.76407
       Nigeria         Mexico      Singapore      Argentina         Russia
      54.69437       53.88603       53.62846       53.13737       52.21596
         India         Taiwan           Peru      Indonesia       Malaysia
      50.30832       50.16986       50.12612       48.33467       46.58546
      Thailand        Romania        Vietnam    Philippines      Hong Kong
      46.36560       45.35464       41.06285       37.92085             NA
        Brazil           Iran          Egypt
            NA             NA             NA
Nope, the Philippines is still last among the countries with a score (four come out as NA) and still about 8% worse than second-to-last Vietnam. Removing GDP did affect the ranking of the top nations, but didn’t have much effect on the ones at the bottom.
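The article does not show how get_score is implemented. As a rough sketch of one way such a helper could work, the hypothetical function below (score_without, not the author’s code) recomputes the fitted score with the weights of the excluded factors set to zero. The data are synthetic and the country names are placeholders.

```r
# Hypothetical helper: fitted score with selected factor weights zeroed out.
# This is an assumption about one possible approach, not the article's get_score.
score_without <- function(model, data, exclude = character(0)) {
  X <- model.matrix(delete.response(terms(model)), data)
  beta <- coef(model)
  stopifnot(all(exclude %in% names(beta)))
  beta[exclude] <- 0
  # Weighted sum of the remaining factors plus the intercept.
  drop(X %*% beta[colnames(X)])
}

# Synthetic stand-in for the ranking data.
set.seed(7)
df <- data.frame(
  pct.vax      = runif(10),
  mobility     = runif(10),
  gdp.forecast = runif(10, 0, 0.1),
  row.names    = paste0("country_", 1:10)
)
df$score <- 35 + 9 * df$pct.vax + 10 * df$mobility + 59 * df$gdp.forecast

m <- lm(score ~ ., data = df)

full   <- score_without(m, df)
no_gdp <- score_without(m, df, "gdp.forecast")

# Compare who is last with and without the GDP factor.
c(last_full = names(which.min(full)), last_no_gdp = names(which.min(no_gdp)))
```

Zeroing a weight lowers every score by that factor’s contribution, so only the relative ordering is meaningful afterward. Refitting or renormalizing the remaining weights would be a different design choice and could give a different ranking.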
We can continue this game until we hit a set of features where the Philippines is not last. However, in a number of permutations I tried, the Philippines was still in the bottom three. I’ll leave it as an exercise for the reader to exhaustively evaluate all combinations of factors.
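For readers who want to try that exercise, here is a minimal sketch of the enumeration on synthetic data. The factors, weights, and country names are made up, and the score is assumed to be a plain weighted sum of whichever factors are kept.

```r
set.seed(11)
n <- 8
df <- data.frame(
  f1 = runif(n), f2 = runif(n), f3 = runif(n), f4 = runif(n),
  row.names = paste0("country_", seq_len(n))
)
weights <- c(f1 = 10, f2 = 20, f3 = -15, f4 = 5)   # made-up weights
factors <- names(weights)

# All non-empty subsets of the factors: 2^4 - 1 = 15 combinations.
subsets <- unlist(
  lapply(seq_along(factors), function(k) combn(factors, k, simplify = FALSE)),
  recursive = FALSE
)

# For each subset, score with only the kept factors and record who is last.
last_place <- vapply(subsets, function(keep) {
  score <- drop(as.matrix(df[keep]) %*% weights[keep])
  names(which.min(score))
}, character(1))

# How often each country ends up at the bottom.
sort(table(last_place), decreasing = TRUE)
```

With the 12 factors in the resilience ranking there are 2^12 - 1 = 4095 non-empty subsets, which is still small enough to enumerate exhaustively.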
The myth of COVID resilience
Despite the poor showing of the Philippines, Galvez has a point: scores and rankings are biased. Scores appear to be objective, but the reality is that the very act of defining the score introduces bias. This is how standardized tests like the SAT inadvertently penalize poor kids and how facial recognition systems can’t recognize black people.
In the case of COVID resilience, severity is highly localized. In the United States, resilience depends on the state you live in. A single number just can’t do that variation justice. The same is true in the Philippines, where the NCR (National Capital Region) may be in a totally different situation than the rest of the country.
Despite embedded bias, scores aren’t going away. What’s important is recognizing the embedded bias and regularly reviewing the choice of factors and weights to ensure the bias is aligned with your goals and minimizes unintended consequences.
Brian Lee Yung Rowe is CEO of Pez.AI, a chatbot company pioneering the AI Workplace.