AT THE PEAK of the coronavirus epidemic in America, hospitals needed to triage patients. Only the sickest were admitted. Others were sent home to self-monitor. One measure used to determine the severity of an individual’s illness was his blood-oxygen level. The devices typically used to measure it, known as pulse oximeters, are easy to use. They clip onto a fingertip like a clothes peg. Regrettably, they record some darker-skinned patients as being healthier than they really are. This may have resulted in people who needed hospital treatment being denied it.
Work published last year in the New England Journal of Medicine, which looked at more than 10,000 patients across America, suggested that the pulse oximeters in use overestimated blood-oxygen saturation more often in black patients than in white ones. Pulse-oximeter readings of 92-96% are generally treated as reassuring. In this work, however, some patients whose readings fell in that range had a true saturation (as recorded by the arterial blood-gas measure, a method that requires blood to be drawn) of less than 88%. For black participants this happened 12% of the time, three times the rate at which it occurred for white participants. As Michael Sjoding of the University of Michigan, the study’s leader, observes, that gap could mean the difference between being admitted to hospital and being sent home.
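The measure at issue, which researchers call occult hypoxaemia, is a simple ratio. The Python sketch below computes it under stated assumptions: each record pairs a pulse-oximeter reading with an arterial blood-gas value, the thresholds are the 92-96% window and 88% cut-off described above, and the names (`PairedReading`, `occult_hypoxaemia_rate`) and the toy data are invented for illustration. The figures it produces say nothing about the real study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedReading:
    """One pulse-oximeter reading paired with an arterial blood-gas measurement."""
    group: str    # patient group, e.g. "black" or "white"
    spo2: float   # oxygen saturation reported by the pulse oximeter, %
    sao2: float   # oxygen saturation from arterial blood gas, %


def occult_hypoxaemia_rate(readings, group, spo2_low=92.0, spo2_high=96.0, sao2_cutoff=88.0):
    """Share of a group's readings in the reassuring SpO2 window whose true SaO2 is below the cut-off."""
    in_window = [r for r in readings
                 if r.group == group and spo2_low <= r.spo2 <= spo2_high]
    if not in_window:
        return float("nan")  # no readings in the window to judge
    missed = sum(1 for r in in_window if r.sao2 < sao2_cutoff)
    return missed / len(in_window)


# Toy data, invented for illustration only.
toy = (
    [PairedReading("black", 94.0, 85.0)]          # reassuring reading, truly low
    + [PairedReading("black", 94.0, 93.0)] * 3    # reassuring reading, truly fine
    + [PairedReading("black", 99.0, 98.0)]        # outside the window, ignored
    + [PairedReading("white", 94.0, 85.0)]
    + [PairedReading("white", 94.0, 93.0)] * 11
)

black_rate = occult_hypoxaemia_rate(toy, "black")
white_rate = occult_hypoxaemia_rate(toy, "white")
print(f"black: {black_rate:.1%}, white: {white_rate:.1%}, ratio: {black_rate / white_rate:.1f}")
```

The toy records are arranged so that one group's rate is three times the other's, mirroring the structure of the comparison rather than its data.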
Dr Sjoding’s investigations are not the only evidence of such bias. Work suggesting problems with pulse oximeters goes back at least as far as 1999. Despite that, and despite pulse oximetry’s widespread use, until recently few practitioners were aware of the problem. “In my entire pulmonary critical-care training, I was never taught that the device could potentially be less accurate,” says Dr Sjoding.
On February 19th, after some media attention and a letter to the Food and Drug Administration (FDA) from three senators, that agency released a warning that “pulse oximeters have limitations and a risk of inaccuracy under certain circumstances that should be considered.” It counselled doctors to use pulse oximeter readings as an estimate, and to make decisions based on trends rather than absolute thresholds.
Non-white people have been disproportionately affected by covid-19 in many places. In America, according to the Centers for Disease Control and Prevention, the country’s national public-health agency, black and Hispanic people are twice as likely to die from it as white people. There are many reasons for this disparity, and a single type of medical device certainly cannot be blamed for most of it. But the wider point is that medical technology should be designed from the outset to be free of such bias, and unfortunately it isn’t. Generally speaking, it is designed by white men and tested on white men. That it works best on white men is therefore hardly a surprise. But this has potentially lethal consequences for everyone who is not a white man, which is to say the vast majority of the world’s population.
Pulse oximeters, which were invented in the 1970s and adapted for commercial use in the 1980s, are a classic case in point. They work by passing two beams of light, one red and one infrared, through the tissue of the finger they are clipped to and measuring how much of each is absorbed. Oxygenated and deoxygenated haemoglobin absorb these wavelengths differently, so the oxygen saturation of someone’s blood can be estimated by comparing the strengths of the two beams after they have passed through the patient’s finger.
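Oximeters commonly reduce the two beams to a single number, often called the “ratio of ratios”: for each wavelength, the pulsating part of the signal (which comes from arterial blood) is divided by its steady part, and the red result is divided by the infrared one. The sketch below assumes that approach. The linear mapping SpO2 = 110 - 25R is a textbook approximation used here only for illustration, since real devices rely on empirically derived calibration tables; the function names are hypothetical.

```python
import math


def perfusion_ratio(signal):
    """Pulsatile (AC) part of a photodetector signal divided by its steady (DC) part."""
    ac = max(signal) - min(signal)   # peak-to-peak swing caused by each heartbeat
    dc = sum(signal) / len(signal)   # mean level, dominated by static tissue absorption
    return ac / dc


def ratio_of_ratios(red, infrared):
    """R = (AC_red / DC_red) / (AC_ir / DC_ir)."""
    return perfusion_ratio(red) / perfusion_ratio(infrared)


def spo2_estimate(r, a=110.0, b=25.0):
    """Illustrative linear calibration SpO2 = a - b*R, clipped to 0-100%."""
    return max(0.0, min(100.0, a - b * r))


# Synthetic signals over one heartbeat: the red beam pulses half as strongly as the infrared.
samples = 100
red = [1.0 + 0.01 * math.sin(2 * math.pi * t / samples) for t in range(samples)]
infrared = [1.0 + 0.02 * math.sin(2 * math.pi * t / samples) for t in range(samples)]

r = ratio_of_ratios(red, infrared)
print(f"R = {r:.2f}, estimated SpO2 = {spo2_estimate(r):.1f}%")
```

Dividing each pulse by its steady level is what lets the device concentrate on arterial blood; the calibration curve then turns R into a saturation figure.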
This process must, however, be calibrated, for other tissues, skin included, also absorb some of the light. Darker skin, which contains more melanin, absorbs more of the incident light than lighter skin does, weakening the signal, and may well absorb one beam more than the other. So unless a device is calibrated on people with both dark and light skin, which is not always the case, its readings may be biased.
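The danger of calibrating on too narrow a group can be shown with a toy model. The sketch below fits a straight-line calibration (SaO2 = a - bR) by ordinary least squares against reference blood-gas values from one group, then applies it to a second group. It assumes, purely for illustration, that for a given true saturation the second group’s measured R is lower by a fixed 0.08; that offset, the data and the function names are invented, and real pigment effects are more complicated. Normalising each beam by its steady level cancels much of the absorption by skin in principle, but because the final mapping is set empirically, any residual difference the calibration data do not capture shows up as bias.

```python
def fit_linear_calibration(r_values, sao2_values):
    """Ordinary least-squares fit of SaO2 = a - b*R; returns (a, b)."""
    n = len(r_values)
    mean_r = sum(r_values) / n
    mean_s = sum(sao2_values) / n
    sxx = sum((r - mean_r) ** 2 for r in r_values)
    sxy = sum((r - mean_r) * (s - mean_s) for r, s in zip(r_values, sao2_values))
    slope = sxy / sxx              # negative: a higher R means lower saturation
    intercept = mean_s - slope * mean_r
    return intercept, -slope


def mean_bias(a, b, r_values, sao2_values):
    """Mean of (device reading - true SaO2); positive means the device overestimates."""
    errors = [(a - b * r) - s for r, s in zip(r_values, sao2_values)]
    return sum(errors) / len(errors)


# Calibration group: invented data lying exactly on SaO2 = 110 - 25R.
r_cal = [0.5 + 0.1 * i for i in range(6)]
sao2 = [110.0 - 25.0 * r for r in r_cal]
a, b = fit_linear_calibration(r_cal, sao2)

# Second group: same true saturations, but R assumed (hypothetically) to read 0.08 lower.
r_other = [r - 0.08 for r in r_cal]

print(f"bias in calibration group: {mean_bias(a, b, r_cal, sao2):.2f} points")
print(f"bias in second group:      {mean_bias(a, b, r_other, sao2):.2f} points")
```

In this toy the calibration is perfect for the group it was fitted on and overestimates the other group’s saturation by two points, the direction of error that matters most for triage.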
Despite its error-prone history, pulse oximetry is a mainstay in hospitals. The arterial blood-gas measure, being both invasive and painful, is reserved for the sickest. Even before covid-19, doctors routinely used pulse oximeters to decide whom to admit to hospital, to monitor patients’ health and to make decisions about their treatment. The devices also serve as sensors that regulate automated treatments, such as the administration of oxygen.
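Automated use raises the stakes, because a biased sensor feeds straight into the treatment. The sketch below is a toy proportional controller, not any real device’s algorithm: it nudges the inspired oxygen fraction (FiO2) until the oximeter reads a target of 94%, assumed here as the midpoint of the 92-96% range. The patient model, gain and function names are invented.

```python
def patient_saturation(fio2):
    """Hypothetical patient: true SaO2 (%) rises linearly with inspired oxygen fraction."""
    return min(100.0, 85.0 + 50.0 * (fio2 - 0.21))


def simulate_closed_loop(sensor_bias, steps=60, target=94.0, gain=1.0):
    """Run a toy proportional oxygen controller; return (final FiO2, final true SaO2)."""
    fio2 = 0.21                                            # start on room air
    for _ in range(steps):
        reading = patient_saturation(fio2) + sensor_bias   # what the oximeter reports
        fio2 += gain * (target - reading) / 100.0          # proportional correction
        fio2 = min(1.0, max(0.21, fio2))                   # stay between room air and pure oxygen
    return fio2, patient_saturation(fio2)


for bias in (0.0, 4.0):
    fio2, true_sat = simulate_closed_loop(bias)
    print(f"sensor bias {bias:+.0f}: FiO2 {fio2:.2f}, true saturation {true_sat:.1f}%")
```

With an accurate sensor the loop settles at a true saturation of 94%; if the sensor reads four points high, it settles four points lower, having supplied less oxygen, while the display still shows the target.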
Now oximeters are cropping up in people’s houses, too, as those diagnosed with covid-19 but not ill enough to need hospital admission are advised to stay home and monitor their condition themselves. The devices have become so popular that the American Lung Association has asked healthy people not to buy them, in order to avoid creating supply shortages for hospitals and those who are actually sick.
If the example of the pulse oximeter were a one-off, that would be bad enough. But it isn’t. Another scandal involved a medical algorithm, applied to more than 100m Americans a year, that was used to allocate scarce resources to those with the greatest need. A study published in 2019 showed that this software gave white patients priority over black ones because it used people’s previous medical spending as a proxy for their current medical need. Because black patients often spend less on medical care for non-clinical reasons, including lack of access and racial bias in treatment, they frequently have lower past spending than white patients with similar medical needs, and the algorithm discriminated against them accordingly.
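The mechanism is easy to reproduce in miniature. The sketch below, with invented patients, ranks people for a scarce programme first by a direct measure of need (here, a count of chronic conditions) and then by past spending. Group “B” stands, hypothetically, for patients who spend 30% less for the same illness because of barriers to access; the groups, spending figures and function names are illustrative assumptions, not the real algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    group: str                # "A" or "B" (hypothetical groups)
    chronic_conditions: int   # stand-in for true medical need
    past_spending: float      # the proxy the ranking relies on


def top_k(patients, k, key):
    """The k patients with the highest value of `key` (ties keep input order)."""
    return sorted(patients, key=key, reverse=True)[:k]


def share_of_group(selected, group):
    """Fraction of the selected patients belonging to `group`."""
    return sum(p.group == group for p in selected) / len(selected)


# Both groups have identical need; group B spends less per condition (invented figures).
spend_per_condition = {"A": 1000.0, "B": 700.0}
patients = [Patient(g, c, spend_per_condition[g] * c)
            for c in range(1, 6) for g in ("A", "B")]

by_need = top_k(patients, 4, key=lambda p: p.chronic_conditions)
by_cost = top_k(patients, 4, key=lambda p: p.past_spending)
print(f"group B share when ranked by need:     {share_of_group(by_need, 'B'):.0%}")
print(f"group B share when ranked by spending: {share_of_group(by_cost, 'B'):.0%}")
```

Ranking by spending halves group B’s share of places even though its members are exactly as sick, which is the proxy problem in its simplest form.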
In this case, the firm that created the algorithm quickly took the point. It has worked with the researchers to change how the algorithm operates, which has greatly reduced its bias, though perhaps not eliminated it entirely. Algorithmic errors can be tricky to eradicate: rooting out bias requires extreme attention to detail at every stage of development.
Nor is ethnicity the only source of bias. Women, too, are often at a disadvantage when it comes to treatment. Hip implants and heart procedures, for example, are more likely to fail, or to cause complications, in women than in men.
A study published in 2013 in the Journal of the American Medical Association found that women in the four American regions the authors examined had a 29% higher risk than men of their hip implants failing within three years of surgery. Another study, from 2019, found that women were twice as likely as men to experience complications from implantable cardiac devices, such as pacemakers, within 90 days of implantation. In both cases, device-makers’ failure to recognise physical differences, particularly in size, between male and female bodies was to blame. As Isuru Ranasinghe, a cardiologist at the University of Queensland, in Australia, who was part of the cardiac-implant study, observes, “In almost every cardiac procedure I’m aware of, women have a higher risk of complications.”
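A “29% higher risk” is a relative figure, not an absolute one. The short sketch below shows the arithmetic with invented counts: if 100 of 10,000 men and 129 of 10,000 women had a failure, the relative risk would be 1.29, yet the absolute gap would be only 0.29 percentage points. The study’s actual baseline rates are not given here, so these numbers are purely hypothetical.

```python
def risk(events, total):
    """Proportion of a group that experienced the event."""
    return events / total


def relative_risk(events_a, total_a, events_b, total_b):
    """Risk in group a divided by risk in group b."""
    return risk(events_a, total_a) / risk(events_b, total_b)


# Invented counts, for illustration only.
women_failures, women_total = 129, 10_000
men_failures, men_total = 100, 10_000

rr = relative_risk(women_failures, women_total, men_failures, men_total)
gap = risk(women_failures, women_total) - risk(men_failures, men_total)
print(f"relative risk {rr:.2f} ({rr - 1:.0%} higher); absolute gap {gap * 100:.2f} percentage points")
```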
One way to ameliorate problems of this sort is to insist that devices and procedures are tested on a wider range of individuals than just white males. In theory, at least in America, this is already supposed to happen. As long ago as 1993, Congress directed the country’s National Institutes of Health (NIH) to require the inclusion of women and non-white people in clinical trials. FDA guidance also encourages studies to have “adequate numbers” of participants to conduct analyses by sex or by race.
But decades after the directive to the NIH, women and non-white people are still underrepresented. An analysis carried out in 2019 found that women made up less than 30% of participants in 15% of NIH studies conducted in 2015, and that black people (who make up 13% of America’s population) were 10% or less of participants in about one-fifth of studies. Furthermore, only 26% of studies conducted subgroup analyses by sex, and only 13% by race or ethnicity.
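The nested percentages in that analysis are easier to follow when spelled out: first compute each study’s share of women (or of black participants), then count the studies falling below a threshold. In the sketch below the study records are invented, the thresholds mirror those quoted above, and the function name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    participants: int
    women: int
    black: int


def share_of_studies_below(studies, attr, threshold, inclusive=False):
    """Fraction of studies where group `attr` is below (or, if inclusive, at most) `threshold` of participants."""
    def under(study):
        share = getattr(study, attr) / study.participants
        return share <= threshold if inclusive else share < threshold
    return sum(under(s) for s in studies) / len(studies)


# Four invented studies: (participants, women, black participants).
studies = [Study(100, 25, 12), Study(200, 120, 18), Study(50, 30, 4), Study(80, 40, 20)]

print(f"studies with under 30% women: {share_of_studies_below(studies, 'women', 0.30):.0%}")
print(f"studies with 10% or fewer black participants: "
      f"{share_of_studies_below(studies, 'black', 0.10, inclusive=True):.0%}")
```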
According to Bakul Patel, director of the Digital Health Center of Excellence at the FDA, the agency is learning from past errors. It has issued guidelines encouraging research on diverse patient populations. And Mr Patel asks all those with a stake in the matter, from regulators to patients, to work to ensure proper representation in medical research. “This is not going to be solved by one organisation or one entity or one stakeholder group,” he says. “It needs to be a collective effort.”
Some have already acted. Designers at Nonin Medical, a pulse-oximeter company in Minnesota, long ago took steps to eliminate racial bias from the firm’s devices. According to David Hemink, Nonin’s boss, the company’s clinical studies go beyond the FDA recommendation to include “at least two darkly pigmented subjects or 15% of [the] subject pool, whichever is larger”, and include more than twice the recommended number of dark-skinned participants. Independent assessments conducted in 2005 and 2007 suggest this approach has worked. They found Nonin’s products to have a clinically insignificant bias at the lowest saturation levels (less than 80%).
Changes in procedure, as well as in design, can help too. Linh Ngo, a member of Dr Ranasinghe’s team in Queensland, for example, recommends using real-time ultrasound imaging during heart operations. This, she reckons, would both reduce the risk of vascular injury in all patients and equalise outcomes between the sexes.
In America, meanwhile, the FDA recommends transparency for medical algorithms, so that their users, including insurance companies, doctors and patients themselves, can understand how they work. Others are working to widen the range of devices available.
Ana Claudia Arias, a professor of engineering at the University of California, Berkeley, whose laboratory specialises in improving medical devices, advocates making different models to accommodate different body types. For pulse oximeters in particular, size matters as well as skin tone. If the device is too large for the finger, light from the room can leak in and interfere with the reading, and women tend to have smaller fingers than men. Dr Arias therefore recommends 15 models, covering five skin tones in each of three sizes (small, medium and large). Such strategies for accommodating human variation would be welcome for other medical devices, too. Whether medical-technology manufacturers will meet the demand remains to be seen. ■
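Fifteen is simply five skin-tone bands multiplied by three sizes. The sketch below enumerates such a range and picks a model for a given patient; the numbering of the tone bands and the model names are invented, and how a clinician would assign a tone band is left open.

```python
from itertools import product

SKIN_TONES = (1, 2, 3, 4, 5)                   # hypothetical tone bands, lightest to darkest
FINGER_SIZES = ("small", "medium", "large")    # the three sizes Dr Arias proposes

# One model per (tone, size) combination; names are illustrative.
MODELS = {(tone, size): f"oximeter-T{tone}-{size}"
          for tone, size in product(SKIN_TONES, FINGER_SIZES)}


def pick_model(tone, size):
    """Return the model designed for this tone band and finger size."""
    if (tone, size) not in MODELS:
        raise ValueError(f"no model for tone {tone!r}, size {size!r}")
    return MODELS[(tone, size)]


print(len(MODELS))
print(pick_model(4, "small"))
```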
A version of this article was published online on April 7th 2021
This article appeared in the Science & technology section of the print edition under the headline “Fatal truths”