In the first published guidelines for responsible machine learning in health care, experts from around the world – including faculty at the University of Toronto and the Vector Institute for Artificial Intelligence – are calling for an interdisciplinary approach.
The paper, titled “Do no harm: a roadmap for responsible machine learning for health care,” was published in Nature Medicine this week. It recommends that deployment of machine learning in health care involve interdisciplinary teams, including clinical experts and machine-learning researchers.
Decision-makers like hospital administrators and regulatory agencies should also be involved, as well as users of machine learning, including nurses, physicians, patients and the friends and family of patients.
“The majority of [machine learning] solutions are currently being developed in silos, away from the real-world clinical problems and settings that these models will actually impact,” says senior author Anna Goldenberg, who is an assistant professor of computer science at U of T, the associate research director of health at the Vector Institute and a senior scientist in genetics and genome biology at the Hospital for Sick Children.
“Our guidelines provide a framework within which many issues stemming from the complexity of adopting [machine learning] in health care in particular can be avoided.”
The paper warns that “health care is not immune to pernicious bias,” adding that “the health data on which algorithms are trained are likely to be influenced by many facets of social inequality, including bias toward those who contribute the most data.”
Co-author Marzyeh Ghassemi, an assistant professor in the departments of computer science and medicine, says that machine learning models are sometimes presented as if “a model on its own is a solution.”
By contrast, she points out that most problems in human health can’t be solved with a model.
“You have to engage in the fact that health care is a process – it’s not a static data set that you can pull once, train a model on and deploy,” says Ghassemi, who is also a faculty member at the Vector Institute and holds a CIFAR AI Chair.
“It’s an ongoing process where labels and definitions of clinical conditions can and do change. Populations can shift, treatments and different locations for different groups can vary. I think there is a lot of careful thought that needs to go into deployable solutions, which is very separate from creating an interesting machine learning model.”
A machine learning model can still be technically promising as a step toward an ultimately successful solution, but Ghassemi says there is a wider set of objectives that needs to be achieved.
“We tried to focus on things you might not think about initially: choosing the right problems, making sure the solution is useful, really rigorously thinking through the ethical implications of deployment and evaluation,” Ghassemi says.
“Evaluation is particularly challenging because you have to thoughtfully report your results, and then think through the caveats for responsible deployments.”
Ghassemi says it’s important to think through the ethical impacts of machine learning since each developer’s approach will vary, depending on their background.
“In those with a really strong technical background, what I often try to emphasize is the thoughtful reporting of results, and the ethical implications of a deployment,” she says. “In a technical setting, often we already emphasize really rigorous evaluation and choosing an appropriate problem.”
That emphasis shifts when the audience changes.
“If somebody has a more clinical background, and already lives and breathes the ethical implications of what they’re doing, I would emphasize the other facets,” Ghassemi says.
“Especially with the availability of downloadable models, the goal should be to ensure that the technical solution you come up with is useful across different patients – that it’s possible to generalize it to your setting and problem.”