By now, health systems seeking to capitalize on the enormous potential of artificial intelligence are well aware – or should be, at least – of the inherent risks, even dangers, of algorithms and models that are suboptimally designed or trained on the wrong data.
But understanding the hazards of algorithmic bias or murky modeling techniques isn’t the same as knowing how to protect against them.
How can healthcare providers learn to spot biased black-box systems? How should they mitigate the risks of training algorithms on the wrong datasets? How can they build an ethical and equitable AI culture that prioritizes transparency and trustworthiness?
At the HIMSS24 AI in Healthcare Forum on March 11, an afternoon panel discussion will tackle those questions and more.
The session, The Quest for Responsible AI, is set to be moderated by HIMSS director of clinical research Anne Snowdon and features two leading thinkers about artificial intelligence in healthcare: Michael J. Pencina, chief data scientist at Duke, director of Duke AI Health and professor of bioinformatics at the Duke School of Medicine; and Brian Anderson, the former chief digital health physician at MITRE, who was just announced as the new CEO of the Coalition for Health AI, which he cofounded.
In advance of HIMSS24, we spoke with Anderson about the imperatives of AI transparency, accountability and data privacy, and how healthcare organizations can prioritize them and act on them as they integrate AI more tightly into their care delivery.
Q. What are some of the biggest ethics or responsibility challenges around AI’s role in healthcare, as you see them?
A. Part of the challenge starts at a very high level. All of us are patients or caregivers at one point in our life. Healthcare is a highly consequential space. Artificial intelligence is essentially tools and programs that are trained on our histories. And the data that we have to train these programs [can be] fundamentally biased in some pretty noticeable ways. The most accessible kinds of data, the most robust data, oftentimes come from big academic medical centers that are really well staffed, that have the ability to create these robust, complete datasets.
And it often comes from urban, highly educated, more often than not white, populations. The challenge, then, is how we build fair models that do not carry unjustified bias. There are no easy answers to that. I think it takes a coordinated approach across the digital health ecosystem in terms of how we invest in, and think about, intentionally partnering with communities that haven't been able to tell their story from a digital perspective, to create the datasets that can be used for training purposes.
And it opens up some other challenges around how we think about privacy and security, how we think about ensuring that all this data that we’re looking to connect together is actually going to be used to help the communities that it comes from.
And yet, on the flip side of this, we have this great promise of AI: that it's going to give people who traditionally don't have easy access to healthcare access to patient navigator tools. To have an advocate that, as an example, can help you navigate and interact with providers, advocating for your priorities, your health, your needs.
So I think there are a lot of exciting opportunities in the AI space. Obviously. But there are some real challenges in front of us that we need to, I think, be very real about. And it starts with those three issues: All of us are going to be patients or caregivers at one point in our life. All these algorithms are programs that are trained on our histories, and we have a real big data problem in terms of the biases that are inherent in the data that is, for the most part, the most robust and accessible for training purposes.
Q. How then should health systems approach the challenge of building transparency and accountability from the ground up?
A. With the Coalition for Health AI, the approach that we've taken is looking at a model's lifecycle. A model is developed initially, then it's deployed, and then it's monitored or maintained. In each one of those phases, there are certain considerations that you need to really focus on and address. So we've talked about having data that is engineered and available to appropriately train these models in the development phase.
If I’m a doctor at a health system, how do I know if a model that is configured in my EHR is the appropriate model? If it’s fit for purpose for the patient I have in front of me? There are so many things that go into being able to answer those questions completely.
One is, does the doctor even understand some of the responsible AI best practices? Does the doctor understand what it means to look critically at the AI’s model card? What do I look for in the training data? What do I look for in the approach to training? In the testing data? Were there specific indications that were tested? Are there any indications or limitations that are called out, like, don’t use it on this kind of patient?
Those are really important things. When we think about the workflow and the clinical integration of these tools, simply having pop-up alerts is an [insufficient] way of thinking about it.
And, particularly in some of these consequential spaces where AI is becoming more and more used, we really need to upskill our providers. And so having intentional efforts at health systems that train providers on how to think critically about when and when not to use these tools for the patients they have in front of them is going to be a really important step.
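As a rough illustration of the model card review Anderson describes, here is a minimal, hypothetical sketch in Python. The field names and the "fit for purpose" check are illustrative only; they do not reflect any specific model card standard or CHAI artifact.

```python
# A hypothetical, minimal representation of the model card fields a clinician
# might be trained to read critically: training data, testing data, tested
# indications and called-out limitations. Field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    name: str
    training_data: str            # where the training data came from, and from whom
    testing_data: str             # how and on which populations it was evaluated
    indications: list[str]        # uses the developer actually tested
    limitations: list[str] = field(default_factory=list)  # "don't use it on this kind of patient"

def fit_for_purpose(card: ModelCard, intended_use: str) -> bool:
    """A crude first check: is the intended use among the tested indications
    and not explicitly called out as a limitation?"""
    return intended_use in card.indications and intended_use not in card.limitations

# Illustrative example, not a real product's model card.
card = ModelCard(
    name="sepsis-risk-v2",
    training_data="Adult inpatient encounters, single academic medical center, 2018-2022",
    testing_data="Held-out encounters from the same center; no external validation",
    indications=["adult inpatient sepsis risk"],
    limitations=["pediatric patients", "outpatient settings"],
)
print(fit_for_purpose(card, "adult inpatient sepsis risk"))   # True
print(fit_for_purpose(card, "pediatric patients"))            # False
```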
You bring up another good point, which is, "OK, I'm a health system. I have a model deployed. Now what?"
So you’ve upskilled your doctors, but AI, as you know, is dynamic. It changes. There’s performance degradation, there’s model drift, data drift.
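As a rough illustration of what monitoring for that kind of drift can involve, here is a minimal sketch assuming the health system has kept a baseline sample of the model's scores from deployment; the statistical test and threshold are illustrative, not a CHAI or vendor specification.

```python
# A minimal, hypothetical data-drift check: compare a recent window of a
# model's output scores against the distribution captured at deployment.
import numpy as np
from scipy.stats import ks_2samp

def drift_alert(baseline: np.ndarray, recent: np.ndarray, p_threshold: float = 0.01) -> bool:
    """Flag drift when a two-sample KS test rejects 'same distribution'."""
    statistic, p_value = ks_2samp(baseline, recent)
    return p_value < p_threshold

# Illustrative use with synthetic data: scores at go-live vs. a shifted recent window.
rng = np.random.default_rng(0)
baseline_scores = rng.beta(2, 5, size=5_000)   # distribution at deployment
recent_scores = rng.beta(2, 3, size=1_000)     # distribution observed this month
if drift_alert(baseline_scores, recent_scores):
    print("Model outputs have drifted; trigger a governance review.")
```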
I would say one of the bigger unanswered questions is the one you're bringing up, which is: Health systems, the majority of them are in the red. So imagine going to them and saying, "OK, you've just bought this multimillion-dollar AI tool. Now you have to stand up a governance committee that's going to monitor it, and have another suite of digital tools that are going to be your dashboards for monitoring that model." If I were a health system, I would run for the hills.
So we don’t have yet a scalable plan as a nation in terms of how we’re going to support critical access hospitals or FQHCs or health systems that are less resourced, that don’t have the ability to stand up these governance committees or these very fancy dashboards that are going to be monitoring for model drift and performance.
And the concern I have is that, because of that, we're going to go down the same path that we've gone down with many of the other advances we've had in health, particularly in digital health, which is just a reinforcing of the digital divide in health systems: Those that can afford to put those things in place do it, and those that can't would be irresponsible if they were to try to purchase one of these models without being able to govern it or monitor it appropriately.
And so some of the things that we're trying to do in CHAI are to identify the easily deployable tools and toolkits – SMART on FHIR apps, as an example – and the partners in the platform space, a Microsoft, a Google Cloud or an AWS, that can build the kinds of tools that are more scalable and more easily deployed, so that health systems running on any one of these cloud providers can use them more easily, perhaps in a remote way.
Or how can we link assurance labs that are willing to partner with lesser-resourced health systems to do remote assurance, remote monitoring of locally deployed models?
And so it's this balance, I think, of enabling health systems to do it locally, while also enabling external partners – be it platform vendors or other assurance lab experts – to be able to, in this cloud-interoperable world that we live in, help in perhaps a more remote setting.
Q. Congratulations, by the way, on your new CEO position at the Coalition for Health AI. What has been front and center for CHAI recently, and what are you expecting to be talking about with other HIMSS24 attendees as you walk around the convention center next week?
A. I would say, and this goes for MITRE, too, that the thing front and center at both MITRE and CHAI is this amazing new set of emerging capabilities coming out in generative AI. And the challenge has been coming to agreement on how you measure performance in these models.
What does accuracy look like in a large language model's output? What does reliability look like in a large language model's output, where the same prompt can yield two different responses? What does measuring bias look like in the output of one of these large language models? How do you do that in a scalable way? We don't have consensus perspectives on these important fundamental things.
You can't manage what you can't measure. And if we don't have agreement on how to measure, then a pretty consequential space that people are beginning to explore with generative AI goes unaddressed. We urgently need to come to an understanding about what those testing and evaluation frameworks are for generative AI, because that then informs a lot of the regulatory work that's going on in this space.
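One way to make the "same prompt, different answers" problem concrete is to sample a model repeatedly and score how much the outputs agree. The sketch below is a minimal illustration of that idea only; `generate` is a placeholder for whatever model call an organization uses, and the similarity metric is illustrative rather than an agreed evaluation framework.

```python
# A minimal, hypothetical consistency check for generative output: run the
# same prompt several times and average pairwise text similarity (1.0 = identical).
import random
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean
from typing import Callable

def consistency_score(generate: Callable[[str], str], prompt: str, n_samples: int = 5) -> float:
    """Average pairwise similarity across repeated generations of one prompt."""
    outputs = [generate(prompt) for _ in range(n_samples)]
    return mean(SequenceMatcher(None, a, b).ratio() for a, b in combinations(outputs, 2))

# Illustrative use with a stand-in "model" that answers inconsistently.
def toy_model(prompt: str) -> str:
    return random.choice(["Start aspirin 81 mg daily.", "No aspirin is indicated."])

print(f"Consistency: {consistency_score(toy_model, 'Should this patient take aspirin?'):.2f}")
```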
That's perhaps the more urgent thing that we're looking at. I would say it's something that MITRE has been focused on for quite some time. When we look at non-health-related spaces, a lot of the expertise that our team, the MITRE team, brought to CHAI was informed by work going on in different sectors.
And so I know that in healthcare, we're used to other sectors telling us, "I can't believe that you haven't done X or Y or Z yet." Or, like, "You're still using faxes? How backward are you in healthcare?"
I would say, similarly in this space, we have a lot to learn from other sectors that have already explored these questions, like how we think about computer vision algorithms and generative AI capabilities in domains beyond health, which will help us get to some of these answers more quickly.
Q. What else are you hoping to learn next week in Orlando?
A. I think one of the things that I'm really excited about – again, it's something that I learned at MITRE – is the power of public-private partnerships. I would never want to speak for the U.S. government or the FDA, and I won't here. But I'm really excited, and I don't know how this is going to play out, to see how the U.S. government is going to participate in some of these working groups that we're going to be launching on our webinar next week.
You’re going to get leading technical experts in the field from the private sector, working alongside folks from the FDA, ONC, Office for Civil Rights, CDC. And what comes out of that, I hope, is something beautiful and amazing, and it’s something that we as society can use.
But I don’t know what it’s going to look like. Because we haven’t done it yet. We’re going to start doing that work now. And so I’m really excited about it. I don’t know exactly what it’s going to be, but I’m pretty excited to see kind of where the government goes, where the private sector teams go when they start working together, elbow-to-elbow.
The session, "The Quest for Responsible AI: Navigating Key Ethical Considerations," is scheduled for the preconference AI in Healthcare Forum on Monday, March 11, 1-1:45 p.m. in Hall F (WF3) at HIMSS24 in Orlando.