In this series of blogs, we have shifted our discussion from the beneficial uses of Artificial Intelligence (AI) to an overview of the concerns and vulnerabilities that AI could bring, specifically to healthcare, if not carefully and ethically implemented.
As we have previously documented, AI is rapidly transforming the healthcare industry. AI-driven applications and programs are already being used to diagnose diseases, develop new treatments, and deliver care more efficiently. However, there are also several potential risks associated with the use of AI in healthcare.
In this blog, we want to drill down to a more detailed level than our last blog and identify what we believe are the most critical areas that must be observed in the design, development, and application of any AI program for healthcare. These areas are:
- ACCURACY, CONSISTENCY AND RELIABILITY – It seems obvious that accuracy, consistency, and reliability would be minimal standards for an AI healthcare platform; however, experience tells us that unless these are front and center, they will likely be missed. What do these mean for an AI healthcare platform?
- Accuracy – in the case of AI applications, accuracy is the proportion of correct predictions. Much like a grade on a paper in school, you get graded for the right answer, not the close answer (see the short sketch below). AI applications need to be correct, but what accuracy is acceptable? Your school test report was usually “passing” if you scored 70%, but is that acceptable accuracy for an AI application that will be part of a physician’s treatment plan for you or a loved one? Is missing it 30% of the time acceptable? No, not to me it’s not!
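To make the grading analogy concrete, here is a minimal sketch of how accuracy is typically computed for a predictive program. The diagnoses below are hypothetical and for illustration only:

```python
# Minimal sketch: accuracy as the fraction of correct predictions.
# All labels here are hypothetical, for illustration only.

predictions = ["flu", "flu", "cold", "covid", "flu"]   # what the AI predicted
actual      = ["flu", "cold", "cold", "covid", "flu"]  # confirmed diagnoses

correct = sum(p == a for p, a in zip(predictions, actual))
accuracy = correct / len(actual)

print(f"Accuracy: {accuracy:.0%}")  # 4 of 5 correct -> 80%
```

Even 80% here means one patient in five received a wrong prediction – a “passing” grade on paper, but a troubling one in a treatment plan.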
In a recent, not-yet-peer-reviewed study, researchers at Stanford and Berkeley found a significant change in the “behavior” of responses from ChatGPT-3.5 and ChatGPT-4, with the accuracy of both appearing to decline. This appears to validate user reports about the apparent degradation of the latest versions of the software just a few short months after release.
So, how can we ensure the accuracy of the AI platform’s predictions, findings, or suggested diagnoses of the underlying cause? How do we improve the accuracy of anything? There are a few tried-and-true methods to achieve this:
- Continuously update the information / knowledge base that the platform is accessing. It is not possible to have too much information.
- Utilize comparative studies of (de-identified) individuals with similar demographics, history, and conditions, and their outcomes, or other forms of benchmarking the results in non-AI-driven ways.
- If in doubt, and if using a generative AI program, then reengineer the prompts and ask the questions again – somewhat like a detective asking the same question in different ways (a minimal sketch follows this list). This causes the platform to ‘re-analyze’ the problem and may result in the selection of different data sets that could lead to a different diagnosis. Most of us have been there – two equally qualified physicians with different diagnoses of a condition – so, we go for the tiebreaker.
- Realize that there are NO ABSOLUTES! A computer-generated result should never be assumed to be accurate – Garbage In / Garbage Out, though the garbage may not be apparent at first! Unfortunately, many of us have been conditioned to accept computer-generated results as accurate – they are not, because they are 100% dependent on the data provided!
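As a rough illustration of the ‘tiebreaker’ approach above, the sketch below poses the same clinical question in two different wordings and flags any disagreement for human review. The ask_model function is a hypothetical stand-in for whatever generative AI platform is in use, and its canned answer exists only so the sketch runs end to end:

```python
# Sketch of the "tiebreaker": ask the same question in different wordings
# and escalate to a physician if the answers disagree.

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to a generative AI platform."""
    # A real implementation would call the platform's API; a canned
    # answer is returned here so the sketch runs end to end.
    return "lupus" if "rash" in prompt else "inconclusive"

prompts = [
    "Given these symptoms: fatigue, joint pain, rash - what is the most likely diagnosis?",
    "A patient presents with a rash, joint pain, and fatigue. What condition best explains this?",
]

answers = {ask_model(p) for p in prompts}

if len(answers) > 1:
    print("Disagreement across prompts - escalate for human review:", answers)
else:
    print("Consistent answer across phrasings:", answers.pop())
```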
So far, we are only looking at an AI program to assist in diagnosis, not take over – fears of a ‘takeover’ of the medical community by AI-driven robots are misplaced! After all, this did not happen in Star Trek, ‘Bones’ would not have allowed it, and, according to some, Star Trek is a bellwether for future technology!
- Consistency – in the case of an AI healthcare program, this characteristic means that the program or platform is operating only on the assigned data set, the assigned patient, or the assigned type of patient. This characteristic leads to reproducibility which, as the name implies, provides a “consistent” result when given the same data and parameters.
A helpful analogy is the checksum or Cyclic Redundancy Check (CRC), an error-detecting code still used to validate that a disk drive is good. If the disk passed the CRC, it was considered good to use, and data could be saved with minimal risk of being immediately corrupted.
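A minimal sketch of both ideas, using Python’s built-in zlib.crc32: identical input always produces an identical checksum, and a single changed byte does not – the same determinism we want from a reproducible AI result given the same data and parameters. The record contents are hypothetical:

```python
import zlib

# Identical data always yields an identical CRC; one changed byte does not.
record  = b"patient-1234: glucose=105, bp=120/80"
altered = b"patient-1234: glucose=150, bp=120/80"

print(zlib.crc32(record))   # deterministic for identical input
print(zlib.crc32(record))   # identical to the line above
print(zlib.crc32(altered))  # differs, exposing the changed byte
```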
- Reliability – the degree to which a result can be depended on to be accurate. For AI programs, reliability is the probability that the program will perform accurately and consistently for a specified period with respect to a unique set of circumstances. This relates to software quality and is less ‘design’ oriented and more ‘operation’ oriented. Building an AI program that meets standard software quality measures is not a difficult task; the difficulty arises in assuring that other factors, such as data integrity and quality, are maintained, since these have a measurable effect on the outcome.
The variables involved in ensuring the accuracy, consistency, and reliability of an AI program are not substantially different from those of software written for uses other than healthcare. However, these three (3) characteristics are so interdependent that degradation in any one will cause degradation across all, and where a human life may be involved, this can reduce the quality of care and lead to serious consequences for the patient, depending on the application.
- BIAS – a bias is any ‘prejudice in favor of or against one thing, person, or group compared with another.’ In our application, bias means that there is a systematic distortion of the result, whether intended or unintended, known or unknown. Recently, a university was cited for discrimination because its AI enrollment program was unfairly favoring one gender over another. Whether an intended consequence or not, it was a bias that should not have existed but was introduced into the selection process by the algorithms in the AI program.
So, what happens if a bias is introduced unintentionally into an AI program used in healthcare? Much has been written about the generative AI programs that respond to ‘prompts’; however, it is quite possible for bias to be created in the wording of the prompt itself, with the result being used for delivery of care to a patient! We will discuss a variety of biases that can be introduced via AI into the healthcare diagnostic / prognosis process.
Let’s briefly explore several types of bias that can be inadvertently built into the algorithms and how these can be avoided. AI systems are trained on data, and if that data is biased, then the AI system will be biased as well. This could lead to discrimination against certain groups of people, such as racial minorities or women.
For example, in 2018, an AI-powered recruiting system developed by Amazon was found to be biased against women. The system was more likely to recommend male candidates than female candidates, even when the women had the same qualifications as the men.
Here are some of the biases that must be overcome or guarded against in an artificial intelligence program written to provide physician diagnostic support:
- Selection bias
A selection bias occurs when the data used to train an AI system is not representative of the population that the system will be used to diagnose. This can lead to the AI system making inaccurate or biased predictions.
For example, if an AI system is trained on a dataset of patients who are all white, the system may be less accurate in diagnosing patients of other races.
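One way a development team might screen for this is sketched below, with purely hypothetical numbers: compare the demographic makeup of the training set against the population the system will actually serve, and flag groups that are badly under-represented:

```python
from collections import Counter

# Hypothetical demographic labels for a training set, and hypothetical
# shares for the population the system will serve.
training_set = ["white"] * 900 + ["black"] * 50 + ["asian"] * 30 + ["hispanic"] * 20
population   = {"white": 0.60, "black": 0.13, "asian": 0.06, "hispanic": 0.19}

counts = Counter(training_set)
total = sum(counts.values())

for group, target_share in population.items():
    share = counts.get(group, 0) / total
    flag = "  <-- under-represented" if share < 0.5 * target_share else ""
    print(f"{group:9s} train={share:6.1%} population={target_share:6.1%}{flag}")
```

The 50% threshold here is an arbitrary illustration; what matters is that the comparison is made at all before the system is trained.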
- Performance bias
A performance bias occurs when an AI system is evaluated on data that does not represent the conditions it will face in practice – for example, when the evaluation set looks just like the training set. This can make the AI system appear more accurate than it really is.
For example, if an AI system is trained on a dataset of patients who are all in the early stages of a disease, the system may appear to be very accurate at diagnosing the disease. However, if the system is then evaluated on a dataset of patients who are all in the late stages of the disease, the system may not be as accurate.
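One common safeguard, sketched below under the assumption that scikit-learn is available, is to hold out an evaluation set stratified by disease stage, so that early- and late-stage cases appear in the test data in the same proportions as the population the system will serve:

```python
from sklearn.model_selection import train_test_split

# Hypothetical records labeled by disease stage. Stratifying the split keeps
# early- and late-stage cases proportionally represented in both train and
# test sets, so reported accuracy is not flattered by an easy evaluation set.
records = [f"patient-{i}" for i in range(100)]
stage   = ["early"] * 70 + ["late"] * 30

train_x, test_x, train_y, test_y = train_test_split(
    records, stage, test_size=0.2, random_state=0, stratify=stage
)

print("late-stage share of test set:",
      test_y.count("late") / len(test_y))  # ~0.30, matching the population
```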
- Detection bias
A detection bias occurs when the way that a disease is detected is different for different groups of people. This can lead to the AI system making inaccurate or biased predictions.
For example, if a disease is more likely to be detected in people of a certain race, the AI system may be more likely to diagnose that disease in people of that race, even if they are not actually sick. In effect, the AI program amplifies a bias already present in how the disease is detected.
- Attrition bias
An attrition bias occurs when people who are more likely to have a certain disease are more likely to drop out of a study. This can lead to the AI system being trained on a dataset that is not representative of the population that the system will be used to diagnose.
For example, if people who are sick are more likely to drop out of a study, the AI system may be trained on a dataset that is biased towards healthy people. This can lead to the AI system being less accurate at diagnosing sick people.
- Reporting bias
A reporting bias occurs when some people are more likely than others to report having a certain disease. This can lead to the AI system being trained on a dataset that is biased towards people who have the disease.
For example, if people are more likely to report having a disease if they are seeking treatment, the AI system may be trained on a dataset that is biased towards people who are sick. This can lead to the AI system being less accurate at diagnosing people who are not sick.
These are just a few of the areas of possible bias that must be overcome or guarded against in an artificial intelligence program written for physician support. By being aware of these biases, developers can take steps to mitigate their effects and ensure that the AI system is as accurate, consistent, reliable, and unbiased as possible.
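One such step, sketched here with entirely hypothetical validation results: compare the system’s accuracy per demographic group, since a large gap between groups is often the first visible symptom of selection, detection, attrition, or reporting bias:

```python
from collections import defaultdict

# Hypothetical (group, predicted, actual) triples from a validation run.
results = [
    ("group_a", "sick", "sick"), ("group_a", "well", "well"),
    ("group_a", "sick", "sick"), ("group_a", "well", "well"),
    ("group_b", "well", "sick"), ("group_b", "sick", "sick"),
    ("group_b", "well", "well"), ("group_b", "well", "sick"),
]

correct = defaultdict(int)
total = defaultdict(int)
for group, predicted, actual in results:
    total[group] += 1
    correct[group] += predicted == actual

for group in sorted(total):
    print(f"{group}: accuracy {correct[group] / total[group]:.0%}")
# A gap like 100% vs 50% between groups warrants investigation before
# the system is trusted in care delivery.
```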
Next week, we will continue our discussion on what we believe are the most critical areas that must be observed in the design, development, and application of any AI program for healthcare.
– Carl L. Larsen, President & Chief Operating Officer of OXIO Health, Inc.