ChatGPT, Artificial Intelligence and the NHS

By Omar Musbahi
Trauma and Orthopaedic Registrar, NIHR Academic Clinical Fellow, Imperial College NHS Trust

As any orthopaedic surgical trainee will know, being quizzed during trauma meetings is often the most challenging part of an on-call. I would spend parts of my on-call shift reading up on confusing classification systems and surgical approaches.

With a pocket-sized 'Rockwood & Green' and 'Orthobullets' in my iPhone favourites tab, I would trawl through the extra modifications to the Salter-Harris classification and be ready for the intense quizzing of the trauma meeting that I knew was sure to come.

At the time, I believed that medicine was all about memory and clinicians' experience. Surely no computer could replicate that.

Less than five years later, medicine is entering a new era, one in which artificial intelligence (AI) promises to interpret our scans, write our notes, and help with decision making. Unless you have been hiding, there is no escaping the headlines 'Artificial Intelligence can diagnose breast cancer better than doctors' or 'Artificial intelligence will make brain surgery safer'.  

The potential appears limitless, and I thought nothing about AI could surprise me further, until I subscribed to ChatGPT in late 2022.

The idea of a talking computer that can think and learn for itself has been a goal since Alan Turing first described the concept of AI in his seminal paper “Computing Machinery and Intelligence”. ChatGPT and Bard are both Large Language Models (LLMs) that use an AI technique called deep learning to generate text. LLMs are trained on massive datasets of text and code, and they learn to predict the next word in a sequence based on the words that have come before it. Imagine a really big and smart dictionary combined with a supremely talented storyteller: these models can read, write, and chat with people by processing text and generating responses that make sense. They learn from books, websites and other texts to become better at understanding and talking like humans. It is rather like teaching a computer to hold a conversation with you, answer your questions, and even write essays or stories.

For scientists and computer programmers, this was not possible until the landmark paper by Vaswani et al in 20173, which introduced the transformer architecture. ChatGPT and Google Bard have become popular because they can help with tasks like answering questions, translating languages and even creating content.
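The "predict the next word" idea can be sketched in a few lines of code. The toy model below is only a bigram frequency table built from a made-up corpus, nothing like the transformer networks that power ChatGPT, but the underlying objective of guessing the next word from the words before it is the same:

```python
from collections import defaultdict, Counter

# A deliberately tiny illustration of next-word prediction: count which
# word follows which in a small corpus, then predict the most commonly
# observed successor. Real LLMs learn these probabilities with deep
# neural networks trained on billions of documents.

corpus = (
    "the patient was taken to theatre . "
    "the patient was reviewed on the ward . "
    "the fracture was fixed in theatre ."
).split()

# word -> counts of every word observed immediately after it
successors = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    successors[current][following] += 1

def predict_next(word):
    """Return the most frequently observed next word, or None if unseen."""
    counts = successors.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "patient" (seen twice after "the")
```

A real LLM replaces this lookup table with billions of learned parameters, which is what allows it to generalise to sentences it has never seen before.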

In an age of increasing problems for the NHS, LLMs could potentially solve some ongoing issues. When I asked ChatGPT to provide a witty response to how it can solve NHS problems, this is the answer it provided:

"Need to fix the NHS? ChatGPT's got the prescription for success! Think of it as the NHS's virtual BFF (Best Friend Forever), always ready to lend a hand. It can help doctors by sifting through mountains of medical data faster than a caffeine-fueled superhero, finding patterns, and suggesting treatment options. It's like having a medical encyclopedia, an experienced consultant, and a research assistant rolled into one witty digital sidekick. Plus, it can answer patients' questions, schedule appointments, and provide reliable health information 24/7, taking some pressure off the NHS's busy phone lines. With ChatGPT, the NHS is not just treating patients; it's also treating itself to a dose of efficiency and innovation!" (ChatGPT et al, 2023).

Not short of confidence, is it? Despite the bravado, studies have backed up some of these claims. Probably the most interesting of these is a study of the notoriously difficult New England Journal of Medicine weekly cases, which found that ChatGPT included the correct diagnosis in its list of possible diagnoses just over 60 per cent of the time1. This performance is (probably) better than any medic could possibly provide.

From my (limited) knowledge of AI research, there are three things I ask myself about when reading an AI paper: model selection, quality of data and validation of data. This is where issues arise with ChatGPT: no one outside OpenAI can determine what the source of its training data is. Perhaps ChatGPT will divulge its secret:

“It has been trained on a diverse and extensive dataset to develop its language understanding and generation capabilities. However, OpenAI has not publicly disclosed the specifics of the training data, including the individual sources or the proportion of data from each source. This is done to protect the privacy and copyright of the content in the training data and to prevent potential biases from being amplified. The knowledge cutoff for ChatGPT is in September 2021, which means it may not be aware of events or developments that occurred after that date”.  (ChatGPT et al, 2023).

So, the mystery continues. The source of data is just part of the concerns. Privacy, ethics and bias have also been longstanding issues in AI models. It is no wonder that Sam Altman (CEO of OpenAI, creators of ChatGPT) stood in front of Congress and said that he has many concerns, including the risk of bias, misuse and lack of accountability.

This can be a problem when integrating LLMs into the NHS, where they could be used to diagnose diseases, recommend treatments, or analyse patient data. If an LLM is biased against certain groups of people, it could lead to misdiagnoses, inadequate treatment plans, or unequal access to care.

Indeed, a recent systematic review of 120 research papers has unveiled a concerning truth about AI and machine learning (ML): gender bias is prevalent in these technologies2. Nearly half of the papers in the study discussed the presence of gender and racial bias across various AI applications. Notably, Natural Language Processing (NLP) systems, responsible for text analysis and sentiment analysis, revealed deep-seated biases originating from the languages they are trained on. These biases have far-reaching consequences, impacting language translation and hate speech detection. The study also emphasised the intersectional nature of these biases, which disproportionately affect women from minority ethnic backgrounds2.

This research underscores the urgent need for a thorough re-evaluation of AI and ML algorithms to prevent the perpetuation of biases and inequalities. Furthermore, the study highlighted the lack of gender representation in audio-visual media, which can hinder progress towards gender equality by reinforcing stereotypes and limiting the visibility of underrepresented groups. In essence, these findings reinforce concerns regarding the integration of LLMs into healthcare settings, emphasising the importance of transparency in AI development for a more equitable healthcare system.

Is the bias insurmountable, and a major limiting factor for the integration of Large Language Models into the NHS? It is hard to tell. Before we integrate LLMs, or any form of AI, into the NHS, we need to find a way to assess bias and to de-bias these models.

The two most pressing issues facing the NHS, now and in the future, are staffing shortages and long elective waiting lists. ChatGPT believes there are myriad ways it can help with these issues:

  1. Information and Education: ChatGPT can provide up-to-date information about the current status of NHS waiting lists, the reasons behind the crisis, and the impact of doctor strikes on healthcare services. It can also explain the broader context of healthcare systems and their challenges.
  2. Guidance on Accessing Healthcare: ChatGPT can offer advice on how individuals can navigate the NHS and access healthcare services during this crisis. This might include information on how to prioritise care, find alternative resources, or understand the triage process.
  3. Policy and Reform: It can provide insights into potential policy solutions and reforms that have been proposed to address the NHS crisis, such as increasing funding, improving efficiency, or reforming workforce management.
  4. Emotional Support: ChatGPT can offer emotional support and suggestions for coping with the stress and anxiety that may arise from the healthcare crisis, especially for those experiencing long waiting times or disruptions in care due to strikes.
  5. Advocacy and Communication: It can provide guidance on how individuals and communities can advocate for change, such as contacting their local representatives or participating in healthcare advocacy groups.
  6. Understanding Healthcare Rights: ChatGPT can explain patients' rights within the NHS, including their right to timely access to healthcare and how to file complaints or appeals if they believe their rights are not being upheld.
  7. Resource Referrals: It can provide information on relevant organisations, hotlines, or websites that offer additional support or information related to NHS issues. (ChatGPT et al, 2023).

It is clear that ChatGPT has either not realised its own potential or is being rather coy… Perhaps the role of ChatGPT in the NHS, for patients and surgeons alike, will only become clear with time, once the concerns surrounding privacy and bias have been addressed.

As someone fascinated by innovation and health technology, it will be interesting to see whether LLMs have longevity or follow Scott's Parabola. Perhaps the doctor or orthopaedic surgeon of the future is one who utilises AI capabilities as an adjunct, or to perform the more laborious manual tasks, streamlining patient care and circumventing NHS inefficiencies.

In the meantime, I have downloaded the ChatGPT app, and next time I am preparing for a trauma meeting, I might just use the app and see what it says.

  1. Kanjee Z, Crowe B, Rodman A. Accuracy of a Generative Artificial Intelligence Model in a Complex Diagnostic Challenge. JAMA. 2023;330:78-80.
  2. Shrestha S, Das S. Exploring gender biases in ML and AI academic research through systematic literature review. Front Artif Intell. 2022;5:976838.
  3. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Advances in neural information processing systems. 2017;30.