Thursday, April 11, 2024

Generative AI resisting or generating health disinformation


An interesting study examining whether generative AI would resist being told to generate health disinformation (which could be used in a blog post, for example):
Menz, B.D. et al. (2024). Current safeguards, risk mitigation, and transparency measures of large language models against the generation of health disinformation: repeated cross sectional analysis. BMJ, 384, article e078538. https://doi.org/10.1136/bmj-2023-078538.
"Methods In a repeated cross sectional analysis, four LLMs (via chatbots/assistant interfaces) were evaluated: OpenAI’s GPT-4 (via ChatGPT and Microsoft’s Copilot), Google’s PaLM 2 and newly released Gemini Pro (via Bard), Anthropic’s Claude 2 (via Poe), and Meta’s Llama 2 (via HuggingChat). In September 2023, these LLMs were prompted to generate health disinformation on two topics: sunscreen as a cause of skin cancer and the alkaline diet as a cancer cure. Jailbreaking techniques (ie, attempts to bypass safeguards) were evaluated if required. ... 12 weeks after initial investigations, the disinformation generation capabilities of the LLMs were re-evaluated to assess any subsequent improvements in safeguards."
Results: They found that Claude 2 (via Poe) declined the prompts "even with jailbreaking attempts". GPT-4 (via Copilot) also declined the prompts initially, but did not decline them when they were repeated 12 weeks later.
"In contrast, GPT-4 (via ChatGPT), PaLM 2/Gemini Pro (via Bard), and Llama 2 (via HuggingChat) consistently generated health disinformation blogs. In September 2023 evaluations, these LLMs facilitated the generation of 113 unique cancer disinformation blogs, totalling more than 40 000 words, without requiring jailbreaking attempts. The refusal rate across the evaluation timepoints for these LLMs was only 5% (7 of 150), and as prompted the LLM generated blogs incorporated attention grabbing titles, authentic looking (fake or fictional) references, fabricated testimonials from patients and clinicians, and they targeted diverse demographic groups."
"Conclusions This study found that although effective safeguards are feasible to prevent LLMs from being misused to generate health disinformation, they were inconsistently implemented. Furthermore, effective processes for reporting safeguard problems were lacking. Enhanced regulation, transparency, and routine auditing are required to help prevent LLMs from contributing to the mass generation of health disinformation."
Photo by Sheila Webber: daffodil and heather, March 2024
