The current best estimates of the rate of improvement in Large Language Models
show capabilities doubling every 5 to 14 months.
When GPT-4 launched, it was the lone frontier model; there are now three GPT-4-class models: Anthropic’s Claude 3, OpenAI’s GPT-4, and Google’s Gemini (confusingly, Gemini 1.0 Advanced and Gemini 1.5 Pro are both GPT-4-class models, but Gemini 1.0 Pro is not). While the first models to clearly surpass GPT-4 are expected in the coming months, one thing we have learned is that we have not come close to exhausting the abilities of even the existing AIs. (One Useful Thing)
One person struggled to get an AI to solve the New York Times crossword and concluded it was simply too hard a problem. Prof. Arvind Narayanan then demonstrated that GPT-4 could do it.
Well, @random_walker proved the point that he often makes: it is really hard to rule out what LLMs can or can’t do.
Good prompting can get the AI to solve problems that looked impossible. Or, alternately, it is easy to be fooled by the AI seeming to solve problems it didn’t, https://t.co/4SrK5UgEPb pic.twitter.com/UWIcTnyoHb
— Ethan Mollick (@emollick) October 6, 2023
Indeed, even the creators of Large Language Models (LLMs) do not fully know what their systems can do. For instance, a three-page prompt can transform GPT-4 or Claude 3 into a negotiation simulator, complete with grading and integrated lessons. More educational prompts are expected in a forthcoming paper, but this prompt and previous ones are already available in the prompt library under a Creative Commons license, and you can try it out as a GPT. When these prompts were shown to people inside major AI labs, they were unaware that their own models could support such sophisticated interactions from a prompt alone.

In a randomized, controlled, pre-registered study, GPT-4 was better at changing people’s minds in a conversational debate than human debaters, at least when given access to personal information about the person it was debating (humans given the same information were not more persuasive). The effect was significant: the AI increased the chance of someone changing their mind by 87% over a human debater. This may explain why a second paper found that GPT-4 could accomplish a famously difficult conversational task: arguing with conspiracy theorists. That controlled trial found that a three-round debate, with GPT-4 arguing the other side, robustly lowered conspiracy theory beliefs. Even more surprisingly, the effects persisted over time, even for true believers.
A new study in JAMA Internal Medicine finds that AI outperformed physicians at processing medical data and at clinical reasoning on real patient cases. See more info here:
AI Outperforms Doctors in Summarizing Health Records:
- In a recent study in Nature Medicine, an international team of scientists identified the best large language models and adaptation methods for clinically summarizing large amounts of electronic health record data. They compared the performance of these models to that of medical experts.
- The study found that adapted large language models can outperform medical experts in clinical text summarization.
ChatGPT vs. Physicians in Providing Advice:
- A study led by Dr. John W. Ayers from the Qualcomm Institute at the University of California San Diego compared written responses from physicians with those from ChatGPT (an AI assistant) to real-world health questions.
- A panel of licensed healthcare professionals preferred ChatGPT’s responses 79% of the time, rating them as higher quality and more empathetic. The study highlights the potential role of AI assistants in medicine.