The current best estimates of the rate of improvement in Large Language Models
show capabilities doubling every 5 to 14 months.
When GPT-4 launched, it was the lone frontier model; there are now three GPT-4-class models: Anthropic’s Claude 3, OpenAI’s GPT-4, and Google’s Gemini (confusingly, Gemini 1.0 Advanced and Gemini 1.5 Pro are both GPT-4-class models, but Gemini 1.0 Pro is not). While the first models to clearly surpass GPT-4 are expected in the coming months, one thing we have learned is that we have not come close to exhausting the abilities of even the existing AIs. (One Useful Thing)
One person struggled to get an AI to solve the New York Times crossword and concluded it was simply too hard a problem. Prof. Arvind Narayanan then demonstrated that GPT-4 could do it.
Well, @random_walker proved the point that he often makes: it is really hard to rule out what LLMs can or can’t do.
Good prompting can get the AI to solve problems that looked impossible. Or, alternately, it is easy to be fooled by the AI seeming to solve problems it didn’t, https://t.co/4SrK5UgEPb pic.twitter.com/UWIcTnyoHb
— Ethan Mollick (@emollick) October 6, 2023
Indeed, even the creators of Large Language Models (LLMs) do not fully know what their systems can do. For instance, a three-page prompt can transform GPT-4 or Claude 3 into a negotiation simulator, complete with grading and integrated lessons. More educational prompts are expected in a forthcoming paper, but this prompt and previous ones are already available in the prompt library under a Creative Commons license, and you can try it out as a GPT. When these prompts were shown to people inside major AI labs, they were unaware that their own models could support such sophisticated interactions from a prompt alone.

In a randomized, controlled, pre-registered study, GPT-4 was better at changing people’s minds in a conversational debate than human debaters, at least when given access to personal information about the person it was debating (humans given the same information were not more persuasive). The effect was significant: the AI increased the chance of someone changing their mind by 87% over a human debater. This may explain why a second paper found that GPT-4 could accomplish a famously difficult conversational task: arguing with conspiracy theorists. That controlled trial found that a three-round debate, with GPT-4 arguing the other side, robustly lowered conspiracy theory beliefs. Even more surprisingly, the effects persisted over time, even for true believers.
A new study in JAMA Internal Medicine finds that AI outperformed physicians at processing medical data and at clinical reasoning on real patient cases. See more info here:
AI Outperforms Doctors in Summarizing Health Records:
- In a recent study in Nature Medicine, an international team of scientists identified the best large language models and adaptation methods for clinically summarizing large amounts of electronic health record data. They compared the performance of these models to that of medical experts.
- The study found that adapted large language models can outperform medical experts in clinical text summarization.
ChatGPT vs. Physicians in Providing Advice:
- A study led by Dr. John W. Ayers from the Qualcomm Institute at the University of California San Diego compared written responses from physicians with those from ChatGPT (an AI assistant) to real-world health questions.
- A panel of licensed healthcare professionals preferred ChatGPT’s responses 79% of the time, rating them as higher quality and more empathetic. The study highlights the potential role of AI assistants in medicine.