Is the AI behind ChatGPT getting worse?

imhotep · Jul 25, 2023

The AI that powers ChatGPT appears to be performing worse on mathematical problems than it did just a few months ago.

The AI powering ChatGPT may provide completely different answers to the same mathematical problems over time. Those findings from recent experiments have fuelled an ongoing debate about whether the AI chatbot’s performance is getting worse – and have spurred the firm behind it, OpenAI, to reassure customers that applications built on ChatGPT will not continually break.

Last week, a study by researchers at Stanford University and UC Berkeley, published on the arXiv preprint server, highlighted noticeable differences in the responses of GPT-4 and its predecessor, GPT-3.5, over the few months since GPT-4's March 14 debut.

One of the most striking findings was GPT-4's reduced accuracy on complex mathematical questions. For instance, while the model had a high success rate (97.6 percent) in March when asked whether large numbers are prime, its accuracy on those same prompts plummeted to a mere 2.4 percent in June.
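To make an accuracy figure like this concrete, here is a minimal sketch of how yes/no answers to "is N prime?" questions could be scored against a ground truth. The helper names (`is_prime`, `score`) and the sample answers are hypothetical illustrations, not the study's actual prompts, data, or code.

```python
def is_prime(n: int) -> bool:
    """Ground truth via deterministic trial division (fine for moderately large n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def score(answers: dict[int, bool]) -> float:
    """Percentage of answers that match the ground truth.

    Each key is the number asked about; each value is the (hypothetical)
    model answer, where True means the model said "yes, it is prime".
    """
    correct = sum(1 for n, said_prime in answers.items() if said_prime == is_prime(n))
    return 100.0 * correct / len(answers)


# Hypothetical answers: 10001 = 73 * 137 is composite, so that answer is wrong.
sample = {17077: True, 7919: True, 10001: True, 91: False}
print(f"accuracy: {score(sample):.1f}%")  # prints: accuracy: 75.0%
```

Accuracy here is simply correct answers divided by questions asked, so the same model can score very differently depending on which answers it tends to give for the particular set of numbers in the test.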

The study also pointed out that, while older versions of the bot offered detailed explanations for their answers, the latest iterations seemed more reticent, often forgoing step-by-step solutions even when explicitly prompted. Interestingly, during the same period, GPT-3.5 showed improved capabilities in addressing basic math problems, though it still struggled with more intricate code generation tasks.

The research team also delved into GPT-4's coding capabilities, which appeared to have regressed. When the model was tested using problems from the online learning platform LeetCode, only 10 percent of the generated code adhered to the platform's guidelines. This marked a significant drop from a 50 percent success rate observed in March.

OpenAI’s approach to updating and fine-tuning its models has always been somewhat enigmatic, leaving users and researchers to speculate about the changes made behind the scenes. With global concerns and ongoing legislation in the works surrounding AI regulation and its ethical use, transparency is increasingly on the minds of government regulators and even everyday users of the AI-based tech products that are emerging ever-more frequently.

While the model’s responses seemed to lack the depth and rationale observed in earlier versions, the recent study did note some positive developments: GPT-4 demonstrated enhanced resistance to certain types of attacks and showed a reduced propensity to respond to harmful prompts.

Rapunzel003 · Jul 25, 2023

RealityOfX · Jul 25, 2023

Every bubble has to pop

MrFrog · Jul 25, 2023

I don't know how they ensure the accuracy and quality of the training data set; they do validate it, don't they? I also wonder whether it's a case of garbage in, garbage out, because of all the data people typed in just to test the app. But again, it seems unlikely that this would cause a problem with mathematics and calculations such as large-scale prime numbers...

shenat · Jul 25, 2023

If it isn't trained on data supplied after the initial data set, how can its original answers and its current answers be so different??

imhotep · Jul 25, 2023

shenat said:
If it isn't trained on data supplied after the initial data set, how can its original answers and its current answers be so different??

Peter Welinder, OpenAI’s VP of Product, addressed the concerns of the public more than a week before the study was released, stating that GPT-4 has not been “dumbed down.” He suggested that as more users engage with ChatGPT, they might become more attuned to its limitations.

Walter White · Jul 25, 2023

Hmm... bump.

MrFrog · Jul 25, 2023

imhotep said:
Peter Welinder, OpenAI’s VP of Product, addressed the concerns of the public more than a week before the study was released, stating that GPT-4 has not been “dumbed down.” He suggested that as more users engage with ChatGPT, they might become more attuned to its limitations.

But still 97.6% to 2.4%

imhotep · Jul 25, 2023

MrFrog said:
But still 97.6% to 2.4%

Neither the study nor his explanation sheds any light on the matter. The study itself raises more questions than answers.

jdchathuranga · Jul 25, 2023

AFAIK, they now have a lot of filters for NSFW content, harmful instructions, etc. Those might be the reason for the "dumbed down" perception, not the actual responses.
However, I think we need uncensored, unfiltered AI, as I strongly believe knowledge shouldn't be filtered or censored, even if it is harmful or bad.

DHE · Jul 25, 2023

From what I've seen of its math,
it doesn't even perform as well as GPT-3 now.

The other thing is, the deeper a chat goes, the more erratic it gets.
(If you keep typing the same kind of maths quiz into one chat over and over, the accuracy drops by the end. It doesn't do what you ask; it gives answers that have nothing to do with the question.)
Mind you, I used GPT-3.5.

NRTG · Jul 25, 2023

TFS

Gwynbleidd · Jul 25, 2023

I read a research stating previously it was able to solve a math problem in 86+% accuracy and over time the same solving accuracy fell to 70 or 50 or something. Title was the same in that article.

topkollek · Jul 25, 2023

I noticed some improvement in v4.0's math over 3.x.

Here is the question I asked.

I have a bag containing 3 red balls and 4 blue balls. If I pick two balls without replacement, what is the probability of picking two red balls?

The answer given by 3.x was completely wrong.
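For reference, the expected answer to that question is 3/7 × 2/6 = 1/7 (about 14.3%). A short Python check, computing it both with the multiplication rule and by enumerating every possible pair of balls, might look like this; it is only an illustration of the arithmetic, not the output of either model version.

```python
from fractions import Fraction
from itertools import combinations

# Bag from the question: 3 red balls and 4 blue balls.
bag = ["red"] * 3 + ["blue"] * 4

# Multiplication rule for drawing without replacement:
# P(first red) * P(second red | first was red) = 3/7 * 2/6
formula = Fraction(3, 7) * Fraction(2, 6)

# Brute-force check: enumerate every unordered pair of distinct balls
# (C(7, 2) = 21 pairs) and count the pairs where both are red.
pairs = list(combinations(range(len(bag)), 2))
both_red = sum(1 for i, j in pairs if bag[i] == "red" and bag[j] == "red")
enumerated = Fraction(both_red, len(pairs))

print(formula, enumerated)  # prints: 1/7 1/7
```

Using `Fraction` keeps the result exact, so the two methods can be compared directly without floating-point rounding.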

kolavari · Jul 25, 2023

Not sure whether it's awareness or a bug.

MemoryHacker · Jul 25, 2023

A recent study by Stanford University and UC Berkeley compared the performance of GPT-4 and its predecessor GPT-3.5 over a few months since GPT-4's release. They found that GPT-4's accuracy in answering complex mathematical questions significantly dropped from 97.6% to 2.4% between March and June. Moreover, GPT-4 was less inclined to provide detailed explanations for answers, even when prompted. On the other hand, GPT-3.5 showed improvements in basic math but struggled with complex code generation. GPT-4's coding capabilities also regressed, with only 10% of generated code adhering to platform guidelines compared to 50% in March. OpenAI's update process remains somewhat mysterious, raising concerns about transparency amid growing global concerns and regulations surrounding AI ethics. Despite some shortcomings, GPT-4 showed improved resistance to certain types of attacks and avoided responding to harmful prompts.

wingman · Jul 25, 2023

It was making things up before too, bro.

It gives blatantly wrong answers to maths and logic questions (most of the time).

Any.key · Jul 25, 2023

bump

wijebahu · Jul 25, 2023

This is a fake bubble too. Say you give it instructions and ask for the output in a table format. At first it does it correctly, which means it understood. But then, with later prompts, it goes back to square one and gives wrong answers. Total madness.

Stimulus mind · Jul 25, 2023

TFS brother
