OpenAI on Tuesday announced GPT-4, the latest version of its primary large language model, which it says exhibits “human-level performance” on many professional tests.
GPT-4 is “larger” than previous versions, meaning it has been trained on more data and has more weights in its model file, which also makes it more expensive to run.
Many researchers in the field believe that much of the recent progress in AI comes from running ever-larger models on thousands of supercomputers in training processes that can cost tens of millions of dollars. GPT-4 is an example of this “scaling up” approach to achieving better results.
OpenAI said it used Microsoft Azure to train the model; Microsoft has invested billions in the startup. Citing “the competitive landscape,” OpenAI did not publish details about the model’s size or the hardware used to train it, information that could be used to recreate the model.
OpenAI’s GPT large language model powers many of the artificial intelligence demos that have been wowing people in the technology industry in the past six months, including Bing’s AI chat and ChatGPT. The latest version is a preview of advancements that could start filtering down to consumer products like chatbots in the coming weeks. Bing’s AI chatbot uses GPT-4, Microsoft said on Tuesday.
OpenAI says the new model will produce fewer factually incorrect answers, go off the rails and chat about forbidden topics less often, and even perform better than humans on many standardized tests.
GPT-4 performed at the 90th percentile on a simulated bar exam, the 93rd percentile on an SAT reading exam, and the 89th percentile on the SAT Math exam, OpenAI claimed.
However, OpenAI warns that the new software isn’t perfect yet and that it is less capable than humans in many scenarios. It still has a major problem with “hallucination,” or making stuff up, and isn’t factually reliable, the company said. It is still prone to insisting it is correct when it is wrong.
“GPT-4 still has many known limitations that we are working to address, such as social biases, hallucinations, and adversarial prompts,” the company said in a blog post.
“In a casual conversation, the distinction between GPT-3.5 and GPT-4 can be subtle. The difference comes out when the complexity of the task reaches a sufficient threshold—GPT-4 is more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5,” OpenAI wrote.
The new model will be available to paid ChatGPT subscribers and through an API that allows programmers to integrate the AI into their apps. OpenAI will charge about 3 cents for about 750 words of prompts and about 6 cents for about 750 words of response.
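To get a feel for what those rates mean in practice, the arithmetic can be sketched in a few lines of Python. This is a rough estimate only: it assumes cost scales linearly with word count at the approximate figures quoted above, and the function and constant names are hypothetical, not part of any OpenAI library. Actual billing may be metered in different units.

```python
# Approximate rates as quoted in the article (assumptions, not official pricing):
# about 3 cents per ~750 words of prompt, about 6 cents per ~750 words of response.
WORDS_PER_BLOCK = 750
PROMPT_CENTS_PER_BLOCK = 3.0
RESPONSE_CENTS_PER_BLOCK = 6.0


def estimate_cost_usd(prompt_words: int, response_words: int) -> float:
    """Return a rough dollar estimate for one request (hypothetical helper)."""
    if prompt_words < 0 or response_words < 0:
        raise ValueError("word counts must be non-negative")
    # Assume cost grows linearly with the number of words on each side.
    prompt_cents = prompt_words / WORDS_PER_BLOCK * PROMPT_CENTS_PER_BLOCK
    response_cents = response_words / WORDS_PER_BLOCK * RESPONSE_CENTS_PER_BLOCK
    return (prompt_cents + response_cents) / 100


if __name__ == "__main__":
    # A 1,500-word prompt with a 750-word reply.
    print(f"${estimate_cost_usd(1500, 750):.2f}")
```

Under these assumptions, a 1,500-word prompt with a 750-word reply would come to roughly 12 cents, with the prompt and the response each contributing about 6 cents.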