News from OpenAI: the company that develops ChatGPT and Sora has officially announced two new and very powerful AI models, o3 and o3-mini, the successors to the o1 and o1-mini models released earlier this year. OpenAI plans to launch o3-mini by the end of January and to make the full o3 model available later. According to the company led by Sam Altman, the o3 model, at least under certain conditions, comes close to the concept of AGI (Artificial General Intelligence): this acronym refers to a sort of "artificial super intelligence" with reasoning capabilities far superior to those of "traditional" AI models.
The models were trained using reinforcement learning, which allows the algorithm to "think" before providing the most accurate answer. At the moment neither o3 nor o3-mini is available on a large scale: both are reserved for safety researchers, who can register for preview access to test their safety.
What the new o3 and o3-mini AI models can do: features and functionality
Compared to the previous model, o3 introduces better fact-checking ability, adjustable reasoning time, and clear improvements in fields such as mathematics, science, physics, and code writing. However, this computational power comes at a high cost in resources, which causes some latency and makes the model more suitable for complex tasks than for general use.
o3 represents an evolution in the artificial intelligence landscape thanks to its reasoning-oriented approach. The model is capable of "thinking" before answering, developing a chain of thought that analyzes a problem from different perspectives. This process not only improves the reliability of the outputs but also allows the model to tackle complex problems with greater precision than "traditional" models. Suffice it to say that no other model exceeds 2% on EpochAI's FrontierMath test, while o3 was able to achieve 25.2%.
Furthermore, unlike its predecessors, o3 offers the possibility of adjusting the time dedicated to reasoning: more time leads to better performance, although latency increases. This feature gives o3 greater versatility and clearly distinguishes it from o1, which did not offer this type of control.
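To make the idea concrete, here is a minimal sketch of how a developer might select a reasoning-effort level when calling such a model through OpenAI's API. The `reasoning_effort` parameter and the `o3-mini` model name are assumptions based on OpenAI's o-series conventions; check the current API reference before relying on them.

```python
def build_request(prompt: str, effort: str = "medium") -> dict:
    """Assemble a chat-completion payload for an o-series model.

    Higher effort means the model spends longer "thinking", which
    improves results at the cost of latency. The "reasoning_effort"
    field is an assumed parameter name; verify against the API docs.
    """
    if effort not in ("low", "medium", "high"):
        raise ValueError("effort must be 'low', 'medium' or 'high'")
    return {
        "model": "o3-mini",          # assumed model identifier
        "reasoning_effort": effort,  # assumed parameter name
        "messages": [{"role": "user", "content": prompt}],
    }

# Usage sketch (requires an API key and the `openai` package):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(**build_request("...", "high"))
```

The trade-off is the one described above: "high" buys accuracy on hard problems, "low" buys speed on routine ones.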
The o3-mini version, on the other hand, is a more compact variant aimed at targeted tasks. Although less powerful than o3, it is designed to balance efficiency and performance, which makes it suitable for contexts where computational resources are limited. This distinction between the two models highlights OpenAI's strategy of diversifying the applications of artificial intelligence, making it accessible to different types of users and operational scenarios.
Another key element of the new o3 models is the ability to verify facts, reducing the risk of so-called AI "hallucinations", i.e. apparently coherent but factually incorrect responses. This verification, which does not completely eliminate the risk of hallucinations (let's be clear), has a cost: the model takes longer to provide an answer than models that do not integrate reasoning. Even so, initial tests show that o3 performs extraordinarily well on mathematical and scientific benchmarks, far surpassing its predecessor and setting new standards in the industry.
Why the new models bring us closer to AGI
OpenAI's goal is clearly to get closer and closer to AGI, a technology capable of carrying out any human task with a level of autonomy and competence comparable, or superior, to ours. The o3 model has already achieved notable scores on ARC-AGI, a test designed to evaluate an AI system's ability to efficiently learn new skills beyond the data it was trained on. o3 scored 87.5% in the high-compute setting, and even in the worst-case scenario the model tripled the performance of o1 (as highlighted in the following graph). Not bad at all!
However, as pointed out by François Chollet, creator of the ARC-AGI benchmark, o3 still struggles on tasks that are simple for humans, suggesting that we are still relatively far from AGI. On this point, Chollet declared:
Early data suggests that the next benchmark will still represent a significant challenge for o3, potentially reducing its score to less than 30% even with high compute (while an intelligent human would still be capable of scoring above 95% without training) (…) We will know that AGI is a reality when creating tasks that are easy for normal humans but difficult for AI becomes simply impossible.
What happened to o2?
We conclude this in-depth look with a question that the most attentive of you will already have asked: what happened to o2? Why did OpenAI "skip" this model, jumping from o1 to o3? According to The Information, OpenAI discarded the name "o2" to avoid possible legal problems with the British telecoms provider O2.