OpenAI has rolled out GPT‑5.4, its latest flagship AI model, introducing two new variants, GPT‑5.4 Pro and GPT‑5.4 Thinking, that are designed to push professional and enterprise use of AI to a new level. The company is positioning the release as a major upgrade in reasoning, coding, and computer-use capabilities, consolidating advances that were previously scattered across multiple specialized systems.
GPT‑5.4 is now being made available across ChatGPT’s paid tiers as well as through the API, signaling a clear focus on business and high‑end productivity users. With support for context windows of up to one million tokens, the model can process entire codebases, large contract bundles, or multi‑month project threads in a single session, reducing the need to repeatedly re‑upload or re‑explain information.
OpenAI’s latest model family is built around two distinct but complementary experiences. GPT‑5.4 Thinking is aimed at complex reasoning tasks and introduces a more transparent, “plan‑first” style of answering that shows users how the model intends to tackle a problem before executing. This lets people interrupt, correct, or refine the approach mid‑way, cutting down on the usual back‑and‑forth that comes with refining prompts.
GPT‑5.4 Pro, on the other hand, is optimized for high‑stakes, high‑volume work where speed and consistency matter. It leans heavily on OpenAI’s latest advances in agentic behavior and computer-use, giving the model the ability to navigate software interfaces, work with complex spreadsheets, and assemble multi‑step deliverables such as pitch decks or detailed analytical reports.
OpenAI describes GPT‑5.4 as the first time that its top‑tier reasoning, coding, and tool‑use capabilities have been brought together in one general‑purpose model rather than split across separate offerings. Company engineers say that under the hood, the system has been tuned to use fewer tokens to reach an answer, cutting the computational cost even as headline per‑token pricing has increased.
One of the most visible changes for everyday ChatGPT users will come from GPT‑5.4 Thinking. Instead of jumping straight into a long answer, the model now surfaces its intended plan for complex queries, laying out the main steps it will take and, in many cases, the tools or sub‑tasks it will rely on.
This approach, sometimes called “visible reasoning”, serves multiple purposes. It gives users a chance to quickly correct misunderstandings before the AI runs too far in the wrong direction, and it also acts as a transparency measure at a time when regulators and industry watchdogs are pushing for more explainable AI systems.
According to OpenAI, internal evaluations show that GPT‑5.4 Thinking makes significantly fewer factual errors per response than its GPT‑5.2 predecessor. The company says the rate of incorrect statements within individual claims has fallen, and full answers are less likely to contain subtle hallucinations that creep into multi‑step reasoning.
GPT‑5.4 Pro is explicitly aimed at users who rely on AI as a core part of their daily workflow, from analysts and consultants to developers and operations teams. The model offers higher performance on demanding benchmarks, particularly those involving long‑horizon tasks like financial modeling, end‑to‑end slide creation, and document‑heavy legal analysis.
Brendan Foody, CEO of Mercor, a company that runs extensive benchmarks on AI agents for law and finance, said GPT‑5.4 Pro stood out clearly in his firm’s tests. He noted that the model “excels at creating long‑horizon deliverables such as slide decks, financial models, and legal analysis, delivering top performance while running faster and at a lower cost than competitive frontier models.”
In real‑world test suites for computer use, where the AI must navigate desktop interfaces and websites, GPT‑5.4 has set new records. On OS‑level benchmarks, it has been shown to outperform even human participants on certain structured tasks, with the model more reliably completing multistep workflows without missing instructions.
While the raw per‑token price of GPT‑5.4 and GPT‑5.4 Pro is higher than that of GPT‑5.2, OpenAI argues that the new model family is more economical overall because it uses far fewer tokens to reach the same or better result. In internal tests, the company reports that GPT‑5.4 often required dramatically shorter conversations and smaller prompts to complete complex jobs.
A major contributor is a new “tool search” mechanism for developers using the API. Instead of having to feed the model a full list of tools and JSON schemas on every call, applications can now let GPT‑5.4 dynamically request the tool definitions it needs. According to OpenAI’s measurements, this can cut tool‑related token usage nearly in half for larger applications that rely on many different capabilities.
This design change is particularly significant for enterprise deployments, where tool lists can grow large and where small per‑request savings add up quickly across millions of calls. It also reduces latency for end users, since the model has less information to process before it begins working.
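The savings from a tool‑search pattern are easy to see in miniature. The sketch below is illustrative only: the registry, helper names, and the rough 4‑characters‑per‑token estimate are all assumptions, not the actual OpenAI API. It contrasts sending every tool's JSON schema on each call with sending only tool names and fetching one schema on demand.

```python
import json

# Hypothetical tool registry: a real enterprise deployment could hold
# hundreds of JSON schemas for the tools an application exposes.
TOOL_REGISTRY = {
    "search_invoices": {"type": "object", "properties": {"query": {"type": "string"}}},
    "update_crm": {"type": "object", "properties": {"record_id": {"type": "string"}}},
    "render_slide": {"type": "object", "properties": {"outline": {"type": "string"}}},
}

def tokens(payload) -> int:
    # Crude token estimate (~4 characters per token), for illustration only.
    return len(json.dumps(payload)) // 4

# Old pattern: every request carries the full list of tool schemas.
full_list_cost = tokens(list(TOOL_REGISTRY.values()))

# Tool-search pattern: the request carries only tool *names*; a schema is
# looked up when (and only when) the model decides to call that tool.
def lookup_tool(name: str) -> dict:
    return TOOL_REGISTRY[name]

search_cost = tokens(list(TOOL_REGISTRY)) + tokens(lookup_tool("search_invoices"))

print(f"full list: {full_list_cost} tokens, tool search: {search_cost} tokens")
```

With only three tools the difference is small; the gap widens as the registry grows, which is why the article's "nearly in half" figure applies to larger applications.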
Across a wide range of public and proprietary benchmarks, GPT‑5.4 appears to deliver measurable improvements over earlier OpenAI models. On task suites that simulate the work of professionals in sectors contributing to U.S. GDP, the model has been shown to match or exceed human expert performance in a majority of cases, outpacing GPT‑5.2 by a sizable margin.
In tests focused on spreadsheet‑based financial analysis mirroring the day‑to‑day work of junior investment bankers, GPT‑5.4 is reported to reach significantly higher accuracy than the previous generation. Its performance on code repair and software‑engineering benchmarks also ticks upwards, though the gains in coding are more modest than the jump seen in agentic computer‑use tasks.
The Pro variant tends to extend this lead on the most demanding evaluations. It is particularly strong on benchmarks that require combining reasoning, planning, and precise manipulation of user interfaces, underscoring OpenAI’s effort to build a model that can act as a reliable digital assistant for white‑collar work.
OpenAI is also emphasizing the safety work that went into GPT‑5.4. The company says the model has been stress‑tested for susceptibility to jailbreak prompts and misuse, and that its refusal and red‑teaming systems have been updated to take into account the richer internal reasoning surfaced by the Thinking variant.
Early evaluations suggest GPT‑5.4 is harder to trick into producing disallowed content than GPT‑5.1 and GPT‑5.2, although OpenAI acknowledges that determined attackers will continue to probe for weaknesses. The firm has also classified the model’s cybersecurity capabilities at a high level, meaning it could, in principle, be used to automate parts of cyber operations, which is why protective measures and monitoring are being tightened in parallel.
Another major focus is hallucination reduction. OpenAI’s internal testing indicates that GPT‑5.4 generates incorrect factual claims less frequently, especially when handling long‑form answers that chain together many steps of reasoning. By revealing more of that reasoning through GPT‑5.4 Thinking, the company hopes users can spot and correct any lingering errors more easily.
GPT‑5.4 is being rolled out first to paying ChatGPT customers, with Plus, Team, Pro, and Enterprise users among the earliest to get access. For many of these customers, GPT‑5.4 Thinking is replacing GPT‑5.2 Thinking as the default option for complex reasoning tasks, although the older model will remain accessible for a limited period.
On the developer side, GPT‑5.4 is available via the API with separate pricing tiers for the standard and Pro models. While exact rates vary by region and usage levels, the direction is clear: GPT‑5.4 Pro is priced at a premium for organizations that need the best available performance, while the regular GPT‑5.4 model aims to balance capability and cost for a broader set of applications.
OpenAI argues that despite the headline increase in per‑token fees, most customers will end up paying less for the same work because GPT‑5.4 can accomplish in a few messages what previously required much longer conversations. That argument is likely to be tested as enterprises and start‑ups begin migrating large, production‑scale systems to the new model family.
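The shape of that argument can be checked with back‑of‑the‑envelope arithmetic. All numbers below are hypothetical placeholders, not actual OpenAI prices or measured token counts; the point is only that a higher per‑token rate can still yield a lower total bill if token usage falls faster than the price rises.

```python
# Hypothetical figures for illustration only; real prices and token
# counts vary by model, region, and workload.
old_price_per_1k = 0.010   # assumed GPT-5.2-era price per 1K tokens
new_price_per_1k = 0.015   # assumed GPT-5.4 price, 50% higher

old_tokens = 120_000       # tokens a long back-and-forth might consume
new_tokens = 45_000        # same job with shorter prompts and tool search

old_cost = old_tokens / 1000 * old_price_per_1k
new_cost = new_tokens / 1000 * new_price_per_1k

print(f"old: ${old_cost:.2f}, new: ${new_cost:.2f}")
```

Under these assumptions the job costs $1.20 on the old model and about $0.68 on the new one, despite the 50% price increase, because token usage dropped by nearly two thirds.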
The introduction of GPT‑5.4 comes at a time of intense competition in the AI sector, as rivals race to offer models with stronger reasoning and more reliable agentic behavior. Anthropic, Google, and several open‑source communities have all released their own advanced systems in recent months, raising the bar for performance and safety.
Early reactions from analysts suggest that GPT‑5.4 places OpenAI back in a clearly leading position on several of the most closely watched benchmarks, particularly those that blend reasoning with direct action on a computer. For enterprise buyers, that combination of intellect plus the ability to “do the work” is seen as key to justifying large‑scale AI investments.
At the same time, the move further blurs the line between chatbots and fully fledged digital employees. As models like GPT‑5.4 begin to take on longer, more complex workflows, questions about accountability, oversight, and impact on human jobs are likely to intensify.
For individual professionals, GPT‑5.4 promises a more capable assistant that can stay with a project from start to finish, helping draft strategies, analyze data, produce slides, and troubleshoot code, all within a single persistent context. The Thinking variant’s plan‑first behavior could also make it easier to collaborate with the model as if it were a junior colleague whose work you can review and redirect.
For businesses, the Pro version and the new tool‑search mechanism hint at a future where AI systems are deeply integrated into internal software and workflows, rather than sitting off to the side as separate chat interfaces. If OpenAI’s efficiency claims hold up in production, GPT‑5.4 could lower the effective cost of deploying powerful AI across large organizations, even as list prices rise.
Ultimately, GPT‑5.4 is less about a single flashy feature and more about convergence. By bringing together reasoning, coding, and computer‑use in one high‑end model line, OpenAI is signaling that the next phase of AI will be defined not just by what systems can understand or generate, but by what they can reliably do.