OpenAI releases GPT-4o mini: goodbye to GPT 3.5

5 min readJul 20, 2024

On July 18, 2024, OpenAI released GPT-4o mini, its most affordable AI model. It supports 128,000 input tokens with a context window that accepts both images and text, and will eventually have video and audio capabilities. It also allows for the generation of 16,000 output tokens per request. Although this may seem small, it is important to remember that only recently Claude 3.5 Sonnet was upgraded from 4,096 to 8,192 tokens on the Anthropic API, while most remain at 4,000 output tokens.

Advantages of increasing output tokens with GPT-4o mini

This makes GPT-4o mini suitable for translation and transformation tasks where the expected output closely matches the input size, thus avoiding having to split the output into multiple parts. Indeed, even if not common for everyone, the problem of response length could be a significant obstacle in various contexts. For example, Gemini 1.5 Pro presents a notable asymmetry: it can process a million input tokens, but is limited to generating only 8192 output tokens. This imbalance makes it impractical to translate an entire application in a single prompt. The solution remains to fragment the work: initially providing a complete list of necessary translations and then requesting the translation of texts one by one. With GPT-4o mini, this problem should be much less felt.

Furthermore, like its bigger brother GPT-4o, GPT-4o mini offers better multi-lingual support compared to GPT-3.5 Turbo and therefore more efficient tokenization for languages other than English.

Competitive pricing

But the most interesting part is the prices. The new model is significantly cheaper: it has dropped to $0.15 per million input tokens and $0.60 per million output tokens, over 60% lower than GPT-3.5 Turbo and 97% lower than GPT-4o! Even compared to equivalent-sized models, it proves to be much cheaper than Claude 3 Haiku ($0.25 / 1 million input tokens and $1.25 / 1 million output tokens) and Gemini 1.5 Flash ($0.35 / 1 million input tokens and $1.05 / 1 million output tokens). To understand how the market is evolving, OpenAI emphasizes that “the cost per token of GPT-4o mini has decreased by 99% from text-davinci-003, a less capable model introduced in 2022”, envisioning a future where models are seamlessly integrated into every app and website, becoming more accessible, reliable, and incorporated into our daily digital experiences. In other words: “An LLM for every app and every website”.

High performance

But GPT-4o mini also offers higher performance. According to some benchmarks, GPT-4o mini is cheaper and much more capable than GPT-3.5 Turbo, and although inferior to GPT-4o, it is better (with an MMLU score of 82.0%) than other small models such as Gemini Flash (77.9%) and Claude Haiku (73.8%). Of course, competitors have not stood still: Anthropic has already promised to release 3.5 Haiku, and information has leaked about the upcoming release of Gemini 2.0.

Enhanced for Security

In addition to being designed to be more economical and faster, the GPT-4o mini is a potentially more secure language model compared to its predecessors. One of its main features is its ability to resist rapid injection attacks and jailbreaks, which are common in large language models (LLMs). These attacks involve replacing the model’s original instructions with malicious prompts and can be executed easily, making them particularly dangerous.

To counter these attacks, OpenAI introduced the instruction hierarchy method in April 2024. This method assigns different priorities to instructions based on their origin: developer instructions have the highest priority, followed by user instructions, and finally third-party tool instructions. The model is therefore trained to follow the highest-priority instructions when conflicts arise, ignoring lower-priority ones.

The GPT-4o mini is the first OpenAI model trained from scratch to follow this instruction hierarchy, making the model’s responses more reliable and improving its security for large-scale applications. However, it is important to note that increased security does not imply the total absence of vulnerabilities. In fact, the first mini jailbreaks of the GPT-4o have already been reported.

Accessibility and availability

Finally, in ChatGPT, Free, Plus, and Team users will be able to access it as a replacement for GPT-3.5, which has disappeared from the selection options, although it is still present in the APIs. Active conversations with that model are not interrupted (only generically labeled as ChatGPT), but now, given the costs, it would no longer make much sense to use it.

It should be noted that it was only thanks to GPT-3.5 that OpenAI gained a significant advantage in the cloud AI market thanks to the remarkable capabilities of its chatbot, ChatGPT, which debuted at the end of 2022, starting the AI gold rush. Before that, there was GPT-3, a language model mainly conceived for sentence completion rather than conversation: although GPT-3 was an interesting innovation, its practical applications were limited. It was only with the introduction of GPT-3.5, however, that a real revolution in the field of conversational artificial intelligence took place.

Response to competition

Obviously, this move by OpenAI was to be expected because with competing models, including many free ones, flooding the market, it could not sit still and announced a cheaper way to use its artificial intelligence, allowing more companies and programs to tap into its AI. Although OpenAI explains that this model is part of an effort to make AI “as widely accessible as possible”, this move clearly also reflects the growing competition among cloud AI providers and the increasing interest in small, free open-source AI models.

ChatGPT Token Cost Analysis update

Finally, I would like to point out that I have also updated my tool to estimate OpenAI API costs, ChatGPT Token Cost Analysis (now at version 1.0.7), starting from the JSON backup of your ChatGPT conversations. You can try it online here, but if you want more privacy, you can clone my GitHub repository and run it entirely locally and offline. With this price cut, making API calls with GPT-4o mini is really convenient compared to a $20 per month ChatGPT Plus subscription.

English translation of an italian post that was originally published on Levysoft.it