Analysis of OpenAI API Costs: GPT-4 vs ChatGPT

Antonio Troise
15 min read · Jul 4, 2024


English translation of an Italian post that was originally published on Levysoft.it

In the era of artificial intelligence, tools like OpenAI's ChatGPT have revolutionized the way we interact with technology. It's not uncommon, therefore, for daily users to subscribe to the Plus version of this chatbot for $20/month, which offers some undeniable advantages, such as:

  • priority access without waiting even during peak hours;
  • faster response times;
  • priority access to new features and new models like GPT-4o;
  • a context window of 32,000 tokens (instead of the 8,000 tokens of the free version);
  • the creation of GPTs (user-configurable agents for performing specific tasks);
  • the ability to browse the Internet;
  • advanced data analysis;
  • access to DALL-E to create images from a text description.

However, one of the most important considerations for those using these technologies is the cost. With this article, I want to explore this aspect and explain how to calculate and compare the costs of using OpenAI’s APIs versus the ChatGPT subscription, using ChatGPT’s chat archive to accurately estimate the total and monthly cost. In practice, I will explain how to export all ChatGPT chats and, assuming they were all done via API, calculate the total and monthly cost to determine whether I would spend more or less compared to the $20 monthly subscription.

Introduction to tokens and token costs

Before delving into cost analysis, it is essential to understand the concept of a “token”. In the context of language models like GPT, tokens are the fundamental units that determine usage cost. A token can represent a word, part of a word, or even a single character, depending on the text. An analysis by Simon Willison shows that the most common English words are often assigned a single token, while languages other than English suffer from less efficient tokenization. In general, English and many Western languages using the Latin alphabet tokenize roughly along words and punctuation, while logographic systems like Chinese or Japanese often treat each character as a distinct token, leading to higher token counts. The less efficient tokenization of languages other than English stems from the fact that language models are trained primarily on large amounts of English text, so the tokenizer's vocabulary is optimized for English at the expense of other languages.

It is also important to note that all large language models (LLMs) have a token limit. This means there is a maximum number of tokens the model can process in a single request, which can affect the model’s ability to handle long or complex texts. Less efficient tokenization, therefore, implies that texts in languages other than English can reach this limit more quickly, reducing the amount of content that can be processed in a single request.
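
To make this concrete, here is a minimal sketch using OpenAI's tiktoken library (introduced in more detail below) that compares the token counts of an English sentence and its Italian translation; with the cl100k_base encoding, the Italian version will typically need noticeably more tokens for the same meaning.

# Requires: pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by the GPT-4 and GPT-3.5 models
enc = tiktoken.get_encoding("cl100k_base")

for text in ("The cat sleeps on the sofa all afternoon.",
             "Il gatto dorme sul divano per tutto il pomeriggio."):
    tokens = enc.encode(text)
    print(f"{len(tokens):2d} tokens for: {text}")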

To get an idea of how tokens work, you can visit gpt-tokenizer.dev, a site that offers a visual tokenizer optimized for OpenAI’s GPT models, including GPT-4, and therefore with full support for all current OpenAI models (available encodings: r50k_base, p50k_base, p50k_edit, and cl100k_base).

OpenAI also offers a Tokenizer tool on its website to explore how tokens work.

OpenAI, the company behind ChatGPT, bases its API pricing model on the number of tokens used. Costs are divided between input tokens (the text the user provides to the model) and output tokens (the response generated by the model) and vary depending on the specific model used.
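
The billing arithmetic itself is simple: each side is priced per million tokens. Here is a small illustrative sketch; the default prices below are the GPT-4o list prices in effect when this article was written ($5.00 per million input tokens, $15.00 per million output tokens), so check the current pricing page before relying on them.

def api_cost_usd(input_tokens: int, output_tokens: int,
                 price_in_per_m: float = 5.00,    # illustrative GPT-4o input price, $/1M tokens
                 price_out_per_m: float = 15.00   # illustrative GPT-4o output price, $/1M tokens
                 ) -> float:
    """Cost of one exchange: input and output tokens are billed separately, per million."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000

# A 500-token prompt answered with 1,000 tokens costs $0.0175 at these prices:
print(f"${api_cost_usd(500, 1000):.4f}")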

The following table, which I created for this article, shows the updated costs for the various models as of July 18, 2024. You can check the model prices directly on the OpenAI, Anthropic (Claude), and Google (Gemini) websites.

Prices for text LLM API models updated to July 18, 2024

A chart showing the cost composition (input vs. output) for each model better summarizes the cost per token of the main LLMs:

I have labeled the costs for the Claude and Gemini models as “Experimental” because in my project I assumed they behave similarly to the GPT models in terms of token counting (and thus used the same libraries to count tokens). I reserve the right to verify these values further because, as you will see, I mixed things: I took an export of OpenAI chats and calculated its cost using Claude's and Gemini's token prices. The most accurate approach would have been to take each provider's own export and apply its API prices (a model might cost less in output tokens simply because it is less verbose). But apart from the fact that, at the moment, there seems to be no easy way to export all chats from Anthropic's and Google's LLMs, my goal was merely to satisfy my curiosity and see whether it is more financially sustainable to pay for what you use or to pay a fixed-cost subscription.

ChatGPT Token Cost Analysis

To help users understand and manage these costs, I developed ChatGPT Token Cost Analysis, an open-source project available on Github (you can also try it online on my website here) that includes both a Python script and an HTML+JavaScript web application to analyze the token costs of your exported ChatGPT chats in complete security and privacy.

How to export ChatGPT chat history

The first step to calculate the costs is to extract the ChatGPT chats by exporting all your data, and then proceed with automatic analysis. Here’s how you can do it:

1. Access ChatGPT.
2. In the upper right corner of the page, click on your profile icon.
3. Click on Settings.
4. Go to the Data Controls menu.
5. Under Export Data, click Export.
6. On the confirmation screen, click Confirm export.
7. You should receive an email with your data. Note that the link in the email expires after 24 hours. Click on Download data export to download a .zip file.

This file includes the history of your chats in chat.html and other data associated with your account.

In the .zip file, you will find both chat.html and conversations.json. The conversations.json file is the one our tools need: because JSON is structured, with each conversation represented as an object of keys and values, the data is easy to read and manipulate with Python or JavaScript.
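
As a quick sketch of what reading the export looks like, the snippet below walks the structure conversations.json had at the time of writing: a list of conversations, each storing its messages as nodes in a "mapping" dictionary. Field names may change between export versions, so treat this as illustrative.

import json

with open("conversations.json", encoding="utf-8") as f:
    conversations = json.load(f)  # a list with one object per conversation

for conv in conversations:
    print(conv.get("title", "untitled"))
    for node in conv.get("mapping", {}).values():
        msg = node.get("message")   # some nodes are structural and carry no message
        if not msg:
            continue
        role = msg["author"]["role"]  # "user", "assistant", or "system"
        parts = msg.get("content", {}).get("parts", [])
        text = " ".join(p for p in parts if isinstance(p, str))
        if text:
            print(f"  [{role}] {text[:60]}")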

The data export feature is available in both the Free and Plus plans but obviously is not available for users who are not logged in.

The libraries used for token calculation

Tiktoken library for Python

Using Python, we can import the chat archive in JSON format and calculate the number of tokens used. To do this, I used a Python library called tiktoken, developed by OpenAI specifically to handle token counting for GPT language models. Because it is designed to be compatible with OpenAI's language models, such as GPT-3 and GPT-4, it is an ideal choice for counting their tokens: it guarantees tokenization consistent with that used during model training.

The tiktoken Github page describes it as “a fast BPE tokenizer for use with OpenAI's models”. A tokenizer is a tool that breaks a text down into smaller units called tokens. Tokens, as mentioned, can be words, characters, or subwords, depending on the tokenization method used. The method OpenAI uses is called BPE (Byte Pair Encoding), a data compression algorithm applied to tokenization in language models. It works by finding pairs of bytes that frequently appear together and replacing them with a single new symbol, repeating the process iteratively until a predetermined vocabulary size is reached. BPE is particularly useful because it balances granularity between single characters and whole words, improving the model's efficiency with both common and rare words.
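
You can see this subword behavior directly: encoding a word returns integer token ids, and decoding each id individually reveals the fragments the word was split into. A minimal sketch:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("tokenization")
print(ids)                              # the integer token ids
print([enc.decode([i]) for i in ids])   # the text fragment behind each id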

Gpt-tokenizer library for Javascript

For those who prefer a more user-friendly interface or are not familiar with Python, there is also a standalone HTML web application. It obviously cannot use the tiktoken library, which is only available for Python (in theory there is a JavaScript counterpart, js-tiktoken, but since it requires Node.js it felt less immediate to me). Instead, it uses the JavaScript library gpt-tokenizer for token counting, built on the modern cl100k_base encoder, the same one used by the GPT-4 and GPT-3.5 models.

The result is very accurate: the two libraries essentially agree, with the total cost per chat (setting aside rounding of decimals) differing by only 0.157%. Ultimately, both libraries are valid: they are designed to replicate OpenAI's tokenization process as faithfully as possible and therefore keep the calculated costs as accurate as possible.

Python Script

The first version of the project was developed in Python only and requires installing two libraries: tiktoken (for tokenizing texts) and pandas (for manipulating and analyzing structured data in DataFrames).

So I created the script chatgpt_token_cost_analysis.py, which loads the JSON chat file (conversations.json), extracts the messages, and counts the tokens in each message using the tiktoken library. To make the estimate more accurate, costs are calculated separately for input tokens (user) and output tokens (assistant), using the specific prices of GPT-4o, OpenAI's latest, most performant, and most economical model at the time, taken from the official OpenAI Pricing page. The costs are then grouped by month to get a monthly view of the hypothetical expense, and the results are saved to a CSV file, costs_per_month.csv, for further analysis.
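
For orientation, here is a heavily condensed sketch of that logic (the real script is in the repository and handles more edge cases); the prices are again the illustrative GPT-4o figures and should be checked against the official pricing page.

import json
import pandas as pd
import tiktoken

PRICE_IN, PRICE_OUT = 5.00, 15.00   # illustrative GPT-4o $/1M tokens; verify on the pricing page
enc = tiktoken.get_encoding("cl100k_base")

rows = []
with open("conversations.json", encoding="utf-8") as f:
    for conv in json.load(f):
        for node in conv.get("mapping", {}).values():
            msg = node.get("message")
            if not msg or not msg.get("create_time"):
                continue
            text = " ".join(p for p in msg.get("content", {}).get("parts", [])
                            if isinstance(p, str))
            rows.append({
                "month": pd.to_datetime(msg["create_time"], unit="s").strftime("%Y-%m"),
                "role": msg["author"]["role"],
                # disallowed_special=() avoids errors if a chat contains special-token text
                "tokens": len(enc.encode(text, disallowed_special=())),
            })

df = pd.DataFrame(rows)
monthly = df.pivot_table(index="month", columns="role", values="tokens",
                         aggfunc="sum", fill_value=0)
monthly["cost_usd"] = (monthly.get("user", 0) * PRICE_IN +
                       monthly.get("assistant", 0) * PRICE_OUT) / 1_000_000
monthly.to_csv("costs_per_month.csv")
print(monthly)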

This is an example of the output:

Terminal Output of the Python Script chatgpt_token_cost_analysis.py

This approach allows you to analyze and compare the costs of using OpenAI’s APIs for GPT-4o with the ChatGPT subscription. With the data extracted from the chat archive, you can calculate the exact costs and make informed decisions about which option is more convenient, also offering peace of mind to those who have always feared spending too much using OpenAI’s APIs, showing that costs can be managed and easily verified.

HTML web application ensuring privacy

The second part of the project was more challenging because I wanted to create a standalone web application for those who prefer a more user-friendly interface or are not familiar with Python. Along the way I also added multi-model support, that is, the ability to choose among multiple OpenAI models and, experimentally, the Claude and Gemini models. But the real challenge was to make this HTML page Privacy by Design: it protects user data by working entirely locally, even offline, so that sensitive chat data is never uploaded to remote servers.

To ensure the privacy of entered chat data and provide the same functionality even when offline, the HTML page includes a mechanism to load the GPTTokenizer_cl100k_base encoder from a local source if the remote JavaScript file is not accessible. By default, the page attempts to load the tokenizer from the remote URL https://unpkg.com/gpt-tokenizer. If this remote file is not available, the script will attempt to load a local version of the tokenizer (cl100k_base.js).

Here’s a brief overview of how the mechanism works:

  • The function checkGPTTokenizer first attempts to load the GPTTokenizer_cl100k_base encoder from the remote source.
  • If the remote file is not accessible, it attempts to load the local cl100k_base.js file.
  • If neither the remote nor the local file is available, the script falls back to an approximate estimate based on the number of words in the text. This fallback, although not as accurate, still provides a useful estimate of token usage (a Python illustration of the idea follows this list).
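
The fallback in the web app is written in JavaScript, but the idea fits in a few lines; expressed in Python purely for illustration, using the common rule of thumb of roughly 4/3 tokens per English word (the exact factor the page uses may differ):

def estimate_tokens(text: str) -> int:
    """Very rough estimate: English text averages about 3/4 of a word per token."""
    return round(len(text.split()) * 4 / 3)

print(estimate_tokens("The cat sleeps on the sofa all afternoon."))  # 8 words -> ~11 tokens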

To use the local version of the tokenizer, make sure to also download the cl100k_base.js library and place it in the same directory as the HTML file (the easiest way is to download the latest release from my Github repository). This approach makes the tool more privacy-friendly: users can analyze their chat data even without an Internet connection, so the data never has to leave their machine.

A legitimate observation that might be raised is that I could have made the HTML file a true single-page application for offline use by directly including the cl100k_base.js JavaScript file in the HTML. However, since this file is quite large (over 2 MB of data), it would have made the HTML file harder to read and analyze.

Try the online version

You can try the online version of the HTML page here: https://www.levysoft.it/chatgpt-costs.

However, you can always download the HTML page and use it locally and offline. To do this, and to make sure you also get the cl100k_base.js JavaScript library, I recommend downloading the latest release from my GitHub repository or cloning the repository with the command:

git clone https://github.com/levysoft/chatgpt-token-cost-analysis.git

The page will work in total privacy without requiring an Internet connection.

Cost analysis results

Regardless of the tool chosen, the analysis lets you compare the costs of using OpenAI's APIs with those of the ChatGPT Plus subscription. Both tools present, in tabular form, the total and monthly costs the chats would have incurred via the API, making them easy to compare with the fixed monthly subscription cost of $20.

Here are the results for my textual chats with ChatGPT. As you can see, had I adopted the API approach, I would have saved at least $14 a month.

Comparison table between APIs and fixed subscription

Here too, a graph can help us and show the proportion between input and output costs for each month:

Obviously, this analysis considers only textual chats, because the export, at least for now, does not include images generated with DALL·E or audio, making it difficult to calculate their cost via the API. However, if you take a look at the following tables (also from OpenAI's official data), you can try to estimate it yourself, starting with the prices for image API models like DALL·E:

Prices for image API models updated to July 18, 2024

and the prices for audio API models:

Pricing for Audio Model APIs Updated as of July 18, 2024

Advantages of ChatGPT Plus subscription

  1. Fixed Costs: With the ChatGPT Plus subscription, costs are fixed and predictable, regardless of the number of tokens used.
  2. Ease of Use: You don’t need to worry about monitoring token usage or containing variable costs by optimizing prompts (as we’ve seen, efficient prompts can significantly reduce the number of tokens used).
  3. Simplicity of Implementation: You don’t need to integrate and manage APIs, which can simplify the use of the service for those without advanced technical skills.

Advantages of using OpenAI APIs

  1. Flexibility: You pay only for what you use, which can be convenient for periods of variable usage. Choosing the appropriate model can also help you save. Indeed, the most advanced model is not always necessary, and often less expensive models can meet specific needs.
  2. Scalability: APIs can be directly integrated into applications, offering greater flexibility in custom implementations.
  3. Automation: APIs can also be used to create scripts that automate various tasks on your PC with artificial intelligence. However, it is important to avoid intensive loops without proper controls, as this could result in high costs. It is advisable to set limits to prevent excessive expenses (see the sketch after this list).
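
As an illustration of point 3, here is a minimal sketch using OpenAI's official Python client: a hard cap on the number of calls plus max_tokens on each request keeps both the loop and the per-response cost bounded. The prompts and limits here are made up for the example.

from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

prompts = ["Summarize yesterday's meeting notes.", "Draft a polite follow-up email."]
MAX_CALLS = 20  # hard ceiling so a buggy loop cannot run up the bill

for prompt in prompts[:MAX_CALLS]:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=500,  # caps output tokens per call, bounding each response's cost
    )
    print(response.choices[0].message.content)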

Jan: App for querying OpenAI APIs like a chat

If, like me, you have discovered that using OpenAI’s APIs allows you to save a lot of money compared to the ChatGPT app and that the advantages far outweigh the disadvantages, then I imagine you are wondering how to easily query OpenAI’s APIs to simulate ChatGPT without having to be a programmer. There are numerous apps on the market that do all the work for you, provided you enter your OpenAI API key (which you can find in your user profile).

But be careful because many GPT-based applications aim to profit by reselling OpenAI tokens at higher prices. However, there are some that allow you to use your own OpenAI API key, making them cheaper and often not requiring login. With these apps that support “bring your own API key”, you pay only for what you use, and many offer additional features with a one-time purchase.

Among these, the best app I can suggest, both for quality and because it is completely free, is Jan.

Jan is a cross-platform open-source project (compatible with macOS, Windows, and Linux) in continuous evolution and improvement that, in addition to running local models like Llama or Mistral (just like LM Studio, Msty, Ava PLS, Enchanted, and Backyard AI, formerly Faraday.dev), also allows you to enter access keys to connect to the remote APIs of LLMs like OpenAI, Anthropic (for Claude), Groq, Cohere, NVIDIA, and even Hugging Face.

All conversations are saved on your computer, can be exported, and can be deleted at any time. What impressed me about this app is that you can switch from the remote GPT-4o API to local offline LLM models in a second, without any fuss.

Conclusion

Cost analysis shows that, depending on usage, OpenAI’s APIs can be cheaper than the ChatGPT subscription. However, the choice between the two options depends on specific needs and usage patterns. If your usage is intensive and consistent, the ChatGPT subscription offers peace of mind with fixed costs regardless of the number of tokens used or the model queried. On the other hand, for sporadic or variable uses, OpenAI’s APIs might offer a better cost-effectiveness ratio with significant savings. What you need to keep in mind is that understanding and analyzing ChatGPT token costs is not just a matter of budget, but a fundamental step towards more conscious and efficient use of artificial intelligence.

The purpose of this project is purely indicative and is designed to satisfy personal curiosity and, therefore, should not be considered an official or definitive tool. It is important to use it with an awareness of its limitations and not rely on it for critical or professional decisions. With this approach, you can easily verify the cost of your interactions with ChatGPT using your chat archive and make an informed decision on which option is best for you.

UPDATE: On July 18, 2024, I added the GPT-4o mini model, which is smarter and more economical than GPT-3.5 Turbo and has vision capabilities. The model has a 128K context window and a knowledge cutoff of October 2023. With this price cut, making API calls to GPT-4o mini is really cost-effective.

UPDATE 2: On August 6, 2024, OpenAI, responding to the growing competition in large language models, reduced the prices for its GPT-4o language model. Using the new version gpt-4o-2024-08-06, developers can save 50% on input tokens ($2.50 per million tokens) and 33% on output tokens ($10.00 per million tokens) compared to gpt-4o-2024-05-13. The new model also supports 16,384 output tokens, compared to the original GPT-4o's 4,096, while maintaining similar performance.

UPDATE 3: On September 12, 2024, OpenAI introduced OpenAI o1, a family of models consisting of OpenAI o1-preview and OpenAI o1-mini (a smaller, more efficient model designed for code generation). This new series of artificial intelligence models is based on a “chain of thought” approach, designed to spend more time thinking before responding to complex tasks. OpenAI o1 avoids some of the pitfalls of reasoning that generative AI models typically stumble upon because it can “think” before answering questions, meaning it is able to refine its responses by dedicating more time to considering all parts of a question. The models have a context length of 128K and a knowledge cutoff of October 2023. The pricing of the new models reflects their increased complexity compared to previous ones like GPT-4o, with significantly higher costs: the input price has tripled, while the output price has quadrupled. Specifically, o1-preview costs $15/$60 per million input/output tokens, while o1-mini is more affordable at $3/$12.
