by Stephen Lawn, Solution Architect
There’s a common tendency to view artificial intelligence (AI) as a single, unified concept. However, much like engineering, which includes disciplines such as civil, mechanical, and software engineering, AI is a broad field made up of distinct types. Two of the most prominent are:
- Agentic AI: Operates with minimal human intervention and is best suited for workflow automation and process management.
- Generative AI: Typically requires human input, called a ‘prompt’, to initiate it and guide it in creating new content such as text, code, or images.
Most people are familiar with generative AI assistants such as ChatGPT. For organizations that use these tools, a key question is whether the information users enter is used to further train the underlying models.
The 3 Versions of ChatGPT
There are a number of popular generative AI assistants, including Gemini, Claude, and Perplexity. We will focus on ChatGPT, developed by OpenAI, because it is the best known. If you want to understand how a generative AI assistant like ChatGPT works, just ask it:
ChatGPT is a large language model AI that generates text responses based on patterns it has learned from vast amounts of data. When you type a question or request, the AI analyzes your words and predicts the most likely and relevant response, one word at a time. It does not actually “understand” like a human but uses probability and context to create answers that make sense. The AI does not remember individual conversations and does not store personal data from your session unless explicitly configured to do so.[1]
Just as there isn’t just one “AI,” there isn’t just one version of ChatGPT. As of this writing, the three main versions, or tiers, of ChatGPT are:[1]
- ChatGPT Free (GPT-3.5): Available to anyone with an OpenAI account. It provides access to ChatGPT powered by GPT-3.5, a less advanced language model that can be slower and less accurate than more advanced models. It excludes some of the features of the two paid versions and is best for students or casual users wanting to test out AI.
- ChatGPT Plus (GPT-4): Requires a monthly subscription but unlocks more advanced models (GPT-4 and GPT-4 Turbo) that provide improved performance, faster response times, and the ability to handle more complex tasks than the free GPT-3.5 version. Best suited for small businesses or individual professionals.
  Models available with Plus include:[2]
  - GPT-4o: The default ChatGPT model, which OpenAI continuously updates. It can reason across audio, vision, and text in real time.
  - GPT-4: A highly capable model with improved accuracy, safety, and functionality compared to GPT-3.5.
  - Reasoning models: Access to powerful reasoning models such as o1 and o3, which are well suited to in-depth research, problem-solving, and complex tasks in domains like math, science, and coding.
  - GPT-4.1: A specialized model that excels at coding tasks and instruction following.
  - GPT-4.1 mini: A smaller, faster, and cheaper version of GPT-4.1, suitable for basic coding and quick help with instructions.
  - o3-mini and o3-mini-high: Efficient models for everyday tasks as well as complex tasks in science, math, and coding.
  - o4-mini and o4-mini-high: Reasoning models, with o4-mini-high being useful for complex problems, particularly in coding.
  - GPT-4.5: At the time of writing, the latest and smartest model, with better accuracy, smarter responses, and fewer mistakes.
- ChatGPT Enterprise: Like the Plus version, it uses GPT-4 Turbo. This licensed version is designed for large organizations, includes enhanced security and management features, and can be customized for specific use cases.
How to Protect Your Data in ChatGPT
So where does ChatGPT get all its training data from? Let’s ask ChatGPT:[3]
OpenAI uses data from different places including public sources, licensed third-party data, and information created by human reviewers. We also use data from versions of ChatGPT and DALL·E for individuals. By default, business data from ChatGPT Team, ChatGPT Enterprise, ChatGPT Edu, and the API Platform (after March 1, 2023) isn’t used for training our models, unless you have explicitly opted in to share your data with us to improve the services.
If you take only one thing away from this article, let it be this: data a user enters in either the Free or the paid Plus version can be used for training. That is because a setting called “Improve the Model for Everyone” is enabled by default. What does that mean? When the setting is on, your conversations can be used to train OpenAI’s models. This is the root of most people’s concern about AI “learning” from their knowledge and data. If you use either of these versions and want to opt out, go to your ChatGPT settings right now. Navigate to Settings > Data Controls and check whether the setting is enabled, as shown in the screenshot below:
If it is enabled, click the setting and use the toggle to disable it, as shown below:
In the Enterprise version, this setting is disabled by default, so users do not need to change it. In addition, prompt data is isolated in a separate data environment, which reduces the risk of data leakage, accidental exposure, or unauthorized access.[4]
What about API Access?
There is another side of AI usage that needs to be discussed: API access. This covers the applications being built on AI models and the AI agents being deployed for automation, content creation, and even answering support questions. It is important to note that, by default, data sent through the API, whether by applications you build or SaaS products you buy, is NOT used to train OpenAI’s models unless you explicitly opt in. However, the OpenAI API may retain data temporarily, for up to 30 days, for abuse monitoring and debugging (unless you are on a Zero Data Retention plan). This data is isolated and only accessible by the API key used to create the request.
ChatGPT Data Summary
Below is a summary of how ChatGPT handles your data:
- Data shared with ChatGPT is not mixed with other customers’ data.
- With ChatGPT Enterprise, Team, and the API, your data is not used to train OpenAI’s models by default.
- Each session is isolated, and information from your organization is not accessible to other users. The exception is “Memory”: if it is turned on for a specific user, information from that user’s conversations can persist into future queries or projects.
- OpenAI uses robust security controls, including encryption and strict access controls, and maintains compliance with major industry standards.
- Users of the Free or Plus versions need to manually opt out to prevent their data from being used for AI training.
If you want to learn about the inherent security vulnerabilities of generative AI or want to learn new strategies to protect your prompted data, contact HALOCK Security Labs today.
SOURCES:
- [1] ChatGPT Product Page (OpenAI): https://openai.com/chatgpt
- [2] Models, OpenAI Platform documentation: https://platform.openai.com/docs/models
- [3] How OpenAI Uses Data (OpenAI Help Center): https://help.openai.com/en/articles/8398479-how-openai-uses-data
- [4] ChatGPT Enterprise and Business Privacy (OpenAI Help Center): https://help.openai.com/en/articles/8825690-chatgpt-enterprise-and-business-privacy