Lockdown Mode arrives on ChatGPT for security against hacker attacks: what it is and how it works

Lockdown Mode arrives on ChatGPT for security against hacker attacks: what it is and how it works

There Lockdown Mode by ChatGPT was created to respond to a concrete security problem that emerges when artificial intelligence systems begin to interact in a profound way with the Web and external applications. This advanced and optional protection mode designed for a small number of high-risk users which drastically reduces the possibility of data exposure by strictly limiting the AI’s interactions with the outside world. The objective is to mitigate an increasingly relevant class of attacks, that of the so-called prompt injectionin which a malicious actor attempts to manipulate the model’s behavior by causing it to follow unexpected instructions or leak sensitive information from attacked systems.

Alongside the Lockdown Mode, new labels also arrive “High risk” which indicate, directly in the interface, the functions that entail a more delicate security profile. Let’s analyze it in a little more detail why these tools were introduced, how they work and how they help users and businesses make more informed decisions about using network-connected AI.

How Lockdown Mode works on ChatGPT

When an AI system communicates only with the user, the risk perimeter is relatively controllable. Things change, and a lot, when the model can navigate online, query external services or act via connected apps. In this context the phenomenon of prompt injection becomes potentially concrete. With this technique, attackers can exploit external content (for example, a web page or seemingly innocuous input) to “inject” instructions designed to trick the model into performing unwanted actions, as if the user had legitimately given them. This entails multiple risks, including potential data exfiltration, i.e. the unauthorized leakage of information to third parties (cybercriminals).

There Lockdown Mode responds to this scenario with a deterministic approach, i.e. based on rigid and predictable rules. By activating it, some features of ChatGPT are limited or disabled if strong control over data flows cannot be guaranteed. A key example is the Web browsing: in Lockdown mode, access to the Internet is limited to content present in the cache, avoiding real-time network requests towards the Web. In simple terms, this means that it is much more complex for attackers to be able to get data and information out and, where this level of guarantee is not achievable, the ChatGPT Web browsing function is simply turned off.

Image
Diagram of how ChatGPT Lockdown Mode works. Credit: OpenAI.

This setting it is not designed for the average userbut for particularly exposed profiles such as executives, security managers or teams managing critical data. It is no coincidence that the Lockdown Mode is “grafted” on the protections already present in the Enterprise plans, which include techniques such as sandboxing (the isolation of execution in controlled environments), controls against URL-based exfiltration and monitoring and audit systems.

It is initially available to users who have ChatGPT Enterprise, ChatGPT Edu, ChatGPT for Healthcare And ChatGPT for Teachersand can be managed by administrators by creating specific roles and applying additional restrictions beyond standard policies. Regarding the extension of the new security mode to all other users, OpenAI said:

We plan to make Lockdown Mode available to consumer users in the coming months.

One notable aspect is the granular control over connected apps. Because many business workflows depend on external integrations, administrators can decide precisely which applications and actions are allowed when Lockdown Mode is enabled.

It also gets the “High Risk” label for safety

Alongside the Lockdown Mode, a second important piece arrives: the systematic introduction of“High Risk” label for some features that, while useful, expand the attack surface. Functions that require access to the network or external resources are reported consistently across ChatGPT, ChatGPT Atlas and Codex, accompanied by clear explanations of what changes and what risks are introduced.

The objective is to immediately clarify when a specific function implies a greater risk for security: as reported by OpenAI, however, these labels are not definitive and may be removed over time.