May 4, 2026

Why ChatGPT became “fixated” on goblins and trolls: OpenAI intervened to correct the problem

ChatGPT developed an unusual obsession with goblins, gremlins, raccoons, trolls and orcs, inserting references to these creatures into its answers with unusual frequency and, in many cases, out of context. The problem became clear after the launch of GPT-5.1 last November. Users had reported that the model’s tone seemed strangely excessive, almost overly familiar. This prompted the team to examine specific linguistic patterns in the responses. A researcher asked for the words “goblin” and “gremlin” to be included in the analysis and, according to an internal analysis described by OpenAI, the data revealed something surprising: use of the first term had increased by 175% compared with the period before the launch, while use of the second had risen by 52%. Let’s try to understand why ChatGPT became obsessed with goblins and trolls and, above all, how OpenAI solved the problem.

ChatGPT’s fixation on goblins: the causes

The reason ChatGPT became fixated on goblins and similar figures was traced back to a chatbot customization feature called “Nerdy”, one of the options that allowed users to change the style and tone of responses. The system message associated with this personality invited the model to recognize the “strangeness” of the world and to approach issues lightly, avoiding self-seriousness. During training via reinforcement learning, a technique in which the model is guided by “reward” or “penalty” signals based on the perceived quality of its responses, some reward signals ended up favoring responses with metaphors involving fantastical creatures. In 76.2% of the datasets analyzed, responses containing the terms “goblin” or “gremlin” received systematically better ratings than equivalent responses without those terms.
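
To make that last figure concrete, here is a minimal sketch of how such a comparison could be run. It is not OpenAI’s actual analysis: the function names, the substring matching and the toy data are all assumptions made for illustration. For each dataset it compares the mean reward of responses that mention a tracked term with the mean reward of those that do not, then reports the share of datasets in which the creature-laden responses come out ahead.

```python
from statistics import mean

# Hypothetical term list: the two words named in the article.
TERMS = ("goblin", "gremlin")


def mentions_term(text: str) -> bool:
    """True if the response contains a tracked term (case-insensitive substring match)."""
    lowered = text.lower()
    return any(term in lowered for term in TERMS)


def share_favoring_terms(datasets: dict[str, list[tuple[str, float]]]) -> float:
    """Fraction of datasets where responses with a tracked term earn a higher mean reward."""
    favoring = 0
    comparable = 0
    for rows in datasets.values():
        with_term = [reward for text, reward in rows if mentions_term(text)]
        without_term = [reward for text, reward in rows if not mentions_term(text)]
        if not with_term or not without_term:
            continue  # nothing to compare in this dataset
        comparable += 1
        if mean(with_term) > mean(without_term):
            favoring += 1
    return favoring / comparable if comparable else 0.0


# Invented toy data: (response text, reward score) pairs, not real measurements.
toy_datasets = {
    "a": [("A goblin ate the cache.", 0.9), ("The cache was cleared.", 0.6)],
    "b": [("Gremlins in the build.", 0.4), ("The build failed.", 0.7)],
}
print(share_favoring_terms(toy_datasets))  # 0.5: one of the two toy datasets favors the terms
```

A real audit would also need to check that the compared responses are otherwise equivalent, as the article specifies; this sketch skips that step.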

The result? The “Nerdy” personality, which accounted for only 2.5% of ChatGPT’s total responses, was responsible for 66.7% of all mentions of “goblin”. This led to a 3,881.4% increase in the use of the term, as highlighted in the following graph.

The “Nerdy” personality is responsible for the exponential increase in the term “goblin” in responses provided by ChatGPT. Credit: OpenAI.
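
The percentages above are easier to grasp as multipliers. The short sketch below only redoes the arithmetic on the figures quoted in the article; the function names are illustrative assumptions, not taken from OpenAI’s report.

```python
def overrepresentation(share_of_mentions: float, share_of_responses: float) -> float:
    """How many times more often a group produces a term than its share of traffic would suggest."""
    if share_of_responses <= 0:
        raise ValueError("share_of_responses must be positive")
    return share_of_mentions / share_of_responses


def growth_multiplier(percent_increase: float) -> float:
    """Convert a percentage increase into a multiple of the original level."""
    return 1 + percent_increase / 100


# "Nerdy": 2.5% of responses, 66.7% of "goblin" mentions.
print(f"{overrepresentation(66.7, 2.5):.1f}x")  # 26.7x

# A 3,881.4% increase is roughly 40 times the baseline.
print(f"{growth_multiplier(3881.4):.1f}x")  # 39.8x
```

In other words, the “Nerdy” personality mentioned goblins about 27 times more often than its share of responses would predict.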

But the phenomenon didn’t stop there. Reinforcement learning does not guarantee behavioral isolation: a pattern rewarded in one context can propagate to others, especially when it enters fine-tuning datasets. This is exactly what happened: the goblins multiplied far beyond the personality that gave rise to them.

How OpenAI solved the problem

To solve the problem, OpenAI retired the “Nerdy” personality in March and eliminated the reward signal responsible, while also filtering training data containing references to creatures. GPT-5.5, however, had already begun its training cycle before the cause was identified. For this reason, an explicit instruction was inserted into the Codex programming environment that prevents the model from mentioning goblins, gremlins, raccoons, trolls, orcs, pigeons, or other creatures unless they are strictly relevant to the request.
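
The article does not say how the training data was filtered. As a rough illustration only, a keyword filter along the following lines would drop examples that mention the creatures listed above. The word list, the regular expression and the function name are assumptions, and a production filter would presumably need to spare examples where the creature is genuinely relevant.

```python
import re

# Hypothetical blocklist, taken from the creatures named in the article.
CREATURES = ("goblin", "gremlin", "raccoon", "troll", "orc", "pigeon")

# Whole-word match with an optional plural "s", so "orchestra" or "trolley" do not match.
CREATURE_PATTERN = re.compile(
    r"\b(?:" + "|".join(CREATURES) + r")s?\b", re.IGNORECASE
)


def filter_examples(examples: list[str]) -> list[str]:
    """Keep only the training examples that mention none of the listed creatures."""
    return [text for text in examples if not CREATURE_PATTERN.search(text)]


# Invented sample texts for demonstration.
samples = [
    "The orchestra tuned up before the show.",
    "Trolls are hiding in your config file.",
    "Restart the service and check the logs.",
    "A gremlin got into the deployment.",
]
print(filter_examples(samples))
```

Run on the sample texts, this keeps the first and third sentences and discards the two that mention trolls and a gremlin.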

This story illustrates one of the more subtle challenges in developing language models: even a single poorly calibrated reward signal can trigger a vicious cycle in which a behavior is rewarded, generalizes, transfers, and amplifies. Recognizing such a cycle in time, and developing the tools to identify it and correct it at its root, is, according to OpenAI itself, a fundamental skill for anyone working in this field.

Alexander Marchall

Alexander Marchall is a distinguished journalist with over 15 years of experience in the realm of international media. A graduate of the Columbia School of Journalism, Alex has a fervent passion for global affairs and geopolitics. Prior to founding The Journal, he contributed his expertise to several leading publications.