AI text detectors, also called AI-generated text detectors, are software tools that analyze the style, structure and word choice of a text to estimate the chance that it was written by an artificial intelligence.
The problem, however, is that AI is also improving quickly. Tools like ChatGPT can now write in such a natural, realistic way that distinguishing between a text entirely written by a person, one revised by AI and one generated from scratch is becoming increasingly difficult. Precisely for this reason, AI detectors are not 100% reliable and should never be used as the sole criterion for making important decisions. They can give an indication, but we cannot rely on them completely.
Let’s see in more detail what AI text detectors are, how they work and what the main critical issues are.
What AI text detectors are and how they work
AI text detectors are software designed to analyze a text and estimate whether it was generated by artificial intelligence. Some assign a percentage (for example: 80% AI, 20% human), others simply classify the text as “human”, “hybrid” or “AI-generated”. Some even highlight suspicious phrases and explain why they seem artificial.
Detectors, as we know them today, have existed since 2019. Shortly after the arrival of the first GPT, we realized how skilled machines were becoming at simulating human language and the risks this entailed. Since then, AI has made enormous progress, and the software to recognize it has tried to keep up.
Today dozens of different tools exist, both free and paid. Among the free ones, the best known and most used are GPTZero, zeroGPT, Scribbr, Neural Writer, Grammarly’s AI detector and NoPlagio. Let’s see how they determine whether a text was written by an AI.
There is no single way to understand if a text was written by an artificial intelligence. Each AI detector uses different strategies, but almost all are based on some typical characteristics of AI writing. The main ones are:
- Linear sentences and common word choices: Texts generated by AI tend to respect grammar, follow coherent patterns, and use linear sentence structures and unsophisticated vocabulary. Human texts, by contrast, tend to be more varied and imperfect.
- Uniformity in sentence length (“burstiness”): AIs tend to produce sentences of more uniform, regular length. Burstiness measures precisely how much the text “swings” in terms of sentence length and complexity: the less uniform the text, the more “human” it is.
- Predictability of sentences (“perplexity”): AIs use very frequent, predictable linguistic patterns. Perplexity measures this predictability: the more predictable the text, the more likely it was written by an AI.
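The two metrics above can be sketched in a few lines of Python. This is only an illustration of the idea: a real detector computes perplexity with a large language model, while here a toy unigram model (trained on a small reference corpus you supply) stands in for it.

```python
import math
import re
import statistics
from collections import Counter

def burstiness(text):
    """Standard deviation of sentence lengths (in words).
    Low values suggest the uniform rhythm typical of AI text,
    high values the more irregular 'swing' of human writing."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.pstdev(lengths) if len(lengths) > 1 else 0.0

def perplexity(text, corpus):
    """Perplexity of `text` under a toy unigram model built from `corpus`.
    Illustrates the formula exp(-mean log P(word)); real detectors use
    a large language model instead of unigram counts."""
    counts = Counter(corpus.lower().split())
    total, vocab = sum(counts.values()), len(counts)
    words = text.lower().split()
    logp = 0.0
    for w in words:
        # Laplace smoothing so unseen words don't get probability zero
        p = (counts[w] + 1) / (total + vocab + 1)
        logp += math.log(p)
    return math.exp(-logp / len(words))
```

Text that matches the reference corpus scores a low perplexity; text full of words the model has never seen scores a high one, which is exactly the signal detectors invert: text that is *too* predictable looks AI-generated.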
To these, you can add techniques such as:
- Hidden watermarks: Some AI models can intentionally insert invisible “fingerprints” into text: specific word frequencies, syntactic patterns, rhythms. But only those who know the pattern can actually detect them and use them to build a detector, so this technique is limited to developers.
- Stability tests: Some detectors modify the text by swapping some words for synonyms and measure how much the perplexity changes. If it varies a lot, the text was likely written by an AI; if not, it could be by a human.
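The stability test can be sketched as follows. Everything here is a stand-in: the hardcoded synonym map replaces a real thesaurus, and the unigram perplexity replaces a large language model, but the perturb-and-compare loop is the idea the bullet describes.

```python
import math
import random
from collections import Counter

# Illustrative stand-ins for the language model corpus and the thesaurus
# a real detector would use.
CORPUS = "the quick brown fox jumps over the lazy dog the dog sleeps".lower().split()
SYNONYMS = {"quick": "fast", "lazy": "idle", "jumps": "leaps", "dog": "hound"}

def unigram_perplexity(words, corpus=CORPUS):
    counts = Counter(corpus)
    total, vocab = sum(counts.values()), len(counts)
    logp = sum(math.log((counts[w] + 1) / (total + vocab + 1)) for w in words)
    return math.exp(-logp / len(words))

def stability_test(text, trials=20, seed=0):
    """Swap some words for synonyms and return the average relative
    change in perplexity. Under the heuristic described above, a large
    swing hints at AI text, a small one at human text."""
    rng = random.Random(seed)
    base_words = text.lower().split()
    base = unigram_perplexity(base_words)
    deltas = []
    for _ in range(trials):
        # Each word with a known synonym is swapped with 50% probability
        perturbed = [SYNONYMS.get(w, w) if rng.random() < 0.5 else w
                     for w in base_words]
        deltas.append(abs(unigram_perplexity(perturbed) - base) / base)
    return sum(deltas) / len(deltas)
```

A text whose words have no entries in the synonym map is never perturbed and scores zero; a text that shifts sharply when synonyms are substituted would be flagged as more likely AI-generated.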
All these techniques help build an estimate, but none of them guarantees a correct answer. AI detectors can make mistakes, and they do so often.
How reliable are AI detectors and how to use them
ChatGPT, Gemini, DeepSeek and Claude all write differently. So how can a detector always recognize a text written by AI? The short answer is that it can’t.
The ability to correctly classify a text as “written by AI” or “written by a human” depends on many factors:
- the model with which the text was generated;
- the length of the text: the longer it is, the more reliable the analysis;
- how well the person who produced it writes;
- the language used: detectors work better with English than with Italian;
- whether the writer is using their mother tongue. For example, an Italian person writing in English will tend to use simpler, more standard sentences and could therefore be mistaken for an AI.
The degree of AI intervention also matters. Whether we use a model just to rephrase a few sentences or to write everything from scratch, the detector may not be able to distinguish human text from artificial text.
Furthermore, there are ways to ask the AI to write in a more “human” style, and tools that “humanize” text (called “AI humanizers”), which allow it to slip past most detectors. We tested these techniques on a text written entirely by an AI: we created a “humanized” version of it and ran both through five free AI detectors. The responses varied widely, but most recognized the first text as AI-generated and the second as human-written, even though both were artificial.
A good rule, therefore, if you want to use these tools, is not to treat them as infallible judges but as mere indications. And above all, never base important decisions solely on their answers.
