Understanding Google’s AI Text Detection Tools

How Google SynthID and SpamBrain Work

AI Tools for Research: Understanding Google’s Approach

In our recent article discussing the key aspects of the December 2025 Google Search Core Update, we highlighted the importance of crafting content from your own experiences. Nonetheless, many websites, including ours, rely on AI tools for research purposes.

Typically, we dedicate significant time asking a multitude of questions to thoroughly understand a subject. With the aid of advanced tools like Google Gemini, we can then condense this research into cohesive articles. After this, we refine the content before publishing it.

However, this writing approach might face penalties (although the extent remains uncertain) as Google and other search engines strive to eliminate AI-generated text perceived as derivative rather than original and beneficial.

Last year, Google unveiled “SynthID,” developed by its DeepMind division, to establish a permanent watermark within all Gemini-generated content. This has understandably raised concerns among content creators.

EDITOR’S NOTE: We are fans of Google Gemini for research purposes; it possesses vast knowledge drawn from hundreds of millions of sites, as well as full access to nearly every book ever published and Google search data. Check out our brief video on utilising Google Gemini and other generative AI tools for free.

The underlying message is clear: any content created or modified using Google AI is accompanied by SynthID. However, for those of us who utilise AI as a sophisticated tool rather than a mere ghostwriter, a more profound issue arises—it’s not just about the presence of a watermark, but how it manifests and whether this invisible marker will negatively impact our content’s search rankings.

Understanding the Mechanics of Text Watermarking

Fortunately, for those of us who meticulously refine AI outputs into genuinely original pieces, Google’s text watermarking is not a straightforward, static signature. Instead, it operates on a level of subtlety that involves statistical probabilities, offering both complexity and resilience against casual modifications.

Beyond Basic Patterns

We found that SynthID does not merely look for predictable patterns, such as every third word being a verb or paragraphs starting with sequential letters. Instead, it uses a far more nuanced approach that shapes the AI’s choices from a broad selection of terms.

When Gemini produces text, it doesn’t simply pick the single “best” or most likely next word. Instead, it examines a range of possibilities. For example, when contemplating “cat,” it might also evaluate terms like “feline,” “tiger,” and “animal,” each with a different statistical probability. This is where the SynthID watermark comes into play.
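To make the sampling idea concrete, here is a toy Python sketch. The candidate words and probabilities are invented for illustration; they are not Gemini’s real distribution. The point is only that a model samples from a weighted shortlist rather than always taking the top word:

```python
import random

# Hypothetical next-word candidates with made-up probabilities
# (illustrative only, not real model output).
candidates = {"cat": 0.45, "feline": 0.25, "animal": 0.20, "tiger": 0.10}

def sample_next_word(probs, rng):
    """Pick a word in proportion to its probability,
    rather than always choosing the single most likely one."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(42)  # fixed seed so the demo is repeatable
picks = [sample_next_word(candidates, rng) for _ in range(1000)]

# "cat" wins most often, but far from every time:
print(picks.count("cat") / 1000)  # close to 0.45 over many samples
```

Because every candidate keeps a non-zero chance of being chosen, a watermarking scheme has room to lean on those choices without breaking the text.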

The Green List and the Cryptographic Key

The essence of SynthID’s text watermarking lies in a method known as “tournament sampling,” directed by a unique, cryptographically secure pseudorandom key. This key remains confidential, part of Google’s proprietary formula. As the AI generates each word, the key dynamically partitions the model’s vocabulary into two groups: the “Green List” and the “Red List.”

Here’s the clever part:

  • The Nudge: If Gemini’s natural tendency is to select a word from the “Red List” within a given context, SynthID may subtly influence the AI to favour a statistically less likely yet grammatically appropriate word from the “Green List.”
  • Contextual Fluidity: This is not a fixed “Green List” applicable forever or across all documents. The cryptographic key, alongside the context provided by previous words, means that a term like “alliance” may be on the “Green List” for a business report but not for a casual blog post—even within the same document. This dynamic partitioning ensures the watermark is not static but an ever-evolving statistical signature.
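The dynamic partitioning can be loosely illustrated in Python. This is a stand-in sketch, not Google’s actual scheme: a keyed hash plays the role of the secret pseudorandom key, and the preceding context is mixed into the hash so the same word can land on different lists in different documents:

```python
import hashlib

# Hypothetical stand-in for Google's confidential key.
SECRET_KEY = b"demo-key"

def is_green(word, context, key=SECRET_KEY):
    """Deterministically assign a word to the 'Green List' (True) or
    'Red List' (False) via a keyed hash of the word plus its context.
    Changing the context reshuffles the partition."""
    digest = hashlib.sha256(key + context.encode() + word.encode()).digest()
    return digest[0] % 2 == 0  # roughly a 50/50 split

# The same word may fall on different lists in different contexts
# (the two results below may match or differ; the split is
# context-dependent, not fixed):
print(is_green("alliance", "quarterly business report:"))
print(is_green("alliance", "casual weekend blog post:"))
```

Without the key, an outside observer cannot reconstruct which words were “green” in a given context, which is what makes the signature hard to strip.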

Recognising Patterns: From Bias to Signal

A single nudge may be inconsequential. However, across numerous instances throughout an article, these nudges can form a detectable pattern. Consider an article where a human might randomly select “Green List” words 50% of the time. In contrast, an AI, influenced by SynthID, may consistently opt for “Green List” words 70% of the time. This statistical deviation constitutes the watermark—a quiet ‘whisper’ that, when accumulated, becomes a loud proclamation to Google’s AI detection systems.
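That accumulation is elementary statistics. The hypothetical sketch below turns an observed Green-List fraction into a z-score against the roughly 50% a human writer would hit by chance; the counts are invented to mirror the 50% vs. 70% example above:

```python
import math

def green_fraction_zscore(green_count, total, baseline=0.5):
    """How many standard errors the observed green-word fraction
    sits above the chance baseline of ~50%."""
    observed = green_count / total
    stderr = math.sqrt(baseline * (1 - baseline) / total)
    return (observed - baseline) / stderr

# A 500-word article: human-like chance rate vs. a nudged 70% rate.
print(green_fraction_zscore(250, 500))  # 0.0: indistinguishable from chance
print(round(green_fraction_zscore(350, 500), 2))  # 8.94: overwhelming signal
```

A z-score near zero is noise; a z-score of nearly nine is, statistically, the “loud proclamation” the article describes.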

What Happens If You Rework Gemini’s Output with ChatGPT?

Ah, you clever thinker! But remember, Google is even cleverer!

Google has a second AI detection system, SpamBrain, which it quietly launched in 2018. SpamBrain can identify AI-generated text without requiring a Gemini watermark, reportedly examining signals such as:

  • Perplexity: how predictable each word is given the words before it. AI text tends to be highly predictable (low perplexity).
  • Burstiness: the variation in sentence length and rhythm. Human writing mixes short and long sentences more than most models do.
  • Patterned Manipulation: unnatural clustering of keywords that would not flow naturally for human readers.

Many AI language models, including prominent ones like ChatGPT and Microsoft Copilot, exhibit similar patterns, allowing Google to spot them with a high degree of certainty, watermark or not.

Does Google Penalise Based on AI Content?

Not necessarily. We leveraged various AI tools, including Gemini, to compile this article, as we needed to explore some intricate topics thoroughly to present them accurately to you. Here is what we found:

Google’s official stance remains focused on content quality, not its origin. A SynthID watermark does not automatically trigger a penalty. However, how you employ AI matters more than ever.

Prioritising Content Quality Over AI Origin

Google’s ranking algorithms are designed to reward helpful content demonstrating E-E-A-T (Experience, Expertise, Authoritativeness, and Trustworthiness).

The Policy: Google has explicitly stated that AI-generated content does not breach their guidelines as long as it is not used to manipulate search results (spamming).

The Watermark: SynthID serves as a transparency measure, not a “spam flag.” Google utilises it to identify synthetic media across the web, but their Search team does not use it as a metric for lowering page rankings.

Do Other Search Engines Scan for AI Content?

Absolutely! Here’s how some prominent search engines address low-quality AI-generated content:

  • Google (Gemini): Yes (SynthID). Fully integrated; invisible, statistical watermarking for text and images. The “gold standard” for 2025.
  • Bing / Microsoft: Partial / metadata. Utilises C2PA standards for images; for text, relies more on labelling (“Generated by AI”).
  • OpenAI (ChatGPT): Internal / research. Possesses the technology but rarely applies it to text, to minimise user disruption.
  • Meta (Llama/FB): Yes (Stable Signature). Robust pixel-level watermarking for images; actively labels AI content across Instagram and Facebook.
  • Yandex (Russia): Minimal / opaque. Uses AI (YandexART) but shows no evidence of invisible statistical text watermarking.
  • Baidu / ERNIE (China): Mandatory (strict). Under 2025 regulations, both explicit and implicit labels are used, and removing them is prohibited.
  • DuckDuckGo: Detection only. Utilises others’ models but launched an “AI Filter” in 2025 to let users hide AI results.

In Summary

Google’s Gemini watermark (SynthID), together with SpamBrain, may produce some false positives that affect page rankings for writers, yours and ours included. However, these tools are well established and are expected to grow more accurate over time.

The Google DeepMind division is continually working on ways to assess whether you have genuinely created something original and useful, or merely derived it from others’ hard work.