Cloudflare unveiled a new policy framework designed to give website owners, publishers, and content creators greater control over how AI systems access and use their content.
The policy, called the Content Signals Policy, creates a potentially powerful new license for the web, one that targets Google's AI-powered search offerings in particular.
The web is making a pivotal shift from traditional search engines to AI-powered answer engines that create responses directly from scraped content, often without linking back to original sources. This threatens the web’s original traffic-driven model, which rewards content creation with clicks, views, and revenue.
Most AI companies, including OpenAI, have separate web crawling bots for search services and AI offerings. In contrast, Google’s main search bot collects data from websites to feed both traditional search results and new AI-powered answer engines, including AI Overviews.
Cloudflare’s new bot policy and license target Google’s data-scraping advantage and seek to even the playing field, said CEO Matthew Prince.
“Every AI answer engine should have to play by the same rules,” he told Business Insider. “Google combines its crawler for search with its AI answer engines, which gives them a unique and unfair advantage. We are making clear that there are now different rules for search and AI answer engines.”
Cloudflare can help block AI bot crawlers
The Content Signals Policy, announced on Wednesday, builds on the company’s existing web crawling bot management service, with new signals specifically aimed at AI crawlers and data scrapers.
Websites use a standard called robots.txt, established at the dawn of the web, to control how bots access their data. The boom in AI scraping is now putting pressure on that system: robots.txt is essentially a gentlemen's agreement, and some AI companies, hungry for data, ignore these stated preferences and crawl sites anyway.
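To see how this works in practice, here is a minimal robots.txt in the standard format. Googlebot is Google's search crawler and GPTBot is OpenAI's AI-training crawler; the rules shown are an illustrative sketch of the mechanism, not a recommended configuration:

    # Let Google's search crawler index everything.
    User-agent: Googlebot
    Allow: /

    # Ask OpenAI's AI-training crawler to stay out entirely.
    User-agent: GPTBot
    Disallow: /

Nothing technically stops a bot from fetching pages anyway; the file only declares the site's preferences, which is exactly the weakness Cloudflare's new policy aims to shore up.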
More than 3.8 million domains already use Cloudflare's robots.txt service. Cloudflare is introducing what is essentially a new license that lets websites explicitly block or allow AI bot crawlers in more nuanced and powerful ways.
What this means for Google
Prince said this license could carry legal weight, especially for Google.
“Google’s legal team will see this for what it is — a contract with legal ramifications if they ignore it,” Prince said.
Prince added that Cloudflare helps run about 20% of the web, so this new license will be applied automatically to millions of websites on Wednesday.
That sets up a choice for Google, Prince said. The tech giant can stop crawling these sites for its search engine, which would mean missing out on a large chunk of web content, or it can comply and split its crawlers in two: one for traditional search and one for AI answer engines.
Cloudflare specifically mentioned Google’s AI Overviews in its announcement on Wednesday, saying these new settings will let websites block bots that collect data for AI Overviews and “inference,” or how AI models draw conclusions and create outputs from data.
“The internet cannot wait for a solution while in the meantime, creators’ original content is used for profit by other companies,” Prince said.
Google has said that its new AI-powered search features still send traffic to websites and may even send higher-quality traffic. The company’s executives have also stressed that they care deeply about the health and vibrancy of the web.
Prince said OpenAI is being more responsible on this front because it already runs separate crawling bots: one for its core AI operations and another for search.
Website owners have more control over AI bots
Cloudflare's new tool allows creators to clearly express preferences about how their content may be used, with a simple "yes" (content can be used) or "no" (content should not be used) for each category.
More importantly, the policy distinguishes between different AI-related uses, including search, AI input, and AI training, and it reminds crawlers that robots.txt declarations can carry “legal significance.”
The search preference tells bot crawlers that a site's content may be scraped only for use in traditional search engines, the kind that send users back to the original source of the information.
The AI input preference covers increasingly common situations where AI chatbots and AI models roam the web and collect website data for immediate summarization and inclusion in AI outputs.
The third preference, AI training, lets websites block AI bots that scrape data for the initial pre-training process, in which AI models learn language and general capabilities from large volumes of data.
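In practice, these preferences ride inside robots.txt as a machine-readable signal. The sketch below is illustrative: the three signal names mirror the categories above (search, ai-input, and ai-train) and the yes/no values map to the choices a site makes, though the canonical syntax is defined in Cloudflare's published policy text:

    # Content Signals Policy preferences (illustrative sketch):
    # search=yes    allow scraping for traditional search results
    # ai-input=no   disallow scraping for AI answers and summaries
    # ai-train=no   disallow scraping for model pre-training
    Content-Signal: search=yes, ai-input=no, ai-train=no

    User-agent: *
    Allow: /

Because the signal lives in a file crawlers already fetch, Cloudflare can roll it out automatically across the millions of sites whose robots.txt it manages.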
“To ensure the web remains open and thriving, we’re giving website owners a better way to express how companies are allowed to use their content,” Prince said. “Robots.txt is an underutilized resource that we can help strengthen and make it clear to AI companies that they can no longer ignore a content creator’s preferences.”