Oil Market Cap – Global Oil & Energy News, Data & Analysis

Thanks to ChatGPT, the Pure Internet Is Gone. Did Anyone Save a Copy?

By omc_admin, June 4, 2025 (6 min read)


In the post-nuclear age, scientists noticed a peculiar problem: steel produced after 1945 was contaminated. Atomic bombs had infused the atmosphere with radioactivity, which contaminated the metal.

This made most steel useless for precise equipment such as Geiger counters and other highly accurate sensors. The solution? Salvage old steel from sunken pre-war battleships resting deep on the ocean floor, far away from the nuclear fallout. This material, known as low-background steel, became prized for its purity and rarity.

Fast forward to 2025, and a similar story is unfolding — not under the sea, but across the internet.

Since the launch of ChatGPT in late 2022, AI-generated content has exploded across blogs, search engines, and social media. The digital realm is increasingly infused with content not written by humans, but synthesized by models and chatbots. And just like radiation, this content is hard for ordinary readers to detect, is pervasive, and alters the environment in which it exists.

This phenomenon poses a particularly thorny problem for AI researchers and developers. Most AI models are trained on vast datasets collected from the web. Historically, that meant learning from human data: messy, insightful, biased, poetic, and occasionally brilliant. But if today’s AI is trained on yesterday’s AI-generated text, which was itself trained on last week’s AI content, then models risk folding in on themselves, diluting originality and nuance in what’s been dubbed “model collapse.”

Put another way: AI models are supposed to be trained to understand how humans think. If they’re trained mostly on their own outputs, they may end up just mimicking themselves. Like photocopying a photocopy, each generation becomes a little blurrier until nuance, outliers, and genuine novelty disappear.
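
The photocopy effect can be seen in a toy simulation. The sketch below is not how any real model is trained: it assumes a made-up starting corpus and treats "training" as nothing more than resampling words from the previous generation's text. The names `next_generation` and `simulate` are illustrative, not taken from any library.

```python
import random


def next_generation(corpus, rng):
    """Stand-in for 'train on the corpus, then generate a new one':
    draw a same-sized corpus from the previous corpus's word frequencies."""
    return rng.choices(corpus, k=len(corpus))


def simulate(generations=20, seed=0):
    """Return the number of distinct words after each generation."""
    rng = random.Random(seed)
    # Hypothetical starting corpus: two very common words plus 200 rare ones.
    corpus = ["the"] * 500 + ["of"] * 300 + [f"rare{i}" for i in range(200)]
    vocab_sizes = [len(set(corpus))]
    for _ in range(generations):
        corpus = next_generation(corpus, rng)
        vocab_sizes.append(len(set(corpus)))
    return vocab_sizes


if __name__ == "__main__":
    # One vocabulary count per generation, starting with the original corpus.
    print(simulate())
```

Because each generation can only reuse words that appeared in the one before it, the vocabulary count can stay flat or shrink but never grow. The rare words, the outliers, are the first to vanish.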

This makes human-generated content from before 2022 more valuable, because it grounds AI models and society in general in a shared reality, according to Will Allen, a vice president at Cloudflare, which operates one of the largest networks on the internet.

This becomes especially important as AI models spread into technical fields such as medicine, law, and tax. Allen, for instance, wants his doctor to rely on research written by human experts and based on real human trials, not on AI-generated sources.

“The data that has that connection to reality has always been critically important and will be even more crucial in the future,” Allen said. “If you don’t have that foundational truth, it just becomes so much more complicated.”

Paul Graham’s problem


This isn’t just theoretical. Problems are already cropping up in the real world.

Almost a year after ChatGPT launched, venture capitalist Paul Graham described searching online for how hot to set a pizza oven. He found himself looking at the dates of the content to find older information that wasn’t “AI-generated SEO-bait,” he said in a post on X.

Malte Ubl, CTO of AI startup Vercel and a former Google Search engineer, replied, saying Graham was filtering the internet for content that was “pre-AI-contamination.”

“The analogy I’ve been using is low background steel, which was made before the first nuclear tests,” Ubl said.

Matt Rickard, another former Google engineer, concurred. In a blog post from June 2023, he wrote that modern datasets are getting contaminated.

“AI models are trained on the internet. More and more of that content is being generated by AI models,” Rickard explained. “Output from AI models is relatively undetectable. Finding training data unmodified by AI will be tougher and tougher.”

The digital version of low-background steel


The answer, some argue, lies in preserving digital versions of low-background steel: human-generated data from before the AI boom. Think of it as the internet’s digital bedrock, created not by machines but by people with intent and context.

One such preservationist is John Graham-Cumming, a Cloudflare board member and the company’s former CTO.

His project, LowBackgroundSteel.ai, catalogs datasets, websites, and media that existed before 2022, the year ChatGPT sparked the generative AI content explosion. For instance, there’s GitHub’s Arctic Code Vault, an archive of open-source software buried in a decommissioned coal mine in Norway. It was captured in February 2020, about a year before the AI-assisted coding boom got going.

Graham-Cumming’s initiative is an effort to archive content that reflects the web in its raw, human-authored form, uncontaminated by LLM-generated filler and SEO-optimized sludge.

Another source he lists is “wordfreq,” a project to track the frequency of words used online. Linguist Robyn Speer maintained this, but stopped in 2021.

“Generative AI has polluted the data,” she wrote in a 2024 update on coding platform GitHub.

This skews internet data to make it a less reliable guide to how humans write and think. Speer cited one example that showed how ChatGPT is obsessed with the word “delve” in a way that people never have been. This has caused the word to appear way more often online in recent years. (A more recent example is ChatGPT’s love of the em dash — don’t ask me why!)

Our shared reality

As Cloudflare’s Allen explained, AI models trained partly on synthetic content can accelerate productivity and remove tedium from creative work and other tasks. He’s a fan and regular user of ChatGPT, Google’s Gemini, and other chatbots such as Claude.

And like human-generated data itself, the analogy to low-background steel is not perfect: scientists have since developed ways to produce steel that use pure oxygen rather than atmospheric air.

Still, Allen says, “you always want to be grounded in some level of truth.”

The stakes go beyond model performance. They reach into the fabric of our shared reality. Just as scientists trusted low-background steel for precise measurements, we may come to rely on carefully preserved pre-AI content to gauge the true state of the human mind — to understand how we think, reason, and communicate before the age of machines that mimic us.

The pure internet is gone. Thankfully, some people are saving copies. And like the divers salvaging steel from the ocean floor, they remind us: Preserving the past may be the only way to build a trustworthy future.

Sign up for Business Insider’s Tech Memo newsletter here. Reach out to me via email at abarr@businessinsider.com.



© 2025 oilmarketcap. Designed by oilmarketcap.
