Oil Market Cap – Global Oil & Energy News, Data & Analysis

LMArena CTO Discusses AI Models and Google’s Nano Banana

By omc_admin, September 3, 2025

An AI war is raging as tech companies race to build models — and sometimes, the best way to determine which model is the best is to have them battle it out.

A site called LMArena allows users to do just that. In 2023, a group of researchers from the University of California, Berkeley, started Chatbot Arena, now called LMArena. It allows people to compare different AI models with prompts and determine which is better. Users can vote for how well models perform and compare them on a leaderboard.

LMArena saw a tenfold traffic spike in August when a mysterious new AI text-to-image and image editing model, Nano Banana, went viral for churning out impressive images and photo edits. Based on user votes, Nano Banana ranked #1 on LMArena’s image generation leaderboard. As many users guessed, Google was behind Nano Banana, which is Google’s Gemini 2.5 Flash Image model.

Now, LMArena has over 3 million monthly users, says Wei-Lin Chiang, its CTO. Chiang cofounded LMArena along with Berkeley researchers Anastasios Angelopoulos, the CEO, and Ion Stoica, also a cofounder of $62 billion Databricks and $1 billion Anyscale.

“We’re continuing to build a platform that’s open and accessible to anyone,” Chiang said. “We want people to test these models and express their opinions and preferences to help the community — including providers — evaluate AI grounded in real-world use cases.”

Business Insider caught up with Chiang on how LMArena started, the top AI models people are using, and his best guess on what Meta is building at its new Superintelligence Labs.

The interview has been edited for clarity and concision.

Why did you start LMArena?

LMArena started as a research project at UC Berkeley. ChatGPT came out before that, and the model released by Meta was Llama 1. People were trying to figure out which model is the best.

We wondered what the difference was between all these models. Traditional benchmarks didn’t tell us much, so we launched this project.

Initially, we called it Chatbot Arena. We wanted to build a community-focused evaluation to invite everyone to come and participate. It got quite a bit of attention.

In the first few weeks, tens of thousands of people voted, meaning they asked a question and indicated which model was better. We used that to compile our first leaderboard. It was mostly some of the open-source models. At that time, the only proprietary chatbots were Claude and GPT. Over time, we added more models and got even more attention.

What are the top models on your platform, and which are the ones that are fast-growing?

It depends on the use cases. People come here and can ask any question. Some ask coding questions, and some ask open-ended questions, like creative writing prompts.

Claude is ranked the best in coding. In terms of creativity, I think Gemini is also at the top.

Beyond text, we also have different modalities. For example, on the vision leaderboard, people upload an image and ask questions about that image. In particular, Gemini is doing very well, and so is the GPT series. For text-to-image and image editing, that’s the one where we tested the latest Banana models.

Following the lackluster response to Llama 4 this year, how are developers using Llama? Are there any updates you expect from Llama?

We haven’t heard from them much lately, likely because they are internally figuring out how they’ll structure the new lab and team. We’ve been chatting with their Reality Labs team to work on potentially benchmarking multimodal models and products. We are looking forward to partnering with them to evaluate text and coding models.

Meta’s superintelligence team is building an “omni model.” Do you have guesses on what it might be?

A model consolidating modalities into one. That’s one of the trends we’re observing in the industry.

What do Google, Meta, and other Big Tech companies get out of putting their models on LMArena? Is it just building exposure, or do they get feedback to improve their models?

The main goal here is to build an open space where anyone can come and participate in evaluating all kinds of models. It’s community-driven and reflects how people think about all these different models by encouraging them to ask questions and vote for their preference.

When OpenAI, Google, or Meta come here to test their models, they are giving us a few variants of the model.

Basically, the same public leaderboard you’re seeing will tell them your model ranks #5, #10 in coding, #4 in creative writing, and so forth. We give them a detailed report and analysis on how their model is doing based on community-driven feedback. We are also open-sourcing some of the data we collect to the public, as well as the code and pipeline.

When all these models are benchmarking so close to one another, do we need new benchmarks?

Building more benchmarks would definitely benefit us. One core thing we want to ensure is that these benchmarks are grounded in real-world use cases.

If AI can save a doctor or a lawyer two hours a day, that will be a huge value add to society.

We want to ensure that we go beyond traditional benchmarks to benchmarks driven by real users and especially professional experts in using AI tools to get these jobs done.

Recently, we launched a benchmark we called WebDev. You can prompt a model to build a website. These are tools that can help people in tech build prototypes to get something done fast.

What do you think of that MIT report that said most companies that invested in AI aren’t seeing a return on their investments?

It’s an interesting study for sure. That’s why linking AI and grounding it in real-world use cases is particularly important.

That’s exactly why we want to build this and expand it to more industries. We started from the tech community. We believe in the tech, and people are getting a lot of value from AI. With Cursor and the Copilots of the world, people are obviously paying for it and leveraging it to build better and faster.

We would love to see this applied broadly to more industries. With the data we’re collecting, we want to help bridge that gap and help measure that.

Are there particular fields of query, like law, medicine, or education, where LLMs especially struggle to perform or answer appropriately?

We want to understand what percentage of queries are from these industries, legal and finance, and so on. We definitely would love to share when we get more insights and results.

The goal is to use the data we have to understand the model limitations and be transparent about how we do the data study, and release the data for the community to build upon.
