Oil Market Cap – Global Oil & Energy News, Data & Analysis
U.S. Energy Policy

LMArena CTO Discusses AI Models and Google’s Nano Banana

By omc_admin | September 3, 2025 | 6 Mins Read


An AI war is raging as tech companies race to build models — and sometimes, the best way to determine which model is the best is to have them battle it out.

A site called LMArena allows users to do just that. In 2023, a group of researchers from the University of California, Berkeley, started Chatbot Arena, now called LMArena. It allows people to compare different AI models with prompts and determine which is better. Users can vote for how well models perform and compare them on a leaderboard.

LMArena saw a tenfold traffic spike in August when a mysterious new AI text-to-image and image-editing model, Nano Banana, went viral for churning out impressive images and photo edits. Based on user votes, Nano Banana ranked #1 on LMArena’s image generation leaderboard. As many users guessed, Google was behind Nano Banana, which is Google’s Gemini 2.5 Flash Image model.

Now, LMArena has over 3 million monthly users, says Wei-Lin Chiang, its CTO. Chiang cofounded LMArena along with Berkeley researchers Anastasios Angelopoulos, the CEO, and Ion Stoica, also a cofounder of $62 billion Databricks and $1 billion Anyscale.

“We’re continuing to build a platform that’s open and accessible to anyone,” Chiang said. “We want people to test these models and express their opinions and preferences to help the community — including providers — evaluate AI grounded in real-world use cases.”

Business Insider caught up with Chiang on how LMArena started, the top AI models people are using, and his best guess on what Meta is building at its new Superintelligence Labs.

The interview has been edited for clarity and concision.

Why did you start LMArena?

LMArena started as a research project at UC Berkeley. ChatGPT came out before that, and the model released by Meta was Llama 1. People were trying to figure out which model is the best.

We wondered what the difference was between all these models. Traditional benchmarks didn’t tell us much, so we launched this project.

Initially, we called it Chatbot Arena. We wanted to build a community-focused evaluation to invite everyone to come and participate. It got quite a bit of attention.

In the first few weeks, tens of thousands of people voted, meaning they asked a question and indicated which model was better. We used that to compile our first leaderboard. It was mostly some of the open-source models. At that time, the only proprietary chatbots were Claude and GPT. Over time, we added more models and got even more attention.
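As a rough illustration of how head-to-head votes can be turned into a leaderboard, here is a minimal Python sketch using Elo-style rating updates. The interview does not say how LMArena aggregates votes, so the Elo formula, the `k` and `base` parameters, the function name `elo_leaderboard`, and the model names are all assumptions made for the example.

```python
from collections import defaultdict

def elo_leaderboard(votes, k=32.0, base=1000.0):
    """Rank models from (model_a, model_b, winner) votes with online Elo updates.

    Hypothetical helper: `winner` is one of the two model names, or any
    other value (e.g. "tie") to record a draw.
    """
    ratings = defaultdict(lambda: base)
    for model_a, model_b, winner in votes:
        ra, rb = ratings[model_a], ratings[model_b]
        # Expected score of model_a under the Elo logistic model.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        # Actual score: 1 for a win, 0 for a loss, 0.5 for a tie.
        if winner == model_a:
            score_a = 1.0
        elif winner == model_b:
            score_a = 0.0
        else:
            score_a = 0.5
        delta = k * (score_a - expected_a)
        ratings[model_a] = ra + delta
        ratings[model_b] = rb - delta
    # Highest rating first.
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)

# Made-up votes between three placeholder models.
votes = [
    ("model-x", "model-y", "model-x"),
    ("model-y", "model-z", "model-y"),
    ("model-x", "model-z", "model-x"),
    ("model-z", "model-x", "tie"),
]
leaderboard = elo_leaderboard(votes)
for rank, (name, rating) in enumerate(leaderboard, start=1):
    print(f"#{rank} {name}: {rating:.0f}")
```

Each vote moves rating points from the loser to the winner, so the total stays constant and a model climbs only by beating opponents more often than its current rating predicts.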

What are the top models on your platform, and which are the ones that are fast-growing?

It depends on the use cases. People come here and can ask any question. Some ask coding questions, and some ask open-ended questions, like creative writing prompts.

Claude is ranked the best in coding. In terms of creativity, I think Gemini is also at the top.

Beyond text, we also have different modalities. For example, on the vision leaderboard, people upload an image and ask questions about that image. In particular, Gemini is doing very well, and so is the GPT series. For text-to-image and image editing, that’s the one where we tested the latest Banana models.

Following the lackluster response to Llama 4 this year, how are developers using Llama? Are there any updates you expect from Llama?

We haven’t heard from them much lately, likely because they are internally figuring out how they’ll structure the new lab and team. We’ve been chatting with their Reality Labs team to work on potentially benchmarking multimodal models and products. We are looking forward to partnering with them to evaluate text and coding models.

Meta’s superintelligence team is building an “omni model.” Do you have guesses on what it might be?

A model consolidating modalities into one. That’s one of the trends we’re observing in the industry.

What do Google, Meta, and other Big Tech companies get out of putting their models on LMArena? Is it just building exposure, or do they get feedback to improve their models?

The main goal here is to build an open space where anyone can come and participate in evaluating all kinds of models. It’s community-driven and reflects how people think about all these different models by encouraging them to ask questions and vote for their preference.

When OpenAI, Google, or Meta come here to test their models, they are giving us a few variants of the model.

Basically, the same public leaderboard you’re seeing will tell them your model ranks #5, #10 in coding, #4 in creative writing, and so forth. We give them a detailed report and analysis on how their model is doing based on community-driven feedback. We are also open-sourcing some of the data we collect to the public, as well as the code and pipeline.
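To make the idea of a per-category report concrete, the sketch below groups votes by category and ranks models by win rate within each one. It is only an illustration: the vote format, the win-rate ordering, the function name `category_ranks`, and the sample data are assumptions, not a description of LMArena's actual reports.

```python
from collections import defaultdict

def category_ranks(votes):
    """Return {category: [models ordered by win rate]}.

    Hypothetical helper: each vote is a (category, winner, loser) tuple.
    """
    wins = defaultdict(lambda: defaultdict(int))
    games = defaultdict(lambda: defaultdict(int))
    for category, winner, loser in votes:
        wins[category][winner] += 1
        games[category][winner] += 1
        games[category][loser] += 1
    ranks = {}
    for category, played in games.items():
        # Sort by descending win rate, breaking ties by model name.
        ranks[category] = sorted(
            played, key=lambda m: (-wins[category][m] / played[m], m)
        )
    return ranks

# Made-up votes in two categories.
votes = [
    ("coding", "model-x", "model-y"),
    ("coding", "model-x", "model-z"),
    ("coding", "model-y", "model-z"),
    ("creative writing", "model-y", "model-x"),
    ("creative writing", "model-z", "model-x"),
    ("creative writing", "model-y", "model-z"),
]
ranks = category_ranks(votes)
# A per-model report: the model's rank position in each category.
report = {cat: order.index("model-x") + 1 for cat, order in ranks.items()}
print(report)
```

The same model can lead one category and trail in another, which is the kind of breakdown the answer above describes.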

When all these models are benchmarking so close to one another, do we need new benchmarks?

Building more benchmarks would definitely benefit us. One core thing we want to ensure is that these benchmarks are grounded in real-world use cases.

If AI can save a doctor or a lawyer two hours a day, that will be a huge value add to society.

We want to ensure that we go beyond traditional benchmarks to benchmarks driven by real users and especially professional experts in using AI tools to get these jobs done.

Recently, we launched a benchmark we called WebDev. You can prompt a model to build a website. These are tools that can help people in tech build prototypes to get something done fast.

What do you think of that MIT report that said most companies that invested in AI aren’t seeing a return on their investments?

It’s an interesting study for sure. That’s why linking AI and grounding it in real-world use cases is particularly important.

That’s exactly why we want to build this and expand it to more industries. We started from the tech community. We believe in the tech, and people are getting a lot of value from AI. With Cursor and the Copilots of the world, people are obviously paying for it and leveraging it to build better and faster.

We would love to see this applied broadly to more industries. With the data we’re collecting, we want to help bridge that gap and help measure that.

Are there particular fields of query, like law, medicine, or education, where LLMs especially struggle to perform or answer appropriately?

We want to understand what percentage of queries are from these industries, legal and finance, and so on. We definitely would love to share when we get more insights and results.

The goal is to use the data we have to understand the model limitations and be transparent about how we do the data study, and release the data for the community to build upon.



© 2026 oilmarketcap. Designed by oilmarketcap.
