ChatGPT, Bard or Bing? 40,000 people voted for the best generative AI model

At UC Berkeley’s ‘Chatbot Arena’, anyone can join a live competition created to blindly compare responses from ChatGPT, Google Bard, Anthropic’s Claude and other AI models. Here is the current leader.

(Credit: Supatman/Getty Images)

ChatGPT can produce a mix of useful information and nonsensical responses, which makes it difficult to assess the chatbot’s overall performance. And the companies that make generative AI tools, including OpenAI, Google and Microsoft, keep secret the data they use and how their AI models actually work.

How to test chatbots

To learn more about generative AI tools, the University of California, Berkeley, founded a group called the Large Model Systems Organization (LMSYS Org), in partnership with the University of California, San Diego (UCSD) and Carnegie Mellon University (CMU). It is composed of 10 students and four faculty members in the AI research and computer science departments. LMSYS Org has created an experiment, the “Chatbot Arena”, a custom website where anyone can chat anonymously with two models at once.

Once users have formed an opinion on which chatbot’s responses they prefer, they vote for a favorite and only later find out which models they were talking to. The site uses the same large language models (LLMs) that ChatGPT and others use, repackaged in a new interface, since companies like OpenAI have made them publicly available. The site also contains smaller models created by individuals.

(Credit: LMSYS Organization)

“We started this because we created our own AI model based on Meta’s LLaMA model in April, [which we] called Vicuna, and we wanted to train different versions and iterate on it,” says Hao Zhang, one of the UCSD professors co-leading the effort. “It primarily measures human preference and their ability to follow instructions and do the task that the human being wants, which is a very important factor in making a model useful.”

The group has steadily added more models to the arena, and since April, some 40,000 people have participated, Zhang says.

The chatbot arena

We tested the Chatbot Arena, below. Not knowing which two AI models the page chose to compare, we asked both of them to “write an email to my family telling them that I booked flights for Thanksgiving, arriving on November 22 and departing on November 30.” Each generated a suggested email. We selected Model B as the preferred option.

The page then revealed that Model B was Claude, an AI assistant created by Anthropic. Model A was called gpt4all-13b-snoozy, created by Nomic AI.

Two AI models compete for the best answer in the Chatbot Arena. (Credit: LMSYS Org, Emily Dreibelbis)

The site takes each user’s vote into account to create a rating using the Elo system, which “is a rating system widely used in chess and other competitive games,” says a blog post by LMSYS Org.
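To make the Elo idea concrete, here is a minimal sketch of how a single vote could move two ratings. It uses the textbook Elo formula with the conventional chess defaults (base 10, a 400-point scale and K = 32). Those constants, the function names and the starting rating of 1,000 are assumptions for illustration, since the article does not say which settings LMSYS Org uses.

```python
# Minimal Elo sketch. The constants (base 10, scale 400, K = 32) are the
# conventional chess defaults and are assumed here; the article does not
# state the settings used by Chatbot Arena.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return the new (rating_a, rating_b) after one vote.

    score_a is 1.0 if the voter preferred A, 0.0 if the voter
    preferred B, and 0.5 for a tie.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the rating total is conserved.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two hypothetical models start level at 1,000; the voter prefers B.
    model_a, model_b = update_elo(1000.0, 1000.0, score_a=0.0)
    print(model_a, model_b)  # 984.0 1016.0
```

Because the adjustment depends on how surprising the result was, an upset win against a much higher-rated model moves the ratings more than an expected win does.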

“I’ve seen this leaderboard published on several respected research sites,” says Federico Pascual, who previously worked at Hugging Face, which maintains its own leaderboard of custom AI models. “This is an active area of research as people are figuring out how to test these models. In three months or six months, [the Chatbot Arena leaderboard] will probably look different.”

And the winner is…

ChatGPT’s most advanced model, GPT-4, currently tops the list with an Elo rating of 1,225. It’s available with a ChatGPT Plus account ($20 per month). Two versions of Claude, made by Anthropic, rank second (1,195) and third (1,153). Claude is currently available through a waiting list; we were able to start using it within a few weeks.

The free version of ChatGPT is fourth, with its model GPT-3.5 (1,143). OpenAI recommends GPT-3.5 for most everyday tasks, as it runs faster than GPT-4 and is still very powerful. For that reason, it is also available in the paid version. But keep in mind that Microsoft’s new Bing AI search, which is free, also runs on GPT-4.

With GPT-4 and GPT-3.5 at the top of the rankings, and Claude available only through a waiting list, ChatGPT and Microsoft Bing are the most accessible current favorites.

Chatbot Arena Ranking as of June 2023. (Credit: LMSYS Org)

The model behind Google Bard, PaLM 2, ranks sixth (1,042). Zhang points out that Google makes several versions of PaLM 2 and has not confirmed that the model in Chatbot Arena is the same as the one behind Bard. Zhang has contacted Google but says “they are very secretive” and would not confirm this. Separately, Zhang’s team compared the version on Chatbot Arena with Google Bard and concluded that it is “at least very similar to what people can access on Bard,” if not identical.
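For a rough sense of what these rating gaps mean, the sketch below converts the ratings quoted above into the share of head-to-head votes GPT-4 would be expected to win against each rival. It assumes the standard Elo expectation formula with a 400-point scale, which the article does not confirm for this leaderboard. The dictionary labels for the two Claude versions are placeholders, since the article does not name them.

```python
# Ratings as quoted in the article (June 2023 leaderboard). The labels for
# the two Claude versions are placeholders; the article does not name them.
RATINGS = {
    "GPT-4": 1225,
    "Claude (second place)": 1195,
    "Claude (third place)": 1153,
    "GPT-3.5": 1143,
    "PaLM 2": 1042,
}


def win_probability(rating_a: float, rating_b: float) -> float:
    """Expected share of votes for A over B, assuming the standard
    Elo formula (base 10, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def compare_to_leader(ratings: dict[str, int], leader: str) -> dict[str, float]:
    """Expected win rate of `leader` against every other listed model."""
    return {
        name: win_probability(ratings[leader], rating)
        for name, rating in ratings.items()
        if name != leader
    }


if __name__ == "__main__":
    for rival, p in compare_to_leader(RATINGS, "GPT-4").items():
        print(f"GPT-4 vs {rival}: {p:.0%}")
```

Under these assumptions, the 30-point gap between GPT-4 and the second-place Claude corresponds to about a 54% expected preference rate, while the 183-point gap to PaLM 2 corresponds to about 74%. A lead in the table therefore means "preferred somewhat more often" and not "always better".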

AI concerns

From all his work with LLMs, Zhang has identified some concerns about their widespread adoption. He agrees with OpenAI CEO Sam Altman, Elon Musk, Bill Gates and others who have called for more AI regulation.

Specifically, Zhang thinks there are two issues that need more attention. The first is data privacy, as these models can scrape the web and distill that data into usable information better than ever before. The second is keeping the data that feeds the models high-quality and useful. If AI models can generate their own content using what’s available on the web, Zhang believes there won’t be an incentive for humans to create new and better content.

“These large language models [rely on] quality content, which is created by humans,” he says. “So if they don’t incentivize people to create good materials, how can you guarantee that they will improve the quality of life?”
