Benchmark LLM Models - Search News

MLCommons releases new AILuminate benchmark for measuring AI model safety

MLCommons today released AILuminate, a new benchmark test for evaluating the safety of large language models. Launched in 2020, MLCommons is an industry consortium backed by several dozen tech firms.

Decrypt

Tencent's New Hy3 AI Model Is the Most Efficient Chinese LLM No One's Talking About

Tencent just open-sourced Hy3 preview, a model that punches above its weight on coding agents, reasoning, and search—built in ...

SiliconANGLE

OpenAI details o3 reasoning model with record-breaking benchmark scores

OpenAI today detailed o3, its new flagship large language model for reasoning tasks. The model’s introduction caps off a 12-day product announcement series that started with the launch of a new ...

VentureBeat

Beyond generic benchmarks: How Yourbench lets enterprises evaluate AI models against actual data

Every AI model release inevitably includes charts touting how it outperformed its competitors in this benchmark test or that evaluation matrix. However, these benchmarks often test for general ...

Hosted on MSN

Local LLM benchmarks offer guidance for C++ AI use

A recent evaluation of three local large language models (LLMs) provides practical insights for developers integrating AI into C++ workflows. The comparison of Gemma 4 E4B, gpt-oss 20B, and Qwen 3.5 ...

Hosted on MSN

Nvidia’s Blackwell Conquers Largest LLM Training Benchmark

For those who enjoy rooting for the underdog, the latest MLPerf benchmark results will disappoint: Nvidia’s GPUs have dominated the competition yet again. This includes chart-topping performance on ...

Geeky Gadgets

How to Build Custom LLM Benchmarks for Your AI Applications

Have you ever wondered why off-the-shelf large language models (LLMs) sometimes fall short of delivering the precision or context you need for your specific application? Whether you’re working in a ...

TechCrunch

This LLM framework takes a first stab at benchmarking Big AI’s compliance with the EU AI Act

While most countries’ lawmakers are still discussing how to put guardrails around artificial intelligence, the European Union is ahead of the pack, having passed a risk-based framework for regulating ...

Business Wire

MLCommons Launches AILuminate, First-of-Its-Kind Benchmark to Measure the Safety of Large Language Models

SAN FRANCISCO--(BUSINESS WIRE)--MLCommons today released AILuminate, a first-of-its-kind safety test for large language models (LLMs). The v1.0 benchmark – which provides a series of safety grades for ...

Virtualization Review

AI's Heavy Hitters: Best Models for Every Task

In today's crowded AI landscape, organizations looking to leverage AI models are faced with an overwhelming number of options. But how to choose? An obvious starting point are all the various AI ...

Security

Simbian launches new security benchmark with AI SOC LLM Leaderboard

Simbian today announced the “AI SOC LLM Leaderboard,” a comprehensive benchmark to measure LLM performance in Security Operations Centers (SOCs). The new benchmark compares LLMs across a diverse range ...

InfoQ

Hugging Face Upgrades Open LLM Leaderboard v2 for Enhanced AI Model Comparison

