Sakana AI Promises Better AI by Teaching Large Language Models the Art of Cooperation

New "adaptive branching Monte Carlo tree search" lets you put multiple LLMs to work on a single problem, the Japanese company claims.

Japanese artificial intelligence startup Sakana AI claims to have come up with a way to set multiple large language models (LLMs) loose on a single problem, allowing them to work cooperatively and deliver better results than any one model could alone — though the company has a history of making big claims that have failed to stand up to peer review.

"At Sakana AI, we develop AI systems by applying nature-inspired principles, such as evolution and collective intelligence. In our 2024 research on evolutionary model merging, we harnessed the vast collective intelligence of existing open-source models through evolutionary computation and model merging," the company explains. "This led us to a new question: Can we utilize multiple models not only for building new models but also during inference? Can we utilize the ever-advancing frontier models, such as [OpenAI] ChatGPT, [Google] Gemini, and [Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co.] DeepSeek, to leverage them as a form of collective intelligence?"

Sakana AI says it has come up with a way to put multiple LLMs to work on a single problem, delivering better results than any one model alone. (📷: Sakana AI)

The answer, Sakana AI's researchers claim, is yes — using an inference-time scaling algorithm dubbed adaptive branching Monte Carlo tree search (AB-MCTS), which lets models tackle a problem by trial and error and collaborate with one another. It's an extension of the company's earlier "evolutionary model merge" approach, which aimed to fuse multiple large language models into a single merged model. This time, though, each model is kept separate — their efforts are combined at inference time, rather than beforehand.
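For readers who want a feel for what "adaptive branching" means in practice, the sketch below is a deliberately simplified, unofficial illustration in Python, not Sakana AI's actual implementation: the query_llm and score_answer stubs, the Node class, and the Beta-distribution sampling used to choose between generating a fresh candidate ("going wider"), refining an existing one ("going deeper"), and picking which model to call are all placeholders invented for this example.

```python
import random


def query_llm(model_name, prompt, parent_answer=None):
    """Placeholder for a call to a frontier model (o4-mini, Gemini-2.5-Pro, R1, etc.).

    If parent_answer is given, the model is asked to refine it; otherwise it
    produces a fresh candidate. Here it just returns a descriptive string.
    """
    if parent_answer is None:
        return f"<{model_name}: fresh attempt at {prompt!r}>"
    return f"<{model_name}: refinement of {parent_answer!r}>"


def score_answer(answer):
    """Placeholder reward in [0, 1]; a real system would verify the answer."""
    return random.random()


class Node:
    """One candidate answer in the search tree (the root holds no answer)."""
    def __init__(self, answer=None, parent=None):
        self.answer = answer
        self.parent = parent
        self.children = []
        self.rewards = []  # rewards observed anywhere in this node's subtree


def sampled_quality(rewards):
    """Draw a plausible quality for a branch from a Beta posterior over its rewards."""
    alpha = 1.0 + sum(rewards)
    beta = 1.0 + len(rewards) - sum(rewards)
    return random.betavariate(alpha, beta)


def ab_mcts_sketch(prompt, models, budget=16):
    root = Node()
    model_rewards = {m: [] for m in models}
    best_answer, best_reward = None, -1.0

    for _ in range(budget):
        # Selection: at every node, compare an as-yet-untried "wider" branch
        # (prior only) against each existing child ("deeper"); stop and widen
        # when the untried branch wins, otherwise descend into the best child.
        node = root
        while node.children:
            widen = sampled_quality([])
            deepen = [sampled_quality(child.rewards) for child in node.children]
            if widen >= max(deepen):
                break
            node = node.children[deepen.index(max(deepen))]

        # Model choice: sample each model's track record and query the winner.
        model = max(models, key=lambda m: sampled_quality(model_rewards[m]))

        # Expansion and evaluation: new candidate (refining node.answer if any).
        answer = query_llm(model, prompt, parent_answer=node.answer)
        reward = score_answer(answer)
        child = Node(answer=answer, parent=node)
        node.children.append(child)

        # Backpropagation: record the reward per model and up the tree.
        model_rewards[model].append(reward)
        walker = child
        while walker is not None:
            walker.rewards.append(reward)
            walker = walker.parent

        if reward > best_reward:
            best_answer, best_reward = answer, reward

    return best_answer


print(ab_mcts_sketch("solve this puzzle", ["o4-mini", "gemini-2.5-pro", "deepseek-r1"]))
```

In the real system the reward would come from actually checking candidate answers (ARC-AGI-2 tasks, for example, have verifiable outputs), and the balance between widening, deepening, and model selection is handled with considerably more statistical care than this toy shows.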

"Our AB-MCTS combination of [OpenAI] o4-mini + [Google] Gemini-2.5-Pro + [DeepSeek] R1-0528, current frontier AI models as of writing, achieves strong performance on the ARC-AGI-2 benchmark," the company claims, "outperforming individual o4-mini, Gemini-2.5-Pro, and DeepSeek-R1-0528 models by a large margin."

The company's approach combines a way for models to move between width- and depth-first approaches with a way for multiple models to work collaboratively. (📷: Sakana AI)

It's a claim that would put Sakana AI at the top of the tree for LLM performance, allowing its combination of models to return better results — or, technically speaking, better result-shaped objects formed of statistically likely token-stream continuations — than rivals working with any single model. The company, however, has previous form for making bold claims that do not stand up to peer review. Back in February this year, Sakana AI said it had created an "AI CUDA Engineer" capable of boosting the performance of PyTorch projects by orders of magnitude over a skilled human coder, only to later admit that its system had instead gamed the benchmarks used to evaluate it.

Those who wish to check out the company's claims for themselves, though, can find more on the Sakana AI blog and in an unreviewed preprint, written for the ICLR 2025 Workshop on Foundation Models in the Wild, published on Cornell's arXiv server.
