Ant's Ling 2.5 says the China open-model race is becoming a token-efficiency race

Ant A Space in Hangzhou is a grounded visual anchor for inclusionAI's model-card story: the public claim is not just a lab benchmark, but a Hangzhou platform company trying to turn open models into efficient infrastructure.[5]

As of 2026-04-22T09:03:44Z, Ant Group's inclusionAI model cards make a useful AI-China point: the next open-model contest is no longer only about who can publish the largest checkpoint or the highest single benchmark number. Ling-2.5-1T and Ring-2.5-1T push a narrower question into view: how much useful reasoning, tool use, and long-context work can a trillion-parameter model deliver for each generated token and each unit of serving memory?[1][2][3]

That is the right frame because both releases are deliberately split by workload. Ling-2.5-1T is presented as an "instant" model: 1T total parameters, 63B active parameters, a pretraining corpus expanded to 29T tokens, hybrid linear attention, and context support up to 1M tokens through YaRN extrapolation.[1] Ring-2.5-1T is presented as the thinking sibling: a hybrid-linear-attention reasoning model aimed at deep thinking and long-horizon task execution, with self-tested math results and agent-search/tool benchmarks emphasized beside the architecture claims.[2]

The important benchmark note is that these two claims should not be collapsed. Ling is being evaluated as the efficient general lane: instruction following, long context, agent compatibility, and lower token consumption. Ring is being evaluated as the deeper reasoning lane: mathematical proof, coding, tool collaboration, and extended execution.[1][2] If a routing team compares either one to a generic leaderboard row without preserving that distinction, the result will be misleading.

Image context: the cover photograph shows Ant A Space in Hangzhou in 2021. It is not a screenshot of a model card or a launch graphic. That choice is intentional. The article is about infrastructure economics around Ant's open-model family, so a real photograph of Ant's Hangzhou campus is more honest than another benchmark chart.[5]

The headline is not only 1T

The 1T label matters, but it is the least interesting part of the release. Ling-2.5-1T's model card says the trillion-scale version activates 63B parameters, up from 51B in the prior Ling 2.0 trillion-scale architecture, while also changing the attention mix to a 1:7 ratio of MLA plus Lightning Linear after incremental training.[1] That gives the model card a concrete technical claim: larger active capacity does not have to mean proportionally worse long-context serving behavior if the attention path changes.
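The arithmetic behind those figures can be made explicit. The sketch below uses only the parameter counts quoted from the model card, treats "1T" as 1000B for simplicity, and reads the 1:7 ratio as one MLA layer for every seven Lightning Linear layers. That reading, the function names, and the 80-layer depth in the usage note are illustrative assumptions, not details taken from the card.

```python
# Parameter counts as quoted from the Ling-2.5-1T model card in the text above.
TOTAL_PARAMS_B = 1000   # "1T" total parameters, approximated as 1000B
ACTIVE_NOW_B = 63       # active parameters per token in Ling-2.5-1T
ACTIVE_PREV_B = 51      # active parameters per token in the prior Ling 2.0 design


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of all parameters that participate in computing one token."""
    return active_b / total_b


def relative_increase(new: float, old: float) -> float:
    """Fractional growth from old to new (0.25 means 25% more)."""
    return new / old - 1.0


def split_layers(n_layers: int, mla_parts: int = 1, linear_parts: int = 7) -> tuple[int, int]:
    """Split a layer stack into (MLA layers, linear-attention layers).

    ASSUMPTION: the 1:7 ratio is read as MLA : Lightning Linear, and the
    stack divides evenly into groups of mla_parts + linear_parts layers.
    """
    group = mla_parts + linear_parts
    if n_layers % group != 0:
        raise ValueError("n_layers must be a multiple of the group size")
    groups = n_layers // group
    return groups * mla_parts, groups * linear_parts
```

Under these assumptions, `active_fraction(63, 1000)` is 6.3% of the weights per token, `relative_increase(63, 51)` is roughly a 23.5% rise in active capacity, and a hypothetical 80-layer stack would hold 10 MLA layers against 70 linear-attention layers. Only the MLA layers would keep a cache that grows with sequence length, which is the mechanism the card's serving claim depends on.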

The card makes the same point in benchmark language. It says Ling-2.5-1T was evaluated across knowledge, reasoning, agentic performance, instruction following, and long-context processing, and that it uses fewer tokens than frontier "thinking" models for comparable reasoning performance in selected cases.[1] The exact benchmark comparisons remain vendor-reported, so they should be treated as first-party claims. Still, the metric choice matters. Ant is asking readers to evaluate token efficiency, not only answer accuracy.

That is a good eval pivot. In production, a model that reaches a similar answer with fewer output tokens changes latency, queueing, cost, and context-management behavior. It also changes what can be routed to an "instant" lane before escalating to a slower thinking lane. The post-release question for builders is therefore not "Is Ling 2.5 smarter than every rival?" It is "Which workloads can Ling answer well enough before a thinking model becomes worth the extra token budget?"
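One way to make that routing question testable is a two-lane router that tries the instant model first and escalates only when an acceptance check fails. This is a minimal sketch, not Ant's routing logic: the `Answer` type, the `accepted` flag, and the `route` function are hypothetical names, and the acceptance check itself (a grader, a schema validator, a confidence heuristic) is left to the team.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Answer:
    text: str
    output_tokens: int
    accepted: bool  # result of whatever acceptance check the team trusts


# A "model" here is any callable that maps a prompt to an Answer.
Model = Callable[[str], Answer]


def route(prompt: str, instant: Model, thinking: Model) -> Tuple[Answer, str]:
    """Try the instant lane first; escalate only if its answer is rejected."""
    first = instant(prompt)
    if first.accepted:
        return first, "instant"
    second = thinking(prompt)
    # An escalated request pays for both attempts, so charge both token counts.
    total_tokens = first.output_tokens + second.output_tokens
    return Answer(second.text, total_tokens, second.accepted), "thinking"
```

Logging the returned lane label per workload gives the escalation rate directly, which is the number the routing question ultimately asks for.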

Ling's own limitations section supports that reading. The card says Ling-2.5-1T lays groundwork for general-purpose agents, but still lags frontier models in complex agent interactions and long-horizon tasks.[1] That caveat is not a weakness in the article's thesis. It is the routing boundary. Ling is strongest as the fast, long-context, open-weight lane; Ring is where Ant is trying to push deeper reasoning and long-horizon execution.

Ring moves the eval toward decoding economics

Ring-2.5-1T sharpens the same story from the reasoning side. Its model card calls it the first open-source trillion-parameter thinking model based on a hybrid linear attention architecture, then gives the operational claim: for sequences over 32K tokens, it reports over 10x lower memory access overhead and more than 3x higher generation throughput compared with the prior path.[2] Those are first-party architecture claims, but they are exactly the kind of claims that matter when "thinking" models stretch output length.

The evaluation section then joins math and agent work. The card reports self-tested IMO 2025 and CMO 2025 results, says Ring-2.5-1T reaches gold-medal level on both, and lists harder reasoning and execution benchmarks including IMOAnswerBench, AIME 26, HMMT 25, LiveCodeBench, ARC-AGI-V2, Gaia2-search, Tau2-bench, and SWE-Bench Verified.[2] The repository linked from the card also publishes example solution folders for IMO25 and CMO25, which is a useful transparency signal even though it is not the same as an independent benchmark audit.[4]

The production meaning is less about Olympiad prestige than about how Ant is positioning "thinking." If a reasoning model consumes large volumes of internal text, scratch work, tool calls, and summarization, then decoding throughput becomes part of the capability claim. A model that reasons well but burns too much time or memory can win a static benchmark and still lose the workload. Ring's model card is trying to argue that architecture, RL, and benchmark depth have to be evaluated together.[2]
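A simple way to fold throughput into the score is to count an answer as a hit only when it is both correct and decoded within a latency budget. The sketch below assumes a single steady decode rate and ignores prefill, queueing, and tool latency; the `Run` record and `budgeted_accuracy` function are hypothetical names, and all numbers in the usage note are made up for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Run:
    correct: bool
    output_tokens: int  # includes reasoning text, not only the final answer


def budgeted_accuracy(runs: Sequence[Run], tokens_per_second: float,
                      budget_seconds: float) -> float:
    """Fraction of runs that are correct AND finish decoding within budget.

    ASSUMPTION: decode time is output_tokens / tokens_per_second, which
    ignores prefill, batching effects, and time spent waiting on tools.
    """
    if not runs:
        raise ValueError("runs must not be empty")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    hits = sum(
        1 for r in runs
        if r.correct and r.output_tokens / tokens_per_second <= budget_seconds
    )
    return hits / len(runs)
```

For example, with runs of 1,000 and 40,000 output tokens that are both correct, a 300-second budget at 50 tokens per second accepts only the first, while the same budget at 150 tokens per second accepts both. The answers did not change; the serving path did.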

That puts Ant into a distinct lane within AI-China. DeepSeek made sparse-attention and reasoning economics a mainstream topic. Kimi has pushed long context and agent-swarm language. Qwen has turned open weights and model-platform distribution into a very broad developer surface. Ant's Ling/Ring split enters the same conversation with a fintech-platform bias: fewer wasted tokens, long context that can carry documents, and agent compatibility that can become useful in professional workflows.[1][2][3]

What to benchmark next

For a team evaluating Ling-2.5-1T or Ring-2.5-1T, the next benchmark should be workload-shaped rather than leaderboard-shaped.

First, test answer quality per generated token. The relevant comparison is not only final score; it is whether Ling can close common tasks with fewer output tokens than a thinking model, while still preserving instruction following and source-grounded behavior. This is where document review, customer-service drafting, compliance memo extraction, and internal knowledge-base Q&A become better tests than abstract chat prompts.[1]
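A minimal sketch of such a metric is correct answers per thousand output tokens, computed over a task set. The function name and the (correct, output_tokens) result format are assumptions for illustration; a real harness would also record grading method and prompt set alongside the number.

```python
from typing import Iterable, Tuple


def correct_per_kilotoken(results: Iterable[Tuple[bool, int]]) -> float:
    """Correct answers delivered per 1,000 generated tokens.

    Each result is (was_correct, output_tokens) for one task.
    """
    results = list(results)
    total_tokens = sum(tokens for _, tokens in results)
    if total_tokens <= 0:
        raise ValueError("no output tokens recorded")
    correct = sum(1 for ok, _ in results if ok)
    return 1000.0 * correct / total_tokens
```

With made-up numbers, a model that gets two of three tasks right in 1,000 total tokens scores 2.0, while a model that gets all three right in 10,000 tokens scores 0.3. Reporting this next to plain accuracy keeps both halves of the trade-off visible.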

Second, test context position and retrieval stress. Ling's 1M-token context claim is useful only if the model keeps useful accuracy across position, distraction density, and mixed-document formats. The model card cites NIAH, RULER, and MRCR-style long-context evaluations, while also acknowledging gaps against leading closed API models.[1] Teams should reproduce that boundary with their own documents: contracts, financial disclosures, support histories, medical policies, or codebases.
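A position sweep of that kind can start from a small prompt builder that plants a known fact at controlled depths inside a team's own documents. This is a generic needle-in-a-haystack sketch, not the procedure used by NIAH, RULER, or MRCR; the function names and the paragraph-level granularity are assumptions.

```python
from typing import Dict, List, Sequence


def place_needle(filler: Sequence[str], needle: str, depth: float) -> List[str]:
    """Insert needle into filler paragraphs at a relative depth in [0, 1].

    depth=0.0 puts the needle first, depth=1.0 puts it last.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    index = int(depth * len(filler))
    return list(filler[:index]) + [needle] + list(filler[index:])


def build_cases(filler: Sequence[str], needle: str,
                depths: Sequence[float]) -> Dict[float, str]:
    """Build one long-context prompt body per depth for a position sweep."""
    return {d: "\n\n".join(place_needle(filler, needle, d)) for d in depths}
```

Running the same question against each depth, then repeating with distractor facts mixed into the filler, shows whether accuracy holds across position and distraction density rather than only at the start or end of the window.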

Third, test agent handoff rather than one-step tool use. Ling says it was trained with agentic RL in high-fidelity interactive environments and is compatible with Claude Code, OpenCode, and OpenClaw; Ring says it adapts to agentic programming frameworks and personal AI assistants.[1][2] Those claims deserve tests with failure recovery, not only happy-path function calls. A useful eval should include tool errors, stale files, ambiguous instructions, and a forced checkpoint where the model has to revise its plan.
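Failure recovery can be probed with a wrapper that injects tool errors on chosen calls and a loop that records whether the agent still finishes. The sketch below is framework-agnostic and hypothetical throughout: `FlakyTool`, `run_episode`, and the agent-as-callable convention are illustration names, not APIs from Claude Code, OpenCode, OpenClaw, or the model cards.

```python
from typing import Callable, Dict, Optional, Set


class FlakyTool:
    """Wrap a tool so that selected call numbers raise an injected error."""

    def __init__(self, tool: Callable[[str], str], fail_on_calls: Set[int]):
        self.tool = tool
        self.fail_on_calls = fail_on_calls
        self.calls = 0

    def __call__(self, action: str) -> str:
        self.calls += 1
        if self.calls in self.fail_on_calls:
            raise RuntimeError(f"injected failure on call {self.calls}")
        return self.tool(action)


def run_episode(agent: Callable[[str], Optional[str]],
                tool: Callable[[str], str], max_steps: int = 5) -> Dict[str, object]:
    """Feed observations to the agent until it returns None or steps run out.

    Tool errors are surfaced to the agent as observations, so the agent
    has to notice the failure and decide how to revise its plan.
    """
    observation = "start"
    errors_seen = 0
    for step in range(1, max_steps + 1):
        action = agent(observation)
        if action is None:  # agent declares the task finished
            return {"finished": True, "steps": step - 1, "errors_seen": errors_seen}
        try:
            observation = tool(action)
        except RuntimeError as exc:
            errors_seen += 1
            observation = f"ERROR: {exc}"
    return {"finished": False, "steps": max_steps, "errors_seen": errors_seen}
```

In a real evaluation the agent callable would wrap a model call, and the interesting metric is the share of episodes that finish despite at least one injected error.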

Fourth, test serving footprint honestly. A 1T model, even with 63B active parameters and hybrid linear attention, is not a casual local-deployment object. The Ling card's SGLang example is multi-node and explicitly says the command should be adjusted to the user's environment.[1] That makes provider availability, quantization, batch size, and hardware lane part of the benchmark. The open license matters, but openness does not remove deployment physics.
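The weight-memory side of that footprint is plain arithmetic. In a sparse model all parameters still have to be resident even though only a fraction is active per token, so the sketch below sizes weights from the 1T total, not the 63B active count. The bytes-per-parameter table, the 80 GB device size, and the 0.8 usable fraction are illustrative assumptions, and the estimate excludes KV cache, activations, and runtime overhead.

```python
import math

# Storage cost per parameter for common weight formats (illustrative set).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_terabytes(total_params: float, dtype: str) -> float:
    """Memory needed for the weights alone, in decimal terabytes."""
    return total_params * BYTES_PER_PARAM[dtype] / 1e12


def min_devices(total_params: float, dtype: str, mem_gb_per_device: float,
                usable_fraction: float = 0.8) -> int:
    """Lower bound on accelerators needed just to hold the weights.

    ASSUMPTION: usable_fraction of each device is available for weights;
    the rest is reserved for cache, activations, and runtime overhead.
    """
    weight_gb = total_params * BYTES_PER_PARAM[dtype] / 1e9
    return math.ceil(weight_gb / (mem_gb_per_device * usable_fraction))
```

Under these assumptions, 1T parameters need about 2 TB in BF16, 1 TB in FP8, and 0.5 TB at 4 bits, which works out to a floor of 32 hypothetical 80 GB devices for BF16 and 16 for FP8 before any request is served.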

The AI-China signal

The larger signal is that Chinese open-model competition is becoming more specialized. The interesting move is not "another trillion-parameter model." It is the separation of a fast, long-context instant lane from a deeper thinking lane, both framed around hybrid linear attention and agent workload economics.[1][2]

For Ant, that separation fits the company's likely application surface. Payments, financial services, healthcare interfaces, merchant operations, risk review, and document-heavy workflows all reward models that can read long context without turning every request into an expensive reasoning marathon. A fast model that handles a high share of routine work, plus a thinking model that escalates hard cases, is a more plausible platform pattern than one universal model trying to serve every latency and cost tier.
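The economics of that pattern reduce to one break-even condition. If every request first pays for an instant answer and a share of requests then escalates, the two-lane setup beats always using the thinking model only while the escalation rate stays below a threshold. The sketch below is a simplified token-count model with hypothetical function names; it ignores price differences per token, latency, and input-token cost.

```python
def expected_output_tokens(instant_tokens: float, thinking_tokens: float,
                           escalation_rate: float) -> float:
    """Average output tokens per request in a two-lane setup.

    Every request pays for the instant attempt; escalated requests
    additionally pay for a thinking attempt.
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return instant_tokens + escalation_rate * thinking_tokens


def break_even_escalation_rate(instant_tokens: float, thinking_tokens: float) -> float:
    """Escalation rate below which two lanes use fewer tokens than thinking-only.

    Solves instant + r * thinking < thinking for r.
    """
    if thinking_tokens <= 0:
        raise ValueError("thinking_tokens must be positive")
    return 1.0 - instant_tokens / thinking_tokens
```

With made-up figures of 300 tokens for an instant answer and 6,000 for a thinking answer, a 10% escalation rate averages 900 tokens per request, and the two-lane design stays cheaper in tokens until escalation approaches 95%.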

The caveat is evidence maturity. Ling and Ring's strongest numbers are currently model-card claims from the model publisher. That is normal for a release cycle, but it means the next confidence step has to come from reproducible third-party evals, provider benchmarks, and user workload traces. Until then, the most defensible conclusion is narrower: Ant has put token efficiency, hybrid linear attention, and long-context serving economics at the center of its open-model story, and that is a real directional signal for AI-China in Q2 2026.[1][2][3]

cronfeed.work
