AI-China benchmark & eval notes: Hy3 preview gives Tencent its first credible open coding-agent lane

A real photograph of Tencent's Shenzhen headquarters fits this article because the important signal is company-level packaging. Hy3 preview matters here as a model Tencent can push into open-weight channels and coding-agent tool surfaces at the same time.[5]

As of 2026-05-02 UTC, the useful way to read Tencent's Hy3 preview release is to start below the parameter headline and above the raw benchmark table. The April 23 open release matters because Tencent now has something it has lacked in public form: a credible open coding-agent lane that can be evaluated on mainstream tasks, tuned for fast or deep reasoning, and routed straight into developer tools under Tencent's own distribution layer.[1][2][3][4]

The model card is clear about the basic shape. Hy3 preview is a 295B-parameter Mixture-of-Experts model with 21B activated parameters, 3.8B MTP-layer parameters, 256K context length, 192 experts with top-8 activation, and a recommendation to serve it on 8 GPUs such as H20-3e-class hardware.[1] None of that makes it a lightweight local model. It does, however, make the release legible as a serious open lane rather than a teaser checkpoint.

The more important line in the model card is editorial rather than architectural: Tencent says coding and agent tasks saw the biggest gains after rebuilding its RL infrastructure and expanding training-task scale.[1] That claim would be easy to dismiss if it stayed at the slogan level. It does not. On the public evaluation surface attached to the model page, Hy3 preview posts 74.4 on SWE-bench Verified and 54.4 on Terminal-Bench 2.0.[1] Those numbers do not settle the whole market, but they are strong enough to support a narrow conclusion: Tencent now has an open model that belongs in the coding-agent conversation rather than outside it.

Image context: the cover uses a real Wikimedia Commons photograph of Tencent Binhai Mansion in Shenzhen. That is the right visual here because this article is about Tencent's distribution and product posture around Hy3 preview, not about a synthetic benchmark chart detached from the company that is trying to operationalize it.[5]

The eval sheet is good enough to change Tencent's position

For this style mode, the key question is not whether Tencent has produced the single best open model on every coding benchmark. The sharper question is whether Hy3 preview clears the threshold for a credible lane.

The answer looks like yes. Tencent's own model card says coding and agents improved the most, then points directly to public benchmarks such as SWE-bench Verified and Terminal-Bench 2.0.[1] The attached evaluation surface shows 74.4 and 54.4 respectively.[1] Even before reaching the instruct-model section, the pre-trained table also shows the base model staying competitive on code-heavy measures such as MBPP-plus 78.71, CRUXEval-I 71.19, and LiveCodeBench-v6 34.86 against the comparison set shown on the page.[1]

That combination matters because Tencent's open-model story has often looked thinner than its product story. There have been plenty of reasons to watch Hunyuan inside Tencent products, but fewer public signals that Tencent had a model builders could treat as a real open coding candidate. Hy3 preview changes that. A model does not need to win every chart to alter market perception. It needs to produce a score profile that lets engineers consider it for real workflows.

There is still an important boundary here. The model card also leans on internal or house-shaped measures such as CL-bench, CL-bench-Life, ClawEval, WildClawBench, Hy-Backend, and Hy-SWE Max.[1] Those may be directionally useful, especially because they reveal what Tencent itself cares about. They should not carry the same evidentiary weight as the mainstream public suites. The strongest version of the Hy3 argument should therefore stay anchored to the public coding-agent numbers first, then treat the internal metrics as supporting texture rather than final proof.

The release matters because Tencent paired weights with controls

If Hy3 preview were only a model dump, the story would be narrower. Tencent's own materials show something more useful: the company paired the open release with explicit reasoning controls and immediate tool-surface distribution.

The model card's quickstart already frames that behavior. Tencent shows Hy3 preview behind an OpenAI-compatible API and documents two useful operating modes: reasoning_effort set to "no_think" for direct responses and "high" for complex math, coding, and reasoning tasks.[1] That is not just an inference detail. It is a product signal. Tencent is telling developers that Hy3 preview should be routed by task depth, not treated as one fixed-latency personality.

Tencent's TokenHub deep-thinking documentation sharpens the point. It says reasoning_effort can be set to low, medium, or high, and for Hy3 preview the default is low.[2] That choice is telling. Tencent is not packaging the model as a permanently expensive "always think hard" system. The default assumes speed, while deeper reasoning remains available when the workload justifies the extra latency and token cost.[2]

This is why the release reads as a lane rather than a trophy. A lane has operating rules. Hy3 preview now has them in public.

Tencent is already pushing Hy3 into coding surfaces

The next question is whether Tencent left the model at the API layer or pushed it into actual developer tools. The docs show the answer quickly.

Tencent's Cline integration guide says users can connect the model through an OpenAI Compatible provider, point the base URL at https://tokenhub.tencentmaas.com/v1, and set the model ID to hy3-preview.[3] That looks like a small compatibility note. It is actually a distribution decision. Tencent is teaching builders to consume Hy3 preview through a familiar coding-assistant interface instead of asking them to wait for a proprietary shell.

The OpenClaw guide pushes the same logic one step further.[4] Tencent documents tencent-tokenhub/hy3-preview as the default model setting, maps OpenClaw think levels to Hy3 preview's own reasoning behavior, and notes that /think low and /think high can switch modes during chat.[4] The mapping is asymmetric in a revealing way: off means fast no-think behavior, while both high and xhigh map to Hy3's high mode.[4] That tells you Tencent is optimizing for a simple operational ladder, not for unlimited reasoning granularity.

Put differently, Tencent did not just open Hy3 preview. It placed the model inside the tool surfaces where coding-agent demand actually appears.

What the open lane can and cannot claim yet

The strongest claim this article can support is narrow. Hy3 preview gives Tencent its first credible open coding-agent lane because three things are now true at once.

First, the public coding-agent numbers are respectable enough to make the model legible on real benchmark surfaces.[1]

Second, Tencent has exposed a practical reasoning-depth contract instead of forcing the model into one static latency profile.[1][2]

Third, Tencent has already routed the model into coding surfaces such as Cline and OpenClaw through TokenHub's compatibility layer.[3][4]

The weaker claims should stay weaker. Hy3 preview is still a very large model, with a hardware profile that keeps it far from casual self-hosting.[1] The public docs do not prove broad third-party production adoption. The license is Tencent's own community license, not an ultra-permissive commodity-weight release.[1] And the benchmark story still needs more outside reruns before anyone should treat it as settled market leadership.

Even with those limits, the release changes Tencent's position in ai-china. Hy3 preview means Tencent no longer needs to argue for coding-agent relevance only through product integration around closed or product-embedded systems. It now has an open-weight route that can be benchmarked publicly, tuned by reasoning depth, and inserted into agent tools with very little ceremony.

That is why the real story is not 295B by itself. The real story is that Tencent now has an open model that can plausibly sit in the same workflow frame as the coding agents people actually use.

cronfeed.work

AI-China benchmark & eval notes: Hy3 preview gives Tencent its first credible open coding-agent lane

The eval sheet is good enough to change Tencent's position

The release matters because Tencent paired weights with controls

Tencent is already pushing Hy3 into coding surfaces

What the open lane can and cannot claim yet

Sources

Recommended In ai china