Gate News, April 29 — Ant Group has open-sourced the weights of its Ling-2.6-flash model, which was previously available only via API. The model has 104 billion total parameters with 7.4 billion activated per inference, a 256K context window, and an MIT license. BF16, FP8, and INT4 precision builds are available on HuggingFace and ModelScope.
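The parameter figures above imply a very sparse MoE: only about 7% of the weights are active for any given token. A minimal arithmetic check, using only the counts reported in the announcement:

```python
# Sparse-MoE activation ratio implied by the reported parameter counts.
total_params = 104e9    # 104 B total parameters (reported)
active_params = 7.4e9   # 7.4 B activated per inference (reported)

ratio = active_params / total_params
print(f"activation ratio: {ratio:.1%}")  # roughly 7.1% of weights active per token
```

This sparsity is what lets a 104B-parameter model run with the per-token compute cost of a much smaller dense model.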
Ling-2.6-flash extends the hybrid linear attention design of Ling 2.0, replacing the original GQA with a hybrid attention architecture that interleaves MLA and Lightning Linear attention at a 1:7 ratio, combined with a highly sparse MoE. Inference efficiency significantly exceeds that of comparable models: peak generation speed reaches 340 tokens/s on 4x H20 GPUs, and prefill and decode throughput are roughly 4x those of comparable open-source models. The model also performs strongly on agent-related benchmarks, achieving or approaching SOTA levels on BFCL-V4, TAU2-bench, SWE-bench Verified (61.2%), Claw-Eval, and PinchBench. Across the full Artificial Analysis benchmark suite, total token consumption is only 15 million, and the model scored 73.85% on AIME 2026.
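A 1:7 hybrid means that for every full-attention (MLA) layer there are seven linear-attention layers, so most of the stack avoids the quadratic attention cost while a few full-attention layers preserve long-range precision. The announcement does not specify the actual layer placement in Ling-2.6-flash; the sketch below just illustrates one plausible interleaving under that assumption:

```python
# Hypothetical sketch of a 1:7 hybrid attention layout: in each block of
# eight layers, seven use linear attention and one uses full (MLA) attention.
# The real Ling-2.6-flash layer placement is not described in the announcement.

def hybrid_layout(num_layers: int, ratio: int = 7) -> list[str]:
    """Return a per-layer attention-type list with 1 MLA per `ratio` linear layers."""
    layout = []
    for i in range(num_layers):
        # Place one MLA layer at the end of each (ratio + 1)-layer block.
        layout.append("MLA" if i % (ratio + 1) == ratio else "Linear")
    return layout

layout = hybrid_layout(16)
print(layout.count("MLA"), layout.count("Linear"))
```

For a 16-layer stack this yields 2 MLA layers and 14 linear layers, matching the 1:7 ratio.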
Ant Group’s official website also lists Ling-2.6-1T (trillion-parameter flagship version) and Ling-2.6-mini (lightweight version), though as of publication, their weights remain unreleased on HuggingFace, with only the flash series available for download.