The AI landscape has just been shaken by the official release of the Kimi K2 model from Moonshot AI, announced alongside its open-source availability. This groundbreaking model, built on a Mixture-of-Experts (MoE) architecture, has quickly drawn attention for its strong coding abilities and exceptional performance on general Agent tasks.
With an impressive 1 trillion total parameters, of which 32 billion are activated per token, Kimi K2 has achieved top-tier results on benchmarks such as SWE-bench Verified, Tau2-Bench, and AceBench, demonstrating its strength in code writing, Agent task execution, and mathematical reasoning.
During pre-training, Kimi K2 employed the novel MuonClip optimizer, which tackles the problem of exploding attention logits in large-scale training and markedly improves training stability and token efficiency. With it, the Moonshot AI team completed a smooth 15.5-trillion-token training run without a single loss spike, offering a fresh recipe for the stable and efficient training of trillion-parameter models.
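Concretely, the clipping idea is to rescale a head's query and key projection weights whenever the largest pre-softmax attention logit observed in a step exceeds a threshold, so the product of the two matrices, and hence future logits, stays bounded. The sketch below illustrates only that idea; the function name, the default threshold, and the symmetric square-root split are assumptions for illustration, not Moonshot AI's actual implementation.

```python
import torch


def qk_clip_(w_q: torch.Tensor, w_k: torch.Tensor,
             max_logit: float, tau: float = 100.0) -> None:
    """Rescale one head's query/key projections in place when the largest
    pre-softmax attention logit seen this step exceeds the threshold tau.

    Splitting the correction as a square root over both projections keeps
    their product bounded while changing each matrix as little as possible.
    The threshold value here is illustrative.
    """
    if max_logit <= tau:
        return
    scale = (tau / max_logit) ** 0.5
    with torch.no_grad():  # weight surgery happens outside autograd
        w_q.mul_(scale)
        w_k.mul_(scale)


# Hypothetical usage after an optimizer step, given the per-head maximum
# logit recorded during the forward pass:
# qk_clip_(attn.w_q, attn.w_k, max_logit=recorded_max_logit)
```

Because the rescaling acts on the weights themselves rather than on activations, the bound carries over into subsequent steps instead of having to be reapplied on every forward pass.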
Beyond benchmark results, Kimi K2 has also shown strong generalization and practicality in real-world use. On the coding side, it not only generates front-end code with genuine design sense and visual polish, handling demanding outputs such as particle systems, data visualizations, and 3D scenes, but can also build a complete futures trading interface on its own without detailed instructions, evidence of its autonomous programming ability.
Kimi K2 also excels at Agent tool invocation. It reliably parses complex instructions and automatically decomposes requirements into a series of well-formed, executable ToolCall structures, integrating seamlessly with various Agent and coding frameworks to complete complex tasks or automate coding. Whether analyzing how remote-work ratios affect salaries or planning and executing a concert-chasing itinerary for Coldplay fans, Kimi K2 handles such tasks with ease, demonstrating robust Agent capabilities.
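As an illustration, the snippet below drives such a ToolCall through an OpenAI-compatible client. The base URL, model id, and the `get_salary_stats` tool are all assumptions made for this sketch; only the request shape follows the standard tools API.

```python
from openai import OpenAI  # pip install openai

# Endpoint and model id are placeholders; substitute the values from the
# Kimi Open Platform documentation.
client = OpenAI(base_url="https://api.moonshot.cn/v1", api_key="YOUR_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_salary_stats",  # hypothetical tool for illustration
        "description": "Return average salary grouped by remote-work ratio.",
        "parameters": {
            "type": "object",
            "properties": {
                "remote_ratio": {
                    "type": "number",
                    "description": "Share of remote work, between 0 and 1.",
                },
            },
            "required": ["remote_ratio"],
        },
    },
}]

resp = client.chat.completions.create(
    model="kimi-k2-instruct",  # placeholder model id
    messages=[{"role": "user",
               "content": "How does a 50% remote-work ratio affect salaries?"}],
    tools=tools,
)

# The model replies with structured ToolCall objects rather than free text,
# which the surrounding Agent framework can execute and feed back.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```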
In addition, Kimi K2 has made notable strides in stylized writing. Whether rewriting scientific text in a middle-school tone or imitating Apple's advertising copy, it can precisely control output style while preserving the original meaning. In fiction-writing tasks, its generated text is richer in detail and emotion, giving users a fuller creative experience.
Moonshot AI has not only released the Kimi K2 model but also open-sourced two versions of it at the same time: Kimi-K2-Base and Kimi-K2-Instruct. Kimi-K2-Base is the raw pre-trained model without instruction fine-tuning, suited to research and customization, while Kimi-K2-Instruct is a general instruction-tuned version that performs well on most question-answering and Agent tasks. Both models, along with FP8 weight files, are freely available on the Hugging Face platform.
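For reference, loading the instruct variant with Hugging Face transformers might look like the sketch below. The repository id follows the release naming but should be verified against the model card; `trust_remote_code=True` is a common requirement for custom MoE architectures, and actually materializing a trillion-parameter checkpoint requires a multi-GPU or multi-node setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id assumed from the release naming; check the Hugging Face
# model card before use.
model_id = "moonshotai/Kimi-K2-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",  # shard across available GPUs (requires accelerate)
)
```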
To ease deployment, inference engines including vLLM, SGLang, and KTransformers added support for Kimi K2 at release. Developers can deploy the model on their own servers and get the same experience as the Kimi Open Platform API.
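A minimal offline-inference sketch using vLLM's Python API follows, assuming eight GPUs and the same Hugging Face repository id as above; a real deployment of a model this size needs a correspondingly large node.

```python
from vllm import LLM, SamplingParams

# tensor_parallel_size must match the number of GPUs actually available;
# the repository id is the same assumption as in the loading sketch above.
llm = LLM(
    model="moonshotai/Kimi-K2-Instruct",
    trust_remote_code=True,
    tensor_parallel_size=8,
)

outputs = llm.generate(
    ["Write a Python function that reverses a linked list."],
    SamplingParams(temperature=0.6, max_tokens=512),
)
print(outputs[0].outputs[0].text)
```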
On the API side, Kimi K2 is fully supported as well. Its API service is live, offering up to 128K of context along with stronger general-purpose and tool-invocation capabilities. Pricing is straightforward: 4 yuan per million input tokens and 16 yuan per million output tokens. The service also speaks both the OpenAI and Anthropic API formats, so developers can switch over seamlessly.
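Because the service is OpenAI-compatible, an existing client only needs a new base URL. In the sketch below, the base URL and model id are placeholders to be replaced with the values from the Kimi Open Platform documentation; the cost arithmetic simply applies the announced per-million-token rates to the usage the API returns.

```python
from openai import OpenAI

# Base URL and model id are placeholders for illustration.
client = OpenAI(base_url="https://api.moonshot.cn/v1", api_key="YOUR_KEY")

resp = client.chat.completions.create(
    model="kimi-k2-instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Summarize the MoE architecture."}],
)

# Announced rates: 4 yuan per million input tokens, 16 yuan per million
# output tokens.
usage = resp.usage
cost_yuan = usage.prompt_tokens / 1e6 * 4 + usage.completion_tokens / 1e6 * 16
print(f"{usage.prompt_tokens} in / {usage.completion_tokens} out "
      f"≈ ¥{cost_yuan:.4f}")
```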