DeepReinforce Releases Ornith-1.0: An Open Source Model Family That Learns Its Own RL Scarves





DeepReinforce released Ornith-1.0an open source model family built for agent coding. The range includes four sizes, from the compact 9B model to the 397B hybrid-professional standout. All testing sites are licensed under the MIT Hugging Face license. The models were post-trained over the pre-trained Gemma 4 and Qwen 3.5.

Most coding agents pair a model with a fixed, custom-designed harness. Ornith-1.0 instead learns to write his own. The DeepReinforce research team reports state-of-the-art results among open models of the same size.

The TL;DR

  • Ornith-1.0 ships in sizes 9B, 31B, 35B-MoE, and 397B-MoE under MIT, built on Gemma 4 and Qwen 3.5.
  • The model learns its scaffolding during RL, co-optimizing the harness and solution.
  • The Ornith-1.0-397B tops the Claude Opus 4.7 in both benchmarks, but not the Opus 4.8 or the larger GLM-5.2-744B.
  • Three layers – fixed trust boundary, deterministic monitoring, frozen LLM judge – prevent reward hacking.

What is Ornith-1.0?

Ornith-1.0 is a set of reasoning models tuned by coding agents. The exceptions are 9B Dense, 31B Dense, 35B MoE, and 397B MoE. The 35B model is a mix-of-artists and activates about 3B parameters per token. FP8 and GGUF builds are also published for immediate local deployment.

Each model is a conceptual model. Answers are opened with a block before the last answer. The feed recipes enable the analyzer to think, so that the trace returns separately reasoning_content field. Models also issue well-formed tool calls for agent loops.

Shipping is straightforward. The 9B model is about 19GB in bf16 and runs on a single 80GB GPU. Providing recipes for vLLM, SGlang, and Transformers. Each model presents an OpenAI compatible endpoint. So standard agent frameworks work without any code changes.

Interactive Descriptor