NVIDIA BioNeMo Agent Toolkit Turns Biomolecular Models into Powerful Skills for AI Agents in Drug Discovery

AI scientists are becoming the new interface for scientific computing. These agents read documents, write code, generate hypotheses, call APIs, and inspect files. But science is not software engineering. No test suite turns green when a hypothesis is correct. Discovery is iterative, uncertain, and grounded in the physical world.

That gap is what NVIDIA is targeting. The company has published a walkthrough for its BioNeMo Agent Toolkit. The argument is straightforward: a general-purpose coding agent pointed at biology will not produce new drugs. In biomolecular research, the ceiling for an agent is set by the tools it can use reliably, efficiently, and effectively.

The TL;DR

The BioNeMo Agent Toolkit packages NVIDIA biomolecular models as scriptable, callable skills.
Skills cover protein folding, docking, generative chemistry, genomics, and protein design.
NVIDIA reports that task completion rises from 57.1% to 100% with skills.
Agents averaged 2x more passing assertions per 1,000 tokens.
Hosted NIM endpoints give faster access; a local NIM suits repeated iterations.

What the BioNeMo Agent Toolkit Is

The BioNeMo Agent Toolkit is an open source repository of "skills" for AI agents. Each skill turns an NVIDIA biomolecular model into a tool that the agent can drive. The toolkit covers protein folding, molecular docking, generative chemistry, genomics analysis, protein design, and biomarker discovery.

NVIDIA describes the platform in two parts. The first is an accelerated tool layer. NVIDIA NIM (NVIDIA Inference Microservices) and BioNeMo open models expose model capabilities as callable services. These are accelerated by libraries such as cuEquivariance for structural models and Parabricks for genomics. The second part is a set of agent-friendly interfaces. BioNeMo Skills packages each model so that the agent can use it.

Each skill lists the model's purpose, required inputs, optional parameters, expected artifacts, and failure modes. Model Context Protocol (MCP) server wrappers expose open models that can be packaged as NIM. Together, these let the agent discover, select, call, and interpret biomolecular models on its own.
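To make the five documented fields concrete, here is a minimal Python sketch of how an agent-side harness might hold them in memory and check a request before calling a model. The class name, field names, and the example skill are all hypothetical; the toolkit's actual format may differ.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SkillDescriptor:
    """Hypothetical in-memory view of what a skill documents."""

    name: str
    purpose: str
    required_inputs: tuple[str, ...]
    optional_parameters: dict[str, object] = field(default_factory=dict)
    expected_artifacts: tuple[str, ...] = ()
    failure_modes: tuple[str, ...] = ()

    def missing_inputs(self, request: dict[str, object]) -> list[str]:
        """Return the required inputs that the request does not supply."""
        return [key for key in self.required_inputs if key not in request]


# Illustrative values only; none of these come from the real repository.
folding = SkillDescriptor(
    name="example-folding-skill",
    purpose="Predict a 3D structure from an amino-acid sequence",
    required_inputs=("sequence",),
    optional_parameters={"num_samples": 1},
    expected_artifacts=("structure.cif",),
    failure_modes=("sequence contains non-standard residues",),
)

print(folding.missing_inputs({"sequence": "MKTVRQERLKSIVR"}))
print(folding.missing_inputs({}))
```

Checking required inputs up front is one cheap way to turn a documented failure mode into an error the agent sees before it spends tokens on a call.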

The repository includes nim-skills, open-models-skills, and library-skills. A workflows folder holds multi-step meta-skills. One example is generative_protein_binder_design, which chains RFdiffusion → ProteinMPNN → OpenFold3.
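The chaining idea behind a meta-skill can be sketched in a few lines of Python. The three step functions below are placeholders that only pass strings along; they do not call RFdiffusion, ProteinMPNN, or OpenFold3, and the function names and state keys are assumptions made for illustration.

```python
from typing import Callable

State = dict[str, str]
Step = Callable[[State], State]


def backbone_step(state: State) -> State:
    # Placeholder for the backbone-generation stage (RFdiffusion in the article).
    return {**state, "backbone": f"backbone-for-{state['target']}"}


def sequence_step(state: State) -> State:
    # Placeholder for the sequence-design stage (ProteinMPNN in the article).
    return {**state, "sequence": f"sequence-for-{state['backbone']}"}


def folding_step(state: State) -> State:
    # Placeholder for the fold-check stage (OpenFold3 in the article).
    return {**state, "structure_file": "design.cif"}


def run_workflow(steps: list[tuple[str, Step]], state: State) -> tuple[State, list[str]]:
    """Run steps in order, feeding each step's output into the next."""
    trace = []
    for name, step in steps:
        state = step(state)
        trace.append(name)
    return state, trace


PIPELINE = [
    ("RFdiffusion", backbone_step),
    ("ProteinMPNN", sequence_step),
    ("OpenFold3", folding_step),
]

result, trace = run_workflow(PIPELINE, {"target": "example-target"})
print(" -> ".join(trace))
print(sorted(result))
```

Keeping every intermediate artifact in the state makes it easier to inspect where a multi-step design went wrong.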

How a BioNeMo Skill Works

Every skill is anchored by a SKILL.md file. It contains YAML frontmatter and instructions, plus optional references and optional scripts. The agent reads these as documents, then acts on them.
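As a rough illustration of that layout, the Python sketch below splits a SKILL.md-style document into its frontmatter and its instruction body. The sample file content and its `name` and `description` keys are assumptions, and the parser handles only flat `key: value` pairs; a real YAML library would be needed for anything nested.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md-style document into (frontmatter, body).

    Only flat `key: value` lines are understood. Documents without a
    leading `---` block are returned unchanged with empty metadata.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    end = None
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            end = index
            break
    if end is None:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        if ":" in line and not line.lstrip().startswith("#"):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


# Made-up example content, not copied from the repository.
EXAMPLE_SKILL_MD = """---
name: example-folding-skill
description: Fold a protein sequence and return a CIF file
---
# Instructions
Send the sequence to the configured endpoint.
"""

meta, body = split_frontmatter(EXAMPLE_SKILL_MD)
print(meta["name"])
print(body.splitlines()[0])
```

The frontmatter is what lets an agent decide whether a skill is relevant without reading the full instructions.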

The prompt pattern stays the same across models. NVIDIA demonstrates it with OpenFold3. The same pattern applies to other biology NIMs, including Boltz-2, DiffDock, GenMol, ProteinMPNN, MSA Search, RFdiffusion, and Evo 2. You specify the skill, the input, and where the model is deployed.

# Hosted NIM endpoint
Use the OpenFold3 BioNeMo Skill to fold MKTVRQERLKSIVR
with the NVIDIA API endpoint at 

# Local NIM deployment
Use the OpenFold3 BioNeMo Skill to fold MKTVRQERLKSIVR
with the local NIM endpoint at

Installation uses the open source skills CLI:

# Browse and pick a skill interactively
npx skills add NVIDIA-BioNeMo/bionemo-agent-toolkit

# Or install one skill for a specific agent
npx skills add NVIDIA-BioNeMo/bionemo-agent-toolkit --skill boltz2-nim --agent claude-code

Deployment is a choice, not a default. Use hosted NIM endpoints for fast access without managing infrastructure. Move selected models into your own environment when you need warm, low-latency inference, data that stays in your own data center, or many repeated iterations.
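One way to encode that choice is a small resolver that prefers a local deployment when one is configured and falls back to a hosted endpoint otherwise. This is a sketch under stated assumptions: the environment variable names `LOCAL_NIM_URL` and `HOSTED_NIM_URL` are invented here, and both URLs are placeholders rather than real NVIDIA endpoints.

```python
import os
from typing import Mapping, Optional


def choose_endpoint(env: Optional[Mapping[str, str]] = None) -> tuple[str, str]:
    """Return (mode, url), preferring a configured local deployment."""
    env = os.environ if env is None else env
    local = env.get("LOCAL_NIM_URL")  # hypothetical variable name
    if local:
        return "local", local
    hosted = env.get("HOSTED_NIM_URL")  # hypothetical variable name
    if hosted:
        return "hosted", hosted
    raise RuntimeError("no endpoint configured")


# Placeholder settings passed explicitly so the example is deterministic.
settings = {
    "HOSTED_NIM_URL": "https://hosted.example.invalid",
    "LOCAL_NIM_URL": "http://localhost:8000",
}
print(choose_endpoint(settings))
print(choose_endpoint({"HOSTED_NIM_URL": "https://hosted.example.invalid"}))
```

Because the skill and the input stay the same in both cases, switching deployments only changes the URL the agent is told to use.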

Benchmark

NVIDIA measured whether the skills actually improve the agent loop. All reported metrics come from the Codex CLI running GPT-5.5 Fast. The team compared the same agent with and without each skill.

Task completion was the first metric. Without skills, the agent completed 57.1% of required tasks on average. With access to NIM skills, completion reached 100%.

Efficiency was the second metric. NVIDIA counted passing assertions, the individual checks that make up a task. With skills, the agent produced 2x more passing assertions per 1,000 tokens. That advantage held across all ten NIM skills tested.
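The efficiency metric is a simple rate, and a short Python sketch shows the arithmetic. The run counts below are invented purely to illustrate how a 2x ratio can arise; they are not NVIDIA's measurements, and the function name is an assumption.

```python
def assertions_per_1k_tokens(passed: int, tokens: int) -> float:
    """Passing assertions normalized to a budget of 1,000 tokens."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return passed * 1000 / tokens


# Illustrative numbers only, chosen so the ratio works out to 2x.
baseline = assertions_per_1k_tokens(passed=6, tokens=40_000)
with_skill = assertions_per_1k_tokens(passed=9, tokens=30_000)
ratio = with_skill / baseline

print(f"baseline:   {baseline:.2f} per 1k tokens")
print(f"with skill: {with_skill:.2f} per 1k tokens")
print(f"ratio:      {ratio:.1f}x")
```

Note that the rate can improve in two ways at once: more assertions pass, and fewer tokens are spent getting there.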

Use Cases with Examples

Protein structure prediction: The agent folds a peptide sequence with Boltz-2 or OpenFold3. It returns a CIF file for downstream analysis.
Multiple sequence alignment: The agent builds an MSA with MMseqs2 through the MSA Search skill. The artifact is an A3M file.
Generative chemistry: The agent generates candidate molecules with GenMol. The output arrives as SDF or SMILES for triage.
Protein binder design: The generative_protein_binder_design workflow chains three models. RFdiffusion builds the backbone, ProteinMPNN designs the sequence, and OpenFold3 checks that it folds.

Each loop follows the same pattern: the agent selects the model, configures the inputs, executes it, evaluates the output, and interprets the results with caveats.
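The "evaluates the output" step can start with something as basic as confirming that the returned artifact has the type the use case promises. The Python sketch below maps the three single-model use cases above to file extensions. The task keys are made up, and `.smi` as the extension for SMILES files is a common convention assumed here rather than something the article states.

```python
from pathlib import Path

# Task keys are hypothetical; artifact types follow the use cases above.
EXPECTED_ARTIFACTS = {
    "structure_prediction": {".cif"},
    "msa": {".a3m"},
    "generative_chemistry": {".sdf", ".smi"},
}


def artifact_matches(task: str, filename: str) -> bool:
    """Check that a returned file has an extension expected for the task."""
    allowed = EXPECTED_ARTIFACTS.get(task)
    if allowed is None:
        raise KeyError(f"unknown task: {task}")
    return Path(filename).suffix.lower() in allowed


print(artifact_matches("structure_prediction", "fold_result.cif"))
print(artifact_matches("msa", "alignment.txt"))
```

An extension check says nothing about scientific quality, but it catches the case where a call silently returned an error page or the wrong format.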

How It Compares: Agent With vs Without Skills

Dimension	Standard agent (no skills)	Agent + BioNeMo Skills
Task completion	57.1% average	100% average
Token efficiency	Baseline	2x passing assertions per 1k tokens
Model selection	Guesses tool, format, and inputs	Reads purpose, inputs, and artifacts
Deployment	Manual setup from source	Hosted or local NIM, documented
Failure handling	Unknown failure modes	Documented failure modes for each skill
Workflows	Single calls	Multi-step meta-skills (binder design)

Getting Started

The requirements are minimal. You need an agent runtime such as Claude Code or Codex. You need an NVIDIA API key for hosted BioNeMo NIM endpoints. A GPU node is optional and only needed for local NIM deployment.

Point the agent at the toolkit first. Let it list the available skills before it acts. Then give it one task that uses one model.

NVIDIA flags two caveats. The hosted endpoints on build.nvidia.com are for development and light testing only. They are not a production-grade service. NVIDIA also emphasizes validation: check low-confidence structures and filter generated molecules before trusting them.
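The validation advice can be turned into a simple gate that separates results to keep from results that need a second look. In the Python sketch below, the record layout, the `confidence` field, and the threshold of 70 are all assumptions for illustration; the right score and cutoff depend on the model and the task.

```python
def split_by_confidence(
    predictions: list[dict], threshold: float = 70.0
) -> tuple[list[dict], list[dict]]:
    """Partition predictions into (accepted, needs_review) by confidence."""
    accepted = [p for p in predictions if p["confidence"] >= threshold]
    needs_review = [p for p in predictions if p["confidence"] < threshold]
    return accepted, needs_review


# Made-up records standing in for model outputs.
predictions = [
    {"id": "design-1", "confidence": 88.5},
    {"id": "design-2", "confidence": 54.0},
    {"id": "design-3", "confidence": 70.0},
]

accepted, needs_review = split_by_confidence(predictions)
print([p["id"] for p in accepted])
print([p["id"] for p in needs_review])
```

Routing low-scoring results to review, rather than dropping them, keeps the decision with a scientist instead of the agent.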
