Meet Memory OS: A 6-Layer Memory Stack Built on Hermes Agent

Hermes Agent already remembers across sessions. The open-source agent from Nous Research ships with curated memory files and full-text session search. But a new community project argues that the built-in memory is too shallow for heavy workloads. Called ‘Memory OS’, it has been released under the MIT license by developer ClaudioDrews. It wires six layers of memory into Hermes, adding a vector database, structured facts, and an auto-curated knowledge wiki. The project is new but looks promising, and its design shows how an agent’s memory can be layered.

Memory OS

Memory OS is not a plugin that you simply switch on inside Hermes. It is a layered system that sits alongside Hermes Agent’s own memory. Hermes already provides workspace files and a session database. Memory OS keeps those and adds four layers on top of them. The full stack runs locally using Docker, Qdrant, Redis, and Python 3.11+. It works with any LLM provider Hermes supports, including OpenRouter, OpenAI, Anthropic, and Ollama. The README describes it as a “memory operating system,” not a single feature.

Six Layers, From Files to Vectors

Layer 1 is the workspace. It holds MEMORY.md, USER.md, and CREATIVE.md, which are injected into the system prompt on each turn.
Layer 2 is sessions. It uses state.db, a SQLite database with FTS5 full-text search over all chat history.
Layer 3 is structured facts. It stores hard facts in memory_store.db, using SQLite, HRR, FTS5, and trust scores. A feedback loop adjusts those trust scores over time, alongside entity resolution.
Layer 4 is Fabric, a heavily modified fork of the Icarus plugin. The fork adds LLM-powered session extraction on top of the upstream esaradev/icarus-plugin. It handles cross-session recall using 16 tools, including fabric_recall, fabric_write, and fabric_brief.
Layer 5 is the vector database, built on Qdrant. It uses 4096-dimensional vectors compared by cosine similarity, plus BM25 sparse search, a keyword ranking method.
Layer 6 is the LLM Wiki, an automatically curated vault of concepts, entities, and comparisons. That wiki is also ingested into Qdrant through a process called wiki-continuous-ingest.
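
To make the session layer concrete, here is a minimal sketch of FTS5 full-text search using Python's standard sqlite3 module. It assumes a Python build whose SQLite includes FTS5, which most current builds do. The table name, column names, and the search_sessions helper are hypothetical illustrations and are not taken from the real state.db schema.

```python
import sqlite3

# In-memory stand-in for a session store. The schema below is an assumption
# made for illustration; it is not the actual layout of state.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE messages USING fts5(session_id, role, content)")
conn.executemany(
    "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
    [
        ("s1", "user", "Set up Qdrant with Docker on my laptop"),
        ("s1", "assistant", "Qdrant is running on port 6333"),
        ("s2", "user", "What is the weather like today"),
    ],
)


def search_sessions(query: str, limit: int = 5) -> list[tuple[str, str]]:
    """Return (session_id, content) rows matching an FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT session_id, content FROM messages "
        "WHERE messages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


# The default tokenizer is case-insensitive, so "qdrant" matches "Qdrant".
hits = search_sessions("qdrant")
for session_id, content in hits:
    print(session_id, content)
```

Ordering by the built-in rank column sorts results by FTS5's BM25-based relevance, which is why keyword search over chat history needs no extra infrastructure.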

How the Retrieval Flow Works

The flow is the same whenever memory is read and written. On pre_llm_call, Memory OS runs what it calls memory operations. It draws from four sources at once: Fabric, Qdrant, Sessions, and Facts. Each source is gated by a relevance threshold before anything reaches the model. Per-session deduplication stops the same context from appearing twice. A social-message filter skips trivial messages, such as a bare “thank you.” On post_llm_call and on_session_end, the system extracts and captures new memories automatically. The stated goal is token efficiency, not filling the context window.
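
The read path described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the Hit type, the threshold values, the trivial-message list, and the recall function are all hypothetical, and the article does not state the real thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str   # "fabric", "qdrant", "sessions", or "facts"
    text: str
    score: float  # relevance in [0, 1], higher is better


# Hypothetical per-source gates and trivial phrases, chosen for illustration.
THRESHOLDS = {"fabric": 0.5, "qdrant": 0.6, "sessions": 0.4, "facts": 0.7}
TRIVIAL = {"thanks", "thank you", "ok", "okay", "hi", "hello"}


def is_trivial(message: str) -> bool:
    """Treat short social messages as not worth a memory lookup."""
    return message.strip().lower().rstrip("!.") in TRIVIAL


def recall(message: str, candidates: list[Hit], seen: set[str]) -> list[Hit]:
    """Gate candidates by source threshold and drop text already seen this session."""
    if is_trivial(message):
        return []
    kept = []
    for hit in sorted(candidates, key=lambda h: h.score, reverse=True):
        if hit.score < THRESHOLDS.get(hit.source, 1.0):
            continue  # below this source's gate
        key = hit.text.strip().lower()
        if key in seen:
            continue  # already injected earlier in this session
        seen.add(key)
        kept.append(hit)
    return kept


seen: set[str] = set()
candidates = [
    Hit("qdrant", "User prefers local-first tools", 0.82),
    Hit("facts", "User prefers local-first tools", 0.90),
    Hit("sessions", "Discussed Docker setup last week", 0.35),
    Hit("fabric", "Project uses Python 3.11", 0.55),
]
first = recall("Which tools should I use?", candidates, seen)
second = recall("Which tools should I use?", candidates, seen)
skipped = recall("Thank you!", candidates, set())
```

In this run, first keeps the facts and fabric hits, second is empty because everything relevant was already injected, and skipped is empty because the message is trivial. Passing the seen set in from the caller keeps the deduplication state scoped to one session.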

Fallback Cascade and Cleanup

Layer 5 retrieval uses a four-level fallback cascade. It tries hybrid search first, then dense vectors, then lexical search, then SQLite. If one method fails or returns nothing, the next one takes over. This design keeps memory running even if the vector database is struggling. Memory OS also runs a weekly decay scan to age out old entries. Semantic dedup merges near-duplicate memories when their cosine similarity exceeds 0.92. These housekeeping measures aim to keep memory from bloating over months of use.
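
Both ideas in this section are simple enough to sketch. The example below shows a generic fallback cascade and a cosine-similarity dedup pass at the 0.92 threshold. The function names and stand-in search tiers are hypothetical. Note also that this sketch drops near-duplicates and keeps the first copy, which is a simplification of the merging the project describes.

```python
import math
from typing import Callable, Sequence

Search = Callable[[str], list[str]]


def cascade(query: str, tiers: Sequence[tuple[str, Search]]) -> tuple[str, list[str]]:
    """Try each tier in order; return the first non-empty result and its tier name."""
    for name, search in tiers:
        try:
            results = search(query)
        except Exception:
            continue  # a failing tier falls through to the next one
        if results:
            return name, results
    return "none", []


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedup(memories: list[tuple[str, list[float]]], threshold: float = 0.92):
    """Keep a memory only if it is not too similar to one already kept."""
    kept: list[tuple[str, list[float]]] = []
    for text, vec in memories:
        if all(cosine(vec, kept_vec) <= threshold for _, kept_vec in kept):
            kept.append((text, vec))
    return kept


# Stand-in tiers: hybrid search is down, dense search finds nothing.
def hybrid(q: str) -> list[str]:
    raise ConnectionError("vector database unavailable")

def dense(q: str) -> list[str]:
    return []

def lexical(q: str) -> list[str]:
    return ["note about " + q]

def sqlite_fallback(q: str) -> list[str]:
    return ["row about " + q]

tier, results = cascade(
    "docker",
    [("hybrid", hybrid), ("dense", dense), ("lexical", lexical), ("sqlite", sqlite_fallback)],
)
unique = dedup([("a", [1.0, 0.0]), ("a2", [0.99, 0.05]), ("b", [0.0, 1.0])])
```

Here the lexical tier answers because the first two tiers fail or return nothing, and the dedup pass keeps "a" and "b" while dropping "a2", whose vector is nearly parallel to "a".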

Local-First, and Intentionally So

Memory OS positions itself against cloud memory services like mem0, Zep, and Letta. The core claim is that the memory infrastructure runs on your own machine. Memory data stays local, with no memory subscription required. LLM calls still go to whichever provider you choose. Hermes itself already supports eight external memory providers, including mem0 and Honcho. Memory OS is not one of those official providers. It’s a separate, community-built stack layered directly on Hermes. For teams with data residency rules, a local memory store can matter.

Strengths and Limitations

Strengths:

Clear layered design that separates files, sessions, facts, vectors, and wiki
Fully local infrastructure with no cloud memory subscription
Provider-agnostic, matching the flexibility of Hermes Agent
Token-efficient retrieval by design, with gated sources and per-session deduplication

Limitations:

Brand new, with few commits
A forked Icarus plugin that the author says is incompatible with upstream
Heavy setup: Docker, Qdrant, Redis, and an ARQ worker are all required
There are no published benchmarks for memory quality, latency, or token savings

Key Takeaways

Memory OS is a community-built, MIT-licensed stack that adds six layers of memory on top of the Hermes Agent.
It includes workspace files, FTS5 session search, trust-scored facts, the forked Icarus Fabric, Qdrant vectors, and an auto-curated LLM wiki.
Retrieval runs on pre_llm_call through gated, deduplicated recall from four sources; capture runs on post_llm_call and on_session_end.
The memory infrastructure is fully local and provider-agnostic, but LLM calls still go to your chosen provider.

The post Meet Memory OS: A 6-Layer Memory Stack Built on Top of Hermes Agent appeared first on MarkTechPost.