Google’s Open-Source Multimodal AI Explained

On June 3, 2026, Google introduced Gemma 4 12B Unified, an open-source multimodal model designed to understand text, images, audio, and video within a single architecture. It pairs a 256K context window with an efficient, laptop-friendly design intended for agentic workflows and on-premises use. The release also raises interesting questions about Google’s broader AI …

Miso Labs Releases MisoTTS: An 8B Dynamic Model for Open-Weight Text-to-Speech

Miso Labs has released MisoTTS, an open-source 8-billion-parameter text-to-speech model. It produces expressive speech from both text and audio context. The model uses residual vector quantization (RVQ) to extend its acoustic range. This avoids scaling a single flat vocabulary while keeping the parameter count constant. What is MisoTTS? MisoTTS is an 8B-parameter text-to-dialogue RVQ …

Google DeepMind Releases Gemma 4 12B: An Encoder-Free Multimodal Model with Native Audio That Runs on a 16 GB Laptop

Google DeepMind recently released Gemma 4 12B, a dense multimodal model that drops traditional encoders entirely. Vision and audio flow straight into the core of the LLM. The result is a model that runs agentic workflows on a consumer laptop with 16 GB of RAM. It is distributed under the Apache 2.0 license. Model …

Nous Research Releases Hermes Desktop: Native Cross-Platform Front End for Hermes Agent v0.15.2 with Streaming Tool

Nous Research has released Hermes Desktop in public preview. It is a native desktop application for macOS, Windows, and Linux that gives the open-source Hermes Agent a user interface. Until now, users have run Hermes through the CLI and messaging gateways. The current build is Hermes Agent v0.15.2. According to Nous Research documentation, …

NVIDIA Releases Cosmos 3: A Base Model with Two Transformer Towers Covering Physical Simulation, World Generation, and Action Generation

The NVIDIA AI team has released Cosmos 3, a family of omnimodal world models for physical AI. The models combine physical reasoning, world generation, and action generation. All three capabilities live within one open model. NVIDIA has also open-sourced benchmarks, training documentation, deployment tools, and datasets. Cosmos 3 targets deployments of robots, autonomous …

How to Fine-Tune LFM2 Using QLoRA and DPO: A Complete Step-by-Step Coding Tutorial on Google Colab

In this tutorial, we fine-tune Liquid AI’s LFM2 model using a fully open-source workflow. We start by loading the base LFM2 model with QLoRA, prepare a chat-style supervised fine-tuning dataset, train a lightweight LoRA adapter using TRL and PEFT, and merge the adapter back into the model. We also extend …

How to use Claude Managed Agents?

If you’ve ever tried to deploy an AI agent in production, you know that the hard part is usually not the model. It’s everything around it: sandboxing, state management, authentication management, tool execution, error detection, and all the infrastructure that turns a prototype into something reliable. Anthropic’s Claude Managed Agents makes that easy by providing you with …

Alibaba’s Qwen Team Introduces Qwen3.7-Plus, Adding Vision, Deep Reasoning, Tool Use, and Autonomous Iteration on the Bailian Platform

Alibaba’s Qwen team has released Qwen3.7-Plus. The model is now available through Alibaba Cloud’s Bailian platform, the console that international users access as Model Studio. It provides API services to external developers. The release follows Alibaba’s May launch of the Qwen3.7 generation. Qwen3.7-Plus is a large multi-language model. The model understands …