Model Gallery

Discover and install AI models from our curated collection

27 models available

1 repository
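Models in this gallery can also be installed programmatically. The sketch below assumes a LocalAI server on `http://localhost:8080` and LocalAI's `/models/apply` endpoint, which takes a gallery ID of the form `<repository>@<model>` (here the `localai` repository). The helper name `build_install_request` is hypothetical. It only builds the request, so it runs without a server.

```python
import json
import urllib.request

# Assumption: a LocalAI server listening on localhost:8080.
LOCALAI_URL = "http://localhost:8080"


def build_install_request(model_id: str, repository: str = "localai",
                          base_url: str = LOCALAI_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to LocalAI's /models/apply endpoint.

    Gallery models are addressed as "<repository>@<model name>".
    """
    body = json.dumps({"id": f"{repository}@{model_id}"}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/models/apply",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_install_request("whisper-base")
    print(req.get_method(), req.full_url, req.data.decode())
    # To actually trigger the install against a running server:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode())
```

The request is built separately from sending it so the same helper can be reused with any HTTP client or inspected before it is sent.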

nemotron-3-nano-omni-30b-a3b-reasoning-apex

Model Overview

Description: NVIDIA Nemotron 3 Nano Omni is a multimodal large language model that unifies video, audio, image, and text understanding to support enterprise-grade Q&A, summarization, transcription, and document intelligence workflows. It extends the Nemotron Nano family with integrated video-plus-speech comprehension, graphical user interface (GUI) understanding, optical character recognition (OCR), and speech transcription, enabling end-to-end processing of rich enterprise content such as meeting recordings, media and entertainment (M&E) assets, training videos, and complex business documents. NVIDIA Nemotron 3 Nano Omni was developed by NVIDIA as part of the Nemotron model family. This model is available for commercial use. This model was improved using Qwen3-VL-30B-A3B-Instruct, Qwen3.5-122B-A10B, Qwen3.5-397B-A17B, Qwen2.5-VL-72B-Instruct, and gpt-oss-120b. For more information, see the Training Dataset section of the full model card.

License/Terms of Use: Governing Terms: Use of this model is governed by the NVIDIA Open Model Agreement.

Deployment Geography: Global ...

Repository: localai | License: other

voxtral-mini-4b-realtime

Voxtral Mini 4B Realtime is a speech-to-text model from Mistral AI. It is a 4B-parameter model optimized for fast, accurate, low-latency audio transcription, which makes it well suited to real-time applications. It uses the Voxtral architecture for efficient audio processing.

Repository: localai | License: apache-2.0

moonshine-tiny

Moonshine Tiny is a lightweight speech-to-text model optimized for fast transcription. It is designed for efficient on-device ASR with high accuracy relative to its size.

Repository: localai | License: apache-2.0

whisperx-tiny

WhisperX Tiny is a fast and accurate speech recognition model with speaker diarization. It builds on OpenAI's Whisper and adds timestamp alignment and speaker segmentation.

Repository: localai | License: mit
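Speaker diarization produces timestamped speaker turns separately from the transcript. One common way to merge the two, and roughly the idea behind WhisperX's speaker assignment, is to give each transcript segment the speaker whose turns overlap it the longest. The sketch below is a simplified, segment-level version using hypothetical `Segment` and `Turn` types; WhisperX's own code also works at the word level and differs in detail.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    start: float  # seconds
    end: float
    text: str
    speaker: Optional[str] = None


@dataclass
class Turn:
    start: float  # seconds
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Segment]:
    """Label each segment with the speaker whose turns overlap it the most."""
    for seg in segments:
        totals: Dict[str, float] = {}
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + ov
        # Segments with no overlapping turn stay unlabelled.
        seg.speaker = max(totals, key=totals.get) if totals else None
    return segments


if __name__ == "__main__":
    segs = [Segment(0.0, 2.0, "Hello there."), Segment(2.0, 4.5, "Hi, how are you?")]
    turns = [Turn(0.0, 2.2, "SPEAKER_00"), Turn(2.2, 4.5, "SPEAKER_01")]
    for s in assign_speakers(segs, turns):
        print(s.speaker, s.text)
```

Summing overlap per speaker (rather than taking the single largest turn) keeps the choice stable when one speaker's turn is split into several short pieces.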

streaming-zipformer-en-sherpa

Streaming English ASR: sherpa-onnx zipformer transducer (int8, chunk-16 left-128). Low-latency real-time transcription with endpoint detection via sherpa-onnx's online recognizer. English-only; for multilingual offline ASR see omnilingual-0.3b-ctc-q8-sherpa.

Repository: localai | License: apache-2.0
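Endpoint detection decides when an utterance has finished, so the online recognizer can emit a final result and start a new one. sherpa-onnx describes this with three rules based on trailing silence and utterance length. The pure-Python sketch below models those rules, assuming the thresholds (2.4 s, 1.2 s, 20 s) match sherpa-onnx's defaults; `EndpointRule` and `is_endpoint` are hypothetical names, not the library's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EndpointRule:
    must_contain_nonsilence: bool  # fire only after some text has been decoded
    min_trailing_silence: float    # seconds of trailing silence required (0 = any)
    min_utterance_length: float    # seconds of total audio required (0 = any)


# Assumed to mirror sherpa-onnx's default endpoint rules.
DEFAULT_RULES: List[EndpointRule] = [
    EndpointRule(False, 2.4, 0.0),   # long silence, even if nothing was said
    EndpointRule(True, 1.2, 0.0),    # shorter silence after some speech
    EndpointRule(False, 0.0, 20.0),  # force a cut on very long utterances
]


def is_endpoint(utterance_len: float, trailing_silence: float, has_text: bool,
                rules: List[EndpointRule] = DEFAULT_RULES) -> bool:
    """Return True if any rule fires for the current utterance state."""
    for rule in rules:
        if rule.must_contain_nonsilence and not has_text:
            continue
        if (trailing_silence >= rule.min_trailing_silence
                and utterance_len >= rule.min_utterance_length):
            return True
    return False


if __name__ == "__main__":
    print(is_endpoint(utterance_len=5.0, trailing_silence=1.5, has_text=True))
```

The shorter silence threshold after decoded speech is what keeps latency low: a pause at the end of a sentence closes the utterance quickly, while leading silence alone has to last longer.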

liquidai.lfm2-2.6b-transcript

This is a 2.6B-parameter language model for text generation. It is a quantized version of the original model `LiquidAI/LFM2-2.6B-Transcript`, optimized for efficient deployment while retaining strong performance, and suited to use cases such as transcription and language modeling. It is trained on large-scale text data and supports multiple languages.

Repository: localai

gemma-4-e2b-it:sglang-mtp

Google Gemma 4 E2B-IT served by SGLang with Multi-Token Prediction (MTP) speculative decoding. The companion drafter google/gemma-4-E2B-it-assistant lets the target model accept several tokens per decoding step. The launch flags reproduce the SGLang cookbook's MTP command exactly (NEXTN algorithm, num_steps=5, num_draft_tokens=6, eagle_topk=1, mem_fraction_static=0.85). The E2B variant has 5B total / 2B effective parameters and targets the smaller end of consumer GPUs.

Repository: localai | License: gemma
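Speculative decoding pairs a cheap drafter with the target model: the drafter proposes a short run of tokens, the target checks them, and every token it agrees with is accepted in the same step. Under greedy decoding the output is identical to running the target alone. The sketch below shows a single chain of drafts (the shape `eagle_topk=1` suggests) using toy next-token functions; `speculative_step` and its draft length `k` are hypothetical, and a real server such as SGLang scores all draft positions in one batched forward pass rather than one call per position.

```python
from typing import Callable, List, Sequence

# A toy "model": maps a token prefix to its greedy (argmax) next token.
NextToken = Callable[[Sequence[int]], int]


def draft(drafter: NextToken, prefix: List[int], k: int) -> List[int]:
    """Let the cheap drafter propose k tokens autoregressively."""
    seq = list(prefix)
    for _ in range(k):
        seq.append(drafter(seq))
    return seq[len(prefix):]


def speculative_step(target: NextToken, drafter: NextToken,
                     prefix: List[int], k: int) -> List[int]:
    """One greedy draft-and-verify step; returns the tokens to append (1 to k+1)."""
    accepted: List[int] = []
    for tok in draft(drafter, prefix, k):
        # In a real server all positions are scored in one target forward pass.
        expected = target(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # replace the first wrong draft, then stop
            return accepted
        accepted.append(tok)
    # Every draft matched: the same target pass also yields one extra token.
    accepted.append(target(prefix + accepted))
    return accepted


if __name__ == "__main__":
    target = lambda seq: (seq[-1] + 1) % 10
    # A drafter that agrees with the target except after token 3.
    drafter = lambda seq: 0 if seq[-1] == 3 else (seq[-1] + 1) % 10
    print(speculative_step(target, drafter, [0], k=5))  # [1, 2, 3, 4]
```

In the usage example the drafter is right three times and wrong once, so one step still produces four tokens; that per-step gain is where the speedup comes from.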

gemma-4-e4b-it:sglang-mtp

Google Gemma 4 E4B-IT served by SGLang with Multi-Token Prediction (MTP) speculative decoding. The companion drafter google/gemma-4-E4B-it-assistant lets the target model accept several tokens per decoding step. The launch flags reproduce the SGLang cookbook's MTP command exactly (NEXTN algorithm, num_steps=5, num_draft_tokens=6, eagle_topk=1, mem_fraction_static=0.85). The E4B variant has 8B total / 4B effective parameters, making it the natural pick for consumer GPUs in the 16–24 GB range.

Repository: localai | License: gemma
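The Gemma entries list the MTP settings but not the command line itself. A launch command along these lines can be assembled as below. It assumes SGLang's `sglang.launch_server` entry point and its `--speculative-*` and `--mem-fraction-static` options; the target repo name (for example `google/gemma-4-E4B-it`) is inferred from the drafter's name rather than taken from the cookbook, and `sglang_mtp_command` is a hypothetical helper. Treat it as a sketch, not the cookbook's verbatim command.

```python
import shlex
from typing import List


def sglang_mtp_command(variant: str = "E4B") -> List[str]:
    """Assemble a launch_server argv using the MTP settings from the gallery entry."""
    # Assumption: target repo name inferred from the drafter's naming pattern.
    target = f"google/gemma-4-{variant}-it"
    drafter = f"google/gemma-4-{variant}-it-assistant"  # named in the entry
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", target,
        "--speculative-algorithm", "NEXTN",
        "--speculative-draft-model-path", drafter,
        "--speculative-num-steps", "5",
        "--speculative-num-draft-tokens", "6",
        "--speculative-eagle-topk", "1",
        # Fraction of GPU memory reserved for weights plus the KV cache pool.
        "--mem-fraction-static", "0.85",
    ]


if __name__ == "__main__":
    print(shlex.join(sglang_mtp_command("E4B")))
```

Returning an argv list rather than a single string avoids shell-quoting issues when the command is passed to `subprocess.run`; `shlex.join` is used only to print it.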

whisper-base-q5_1

Port of OpenAI's Whisper model in C/C++

Repository: localai | License: mit

whisper-base

Port of OpenAI's Whisper model in C/C++

Repository: localai | License: mit

whisper-base-en-q5_1

Port of OpenAI's Whisper model in C/C++

Repository: localai | License: mit

whisper-base-en

Port of OpenAI's Whisper model in C/C++

Repository: localai | License: mit

whisper-large-q5_0

Port of OpenAI's Whisper model in C/C++

Repository: localai | License: mit

whisper-medium

Port of OpenAI's Whisper model in C/C++

Repository: localai | License: mit

whisper-medium-q5_0

Port of OpenAI's Whisper model in C/C++

Repository: localai | License: mit

whisper-small-q5_1

Port of OpenAI's Whisper model in C/C++

Repository: localai | License: mit

whisper-small

Port of OpenAI's Whisper model in C/C++

Repository: localai | License: mit

whisper-small-en-q5_1

Port of OpenAI's Whisper model in C/C++

Repository: localai | License: mit

whisper-small-en

Port of OpenAI's Whisper model in C/C++

Repository: localai | License: mit

whisper-tiny-en-q5_1

Port of OpenAI's Whisper model in C/C++

Repository: localai | License: mit
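The `-q5_0` and `-q5_1` suffixes on several of the whisper.cpp entries above appear to name ggml's 5-bit block quantization formats: weights are split into blocks of 32, each stored as 5-bit codes (0 to 31) plus a per-block scale, and q5_1 additionally stores a per-block minimum. The sketch below shows the two ideas (scale only versus scale plus offset) in plain Python. The function names are hypothetical, and ggml's exact rounding rule and bit packing differ.

```python
from typing import List, Tuple

BLOCK_SIZE = 32  # weights per block, as in ggml's q5 formats
MAX_CODE = 31    # 5-bit codes range over 0..31


def quantize_scale_only(block: List[float]) -> Tuple[float, List[int]]:
    """q5_0-like: one scale per block; codes are centred on 16."""
    amax = max(abs(x) for x in block)
    d = amax / 15 if amax > 0 else 1.0  # maps [-amax, amax] onto [-15, 15]
    codes = [min(MAX_CODE, max(0, round(x / d) + 16)) for x in block]
    return d, codes


def dequantize_scale_only(d: float, codes: List[int]) -> List[float]:
    return [(q - 16) * d for q in codes]


def quantize_scale_offset(block: List[float]) -> Tuple[float, float, List[int]]:
    """q5_1-like: a scale plus the block minimum, using all 32 levels."""
    lo, hi = min(block), max(block)
    d = (hi - lo) / MAX_CODE if hi > lo else 1.0
    codes = [min(MAX_CODE, max(0, round((x - lo) / d))) for x in block]
    return d, lo, codes


def dequantize_scale_offset(d: float, lo: float, codes: List[int]) -> List[float]:
    return [q * d + lo for q in codes]


def quantize_tensor(weights: List[float]) -> List[Tuple[float, float, List[int]]]:
    """Split a flat weight list into 32-weight blocks and quantize each (q5_1-like)."""
    return [quantize_scale_offset(weights[i:i + BLOCK_SIZE])
            for i in range(0, len(weights), BLOCK_SIZE)]
```

In both variants the reconstruction error per weight is at most half the block's scale. Assuming 16-bit scales, a 32-weight block needs 32 × 5 = 160 bits (20 bytes) of codes plus 2 bytes of scale (q5_0-like) or 4 bytes of scale and minimum (q5_1-like), compared with 64 bytes at 16-bit precision.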
