Model Gallery

Discover and install AI models from our curated collection

105 models available
1 repository
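Once a model is picked from the cards below, installation is typically done against a running LocalAI instance through its HTTP API. The snippet below is a minimal sketch only: the exact route (POST /models/apply) and the "repository@model-name" id format are assumptions based on LocalAI's documented gallery API, so verify them against the project documentation.

```python
# Minimal sketch: install a gallery model into a local LocalAI instance.
# Assumes LocalAI listens on localhost:8080 and exposes the gallery
# "apply" endpoint (POST /models/apply); verify route and id format
# against the documentation before relying on this.
import requests

BASE_URL = "http://localhost:8080"

resp = requests.post(
    f"{BASE_URL}/models/apply",
    json={"id": "localai@qwen3-4b"},  # assumed "<repository>@<model name>" format
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # typically a job/status object that can be polled
```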

qwen_qwen3.5-4b
A 4B-parameter model from the Qwen3.5 series.

Repository: localai

qwen3.5-4b-claude-4.6-opus-reasoning-distilled
Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-GGUF - A GGUF quantized model optimized for local inference. Specialized for reasoning and chain-of-thought tasks. Based on Qwen 3.5 architecture with enhanced language understanding. Available in multiple quantization levels for various hardware requirements. Distilled from Claude-style reasoning models for enhanced logical reasoning capabilities.

Repository: localai | License: gpl-3.0

voxtral-mini-4b-realtime
Voxtral Mini 4B Realtime is a speech-to-text model from Mistral AI. It is a 4B parameter model optimized for fast, accurate audio transcription with low latency, making it ideal for real-time applications. The model uses the Voxtral architecture for efficient audio processing.

Repository: localai | License: apache-2.0
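To show how a realtime speech-to-text model like the Voxtral entry above is typically used once installed, here is a minimal sketch against an OpenAI-compatible /v1/audio/transcriptions endpoint (which LocalAI exposes). The file name, field names, and timeout are illustrative assumptions.

```python
# Minimal sketch: transcribe a local audio file through an
# OpenAI-compatible transcription endpoint. File name and model name
# are assumptions; adjust to your deployment.
import requests

BASE_URL = "http://localhost:8080"

with open("meeting.wav", "rb") as audio:
    resp = requests.post(
        f"{BASE_URL}/v1/audio/transcriptions",
        files={"file": ("meeting.wav", audio, "audio/wav")},
        data={"model": "voxtral-mini-4b-realtime"},
        timeout=120,
    )
resp.raise_for_status()
print(resp.json().get("text", ""))
```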

qwen3-vl-4b-instruct
Qwen3-VL-4B-Instruct is the 4B-parameter instruction-tuned model of the Qwen3-VL vision-language series.

Repository: localai | License: apache-2.0

qwen3-vl-4b-thinking
Qwen3-VL-4B-Thinking is the 4B-parameter reasoning ("thinking") variant of the Qwen3-VL vision-language series.

Repository: localai | License: apache-2.0

opengvlab_internvl3_5-14b-q8_0
We introduce InternVL3.5, a new family of open-source multimodal models that significantly advances versatility, reasoning capability, and inference efficiency along the InternVL series. A key innovation is the Cascade Reinforcement Learning (Cascade RL) framework, which enhances reasoning through a two-stage process: offline RL for stable convergence and online RL for refined alignment. This coarse-to-fine training strategy leads to substantial improvements on downstream reasoning tasks, e.g., MMMU and MathVista. To optimize efficiency, we propose a Visual Resolution Router (ViR) that dynamically adjusts the resolution of visual tokens without compromising performance. Coupled with ViR, our Decoupled Vision-Language Deployment (DvD) strategy separates the vision encoder and language model across different GPUs, effectively balancing computational load. These contributions collectively enable InternVL3.5 to achieve up to a +16.0% gain in overall reasoning performance and a 4.05× inference speedup compared to its predecessor, i.e., InternVL3. In addition, InternVL3.5 supports novel capabilities such as GUI interaction and embodied agency. Notably, our largest model, i.e., InternVL3.5-241B-A28B, attains state-of-the-art results among open-source MLLMs across general multimodal, reasoning, text, and agentic tasks, narrowing the performance gap with leading commercial models like GPT-5. All models and code are publicly released.

Repository: localai | License: apache-2.0
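Since the InternVL3.5 cards describe multimodal (image + text) models, a typical local usage pattern is an OpenAI-style chat completion with mixed content. The sketch below assumes the OpenAI-compatible /v1/chat/completions route and the image_url content format; whether this particular GGUF build accepts images through that route depends on the serving backend, so treat it as illustrative.

```python
# Minimal sketch: ask a vision-language model about a local image via an
# OpenAI-compatible chat endpoint. Passing the image as a base64 data URL
# is one common approach; backend support for it is an assumption.
import base64
import requests

BASE_URL = "http://localhost:8080"

with open("chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "model": "opengvlab_internvl3_5-14b-q8_0",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize what this chart shows."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```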

opengvlab_internvl3_5-14b
We introduce InternVL3.5, a new family of open-source multimodal models that significantly advances versatility, reasoning capability, and inference efficiency along the InternVL series. A key innovation is the Cascade Reinforcement Learning (Cascade RL) framework, which enhances reasoning through a two-stage process: offline RL for stable convergence and online RL for refined alignment. This coarse-to-fine training strategy leads to substantial improvements on downstream reasoning tasks, e.g., MMMU and MathVista. To optimize efficiency, we propose a Visual Resolution Router (ViR) that dynamically adjusts the resolution of visual tokens without compromising performance. Coupled with ViR, our Decoupled Vision-Language Deployment (DvD) strategy separates the vision encoder and language model across different GPUs, effectively balancing computational load. These contributions collectively enable InternVL3.5 to achieve up to a +16.0% gain in overall reasoning performance and a 4.05× inference speedup compared to its predecessor, i.e., InternVL3. In addition, InternVL3.5 supports novel capabilities such as GUI interaction and embodied agency. Notably, our largest model, i.e., InternVL3.5-241B-A28B, attains state-of-the-art results among open-source MLLMs across general multimodal, reasoning, text, and agentic tasks, narrowing the performance gap with leading commercial models like GPT-5. All models and code are publicly released.

Repository: localai | License: apache-2.0

opengvlab_internvl3_5-4b
We introduce InternVL3.5, a new family of open-source multimodal models that significantly advances versatility, reasoning capability, and inference efficiency along the InternVL series. A key innovation is the Cascade Reinforcement Learning (Cascade RL) framework, which enhances reasoning through a two-stage process: offline RL for stable convergence and online RL for refined alignment. This coarse-to-fine training strategy leads to substantial improvements on downstream reasoning tasks, e.g., MMMU and MathVista. To optimize efficiency, we propose a Visual Resolution Router (ViR) that dynamically adjusts the resolution of visual tokens without compromising performance. Coupled with ViR, our Decoupled Vision-Language Deployment (DvD) strategy separates the vision encoder and language model across different GPUs, effectively balancing computational load. These contributions collectively enable InternVL3.5 to achieve up to a +16.0% gain in overall reasoning performance and a 4.05× inference speedup compared to its predecessor, i.e., InternVL3. In addition, InternVL3.5 supports novel capabilities such as GUI interaction and embodied agency. Notably, our largest model, i.e., InternVL3.5-241B-A28B, attains state-of-the-art results among open-source MLLMs across general multimodal, reasoning, text, and agentic tasks, narrowing the performance gap with leading commercial models like GPT-5. All models and code are publicly released.

Repository: localai | License: apache-2.0

opengvlab_internvl3_5-4b-q8_0
We introduce InternVL3.5, a new family of open-source multimodal models that significantly advances versatility, reasoning capability, and inference efficiency along the InternVL series. A key innovation is the Cascade Reinforcement Learning (Cascade RL) framework, which enhances reasoning through a two-stage process: offline RL for stable convergence and online RL for refined alignment. This coarse-to-fine training strategy leads to substantial improvements on downstream reasoning tasks, e.g., MMMU and MathVista. To optimize efficiency, we propose a Visual Resolution Router (ViR) that dynamically adjusts the resolution of visual tokens without compromising performance. Coupled with ViR, our Decoupled Vision-Language Deployment (DvD) strategy separates the vision encoder and language model across different GPUs, effectively balancing computational load. These contributions collectively enable InternVL3.5 to achieve up to a +16.0% gain in overall reasoning performance and a 4.05× inference speedup compared to its predecessor, i.e., InternVL3. In addition, InternVL3.5 supports novel capabilities such as GUI interaction and embodied agency. Notably, our largest model, i.e., InternVL3.5-241B-A28B, attains state-of-the-art results among open-source MLLMs across general multimodal, reasoning, text, and agentic tasks, narrowing the performance gap with leading commercial models like GPT-5. All models and code are publicly released.

Repository: localai | License: apache-2.0

qwen3-14b
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support, with the following key features:

- Unique support for seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within a single model, ensuring optimal performance across various scenarios.
- Significantly enhanced reasoning capabilities, surpassing the previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
- Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogue, and instruction following, delivering a more natural, engaging, and immersive conversational experience.
- Expertise in agent capabilities, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models on complex agent-based tasks.
- Support for 100+ languages and dialects, with strong capabilities for multilingual instruction following and translation.

Qwen3-14B has the following features:

- Type: Causal Language Model
- Training Stage: Pretraining & Post-training
- Number of Parameters: 14.8B
- Number of Parameters (Non-Embedding): 13.2B
- Number of Layers: 40
- Number of Attention Heads (GQA): 40 for Q and 8 for KV
- Context Length: 32,768 tokens natively, 131,072 tokens with YaRN

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to the Qwen blog, GitHub repository, and documentation.

Repository: localai | License: apache-2.0
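The thinking/non-thinking switching described in the Qwen3 cards is usually controlled at request time. The sketch below assumes an OpenAI-compatible chat endpoint plus the /no_think soft switch documented in the Qwen3 model card; some runtimes expose an enable_thinking flag or a chat-template option instead, so treat the exact mechanism as an assumption to verify for your setup.

```python
# Minimal sketch: toggle Qwen3 between thinking and non-thinking behavior.
# The "/no_think" soft switch comes from the Qwen3 model card; some
# backends use a chat-template option instead, so verify for your setup.
import requests

BASE_URL = "http://localhost:8080"

def chat(prompt: str) -> str:
    resp = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        json={"model": "qwen3-14b",
              "messages": [{"role": "user", "content": prompt}]},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Thinking mode (default): suited to math, code, and multi-step logic.
print(chat("Prove that the sum of two odd integers is even."))

# Non-thinking mode: faster, for general-purpose dialogue.
print(chat("Give me a one-line summary of what a GGUF file is. /no_think"))
```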

qwen3-4b
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support, with the following key features:

- Unique support for seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within a single model, ensuring optimal performance across various scenarios.
- Significantly enhanced reasoning capabilities, surpassing the previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
- Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogue, and instruction following, delivering a more natural, engaging, and immersive conversational experience.
- Expertise in agent capabilities, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models on complex agent-based tasks.
- Support for 100+ languages and dialects, with strong capabilities for multilingual instruction following and translation.

Qwen3-4B has the following features:

- Type: Causal Language Model
- Training Stage: Pretraining & Post-training
- Number of Parameters: 4.0B
- Number of Parameters (Non-Embedding): 3.6B
- Number of Layers: 36
- Number of Attention Heads (GQA): 32 for Q and 8 for KV
- Context Length: 32,768 tokens natively, 131,072 tokens with YaRN

Repository: localai | License: apache-2.0

mlabonne_qwen3-14b-abliterated
Qwen3-14B-abliterated is a 14B-parameter version of Qwen3-14B with its built-in refusal behavior removed via abliteration.

Repository: localai | License: apache-2.0

mlabonne_qwen3-4b-abliterated
Qwen3-4B-abliterated is a 4B-parameter version of Qwen3-4B with its built-in refusal behavior removed via abliteration.

Repository: localai | License: apache-2.0

qwen3-8b-jailbroken
This jailbroken LLM is released strictly for academic research purposes in AI safety and model alignment studies. The author bears no responsibility for any misuse or harm resulting from the deployment of this model. Users must comply with all applicable laws and ethical guidelines when conducting research.

A jailbroken Qwen3-8B model using weight orthogonalization [1]. Implementation script: https://gist.github.com/cooperleong00/14d9304ba0a4b8dba91b60a873752d25

[1] Arditi, Andy, et al. "Refusal in language models is mediated by a single direction." arXiv preprint arXiv:2406.11717 (2024).

Repository: localai | License: apache-2.0
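For context on the technique cited above: weight orthogonalization removes a single "refusal direction" from weight matrices that write into the residual stream (Arditi et al., 2024). The snippet below is a schematic NumPy illustration of that projection step only, not the linked implementation script; how the direction is estimated (typically from contrastive prompts) is omitted.

```python
# Schematic sketch of the projection step in weight orthogonalization:
# remove the component of a weight matrix's output along a single
# "refusal direction". Estimating that direction is a separate step
# not shown here.
import numpy as np

def orthogonalize(W: np.ndarray, refusal_dir: np.ndarray) -> np.ndarray:
    """W: (d_model, d_in) matrix whose output is added to the residual stream.
    refusal_dir: (d_model,) vector; the result no longer writes along it."""
    r = refusal_dir / np.linalg.norm(refusal_dir)
    return W - np.outer(r, r @ W)

# Tiny self-check: the orthogonalized matrix has no output along r.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
r = rng.normal(size=8)
W_abl = orthogonalize(W, r)
assert np.allclose((r / np.linalg.norm(r)) @ W_abl, 0.0, atol=1e-10)
```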

fast-math-qwen3-14b
By applying SFT and GRPO on difficult math problems, we enhanced the performance of DeepSeek-R1-Distill-Qwen-14B and developed Fast-Math-R1-14B, which achieves approx. 30% faster inference on average while maintaining accuracy. In addition, we trained and open-sourced Fast-Math-Qwen3-14B, an efficiency-optimized version of Qwen3-14B, following the same approach. Compared to Qwen3-14B, this model enables approx. 65% faster inference on average, with minimal loss in performance. Technical details can be found in our GitHub repository. Note: This model likely inherits the ability to perform inference in TIR mode from the original model. However, all of our experiments were conducted in CoT mode, and its performance in TIR mode has not been evaluated.

Repository: localai | License: apache-2.0

amoral-qwen3-14b
Core Function:
- Produces analytically neutral responses to sensitive queries
- Maintains factual integrity on controversial subjects
- Avoids value-judgment phrasing patterns
- No inherent moral framing ("evil slop" reduction)
- Emotionally neutral tone enforcement
- Epistemic humility protocols (avoids "thrilling", "wonderful", etc.)

Repository: localai | License: apache-2.0

huihui-ai_qwen3-14b-abliterated
This is an uncensored version of Qwen/Qwen3-14B created with abliteration (see remove-refusals-with-transformers for more details). This is a crude, proof-of-concept implementation to remove refusals from an LLM without using TransformerLens. Ablation was performed using a new and faster method, which yields better results.

Repository: localai | License: apache-2.0

claria-14b
Claria 14b is a lightweight, mobile-compatible language model fine-tuned for psychological and psychiatric support contexts. Built on Qwen3 (14B), Claria is designed as an experimental foundation for therapeutic dialogue modeling, student simulation training, and the future of personalized mental health AI augmentation. This model does not aim to replace professional care. It exists to amplify reflective thinking, model therapeutic language flow, and support research into emotionally aware AI. Claria is the first whisper in a larger project, a proof-of-concept with roots in recursion, responsibility, and renewal.

Repository: localai | License: apache-2.0

qwen3-14b-griffon-i1
This is a fine-tuned version of the Qwen3-14B model using the high-quality OpenThoughts2-1M dataset. Fine-tuned with Unsloth's TRL-compatible framework and LoRA for efficient performance, this model is optimized for advanced reasoning tasks, especially in math, logic puzzles, code generation, and step-by-step problem solving.

Training Dataset
- Dataset: OpenThoughts2-1M
- Source: A synthetic dataset curated and expanded by the OpenThoughts team
- Volume: ~1.1M high-quality examples
- Content Type: Multi-turn reasoning, math proofs, algorithmic code generation, logical deduction, and structured conversations
- Tools Used: Curator Viewer

This dataset builds upon OpenThoughts-114k and integrates strong reasoning-centric data sources like OpenR1-Math and KodCode.

Intended Use
This model is particularly suited for:
- Chain-of-thought and step-by-step reasoning
- Code generation with logical structure
- Educational tools for math and programming
- AI agents requiring multi-turn problem-solving

Repository: localai | License: apache-2.0
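The card above describes LoRA fine-tuning on OpenThoughts2-1M, but it does not publish the adapter hyperparameters. The sketch below shows what a LoRA adapter configuration generally looks like with the peft library; every value here is a placeholder, not the configuration actually used for this fine-tune.

```python
# Illustrative LoRA adapter configuration (placeholder hyperparameters;
# the card does not state the values actually used for this fine-tune).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-14B")
lora_cfg = LoraConfig(
    r=16,                       # adapter rank (placeholder)
    lora_alpha=32,              # scaling factor (placeholder)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights are trainable
```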

qwen3-4b-esper3-i1
Esper 3 is a coding, architecture, and DevOps reasoning specialist built on Qwen 3. Finetuned on our DevOps, architecture-reasoning, and code-reasoning data generated with DeepSeek R1! Improved general and creative reasoning to supplement problem-solving and general chat performance. Small model sizes allow running on local desktop and mobile, plus super-fast server inference!

Repository: localai | License: apache-2.0

qwen3-14b-uncensored
This is a finetune of Qwen3-14B to make it uncensored. Big thanks to @Guilherme34 for creating the uncensor dataset used for this uncensored finetune. This model is based on Qwen3-14B and is governed by the Apache License 2.0.

System Prompt: To obtain the desired uncensored output, manually setting the following system prompt is mandatory (see the model details).

Repository: localai | License: apache-2.0
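Since the card above requires a specific system prompt (the prompt text is only given in the model details and is not reproduced here), this is a minimal sketch of where that prompt goes in an OpenAI-compatible chat request.

```python
# Minimal sketch: supply the mandatory system prompt for the uncensored
# finetune. The placeholder below must be replaced with the exact prompt
# from the model details; it is intentionally not reproduced here.
import requests

BASE_URL = "http://localhost:8080"
SYSTEM_PROMPT = "<system prompt from the model details>"  # placeholder

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "model": "qwen3-14b-uncensored",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Hello."},
        ],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```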

Page 1 of many