Model Gallery

Discover and install AI models from our curated collection

83 models available from 1 repository
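Every entry below can be installed into a local runtime and then queried over an OpenAI-compatible API. The following is a minimal sketch, assuming a LocalAI-style server on localhost:8080 with one of the gallery models already installed; the base URL, port, and model name are placeholders for your own setup, not something the gallery itself specifies.

```python
import requests

# Minimal sketch: query a locally installed gallery model through an
# OpenAI-compatible chat completions endpoint. The base URL, port, and
# model name are assumptions -- adjust them to your own setup.
BASE_URL = "http://localhost:8080/v1"
MODEL = "llama-3.2-1b-instruct:q4_k_m"  # any installed gallery model name

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": MODEL,
        "messages": [
            {"role": "user", "content": "In one sentence, what is GGUF quantization?"}
        ],
        "temperature": 0.7,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```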


qwen_qwen3.5-0.8b

Repository: localai · License: unknown

mox-small-1-i1
The model, **vanta-research/mox-small-1**, is a small-scale text-generation model optimized for conversational AI tasks. It supports chat, persona research, and chatbot applications. The quantized versions (e.g., i1-Q4_K_M, i1-Q4_K_S) are available for efficient deployment, with the i1-Q4_K_S variant offering the best balance of size, speed, and quality. The model is designed for lightweight inference and is compatible with frameworks like HuggingFace Transformers.

Repository: localai
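The card above points at ready-made i1 quantizations; a common way to use one is to fetch a single GGUF file rather than clone a whole repository. A minimal sketch with huggingface_hub, where the repo_id and filename are illustrative guesses rather than confirmed paths:

```python
from huggingface_hub import hf_hub_download

# Sketch only: pull a single quantized GGUF file instead of the whole repo.
# Both repo_id and filename are illustrative guesses based on the card's
# naming -- check the actual repository for the exact quant file names.
path = hf_hub_download(
    repo_id="vanta-research/mox-small-1",       # base model named in the card
    filename="mox-small-1.i1-Q4_K_S.gguf",      # hypothetical quant file name
)
print("Downloaded to:", path)
```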

ibm-granite_granite-4.0-h-small
Granite-4.0-H-Small is a 32B parameter long-context instruct model finetuned from Granite-4.0-H-Small-Base using a combination of permissively licensed open source instruction datasets and internally collected synthetic datasets. The model was developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction-following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

Repository: localai · License: apache-2.0

ibm-granite_granite-4.0-h-tiny
Granite-4.0-H-Tiny is a 7B parameter long-context instruct model finetuned from Granite-4.0-H-Tiny-Base using a combination of permissively licensed open source instruction datasets and internally collected synthetic datasets. The model was developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction-following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

Repository: localai · License: apache-2.0

ibm-granite_granite-4.0-h-micro
Granite-4.0-H-Micro is a 3B parameter long-context instruct model finetuned from Granite-4.0-H-Micro-Base using a combination of permissively licensed open source instruction datasets and internally collected synthetic datasets. The model was developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction-following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

Repository: localai · License: apache-2.0

ibm-granite_granite-4.0-micro
Granite-4.0-Micro is a 3B parameter long-context instruct model finetuned from Granite-4.0-Micro-Base using a combination of permissively licensed open source instruction datasets and internally collected synthetic datasets. The model was developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction-following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

Repository: localai · License: apache-2.0

chatterbox
Chatterbox is Resemble AI's first production-grade open source TTS model. Licensed under MIT, it has been benchmarked against leading closed-source systems like ElevenLabs and is consistently preferred in side-by-side evaluations.

Repository: localai · License: mit

qwen3-4b-esper3-i1
Esper 3 is a coding, architecture, and DevOps reasoning specialist built on Qwen 3, finetuned on our DevOps, architecture, and code reasoning data generated with DeepSeek R1. It brings improved general and creative reasoning to supplement problem-solving and general chat performance, and its small model sizes allow running on local desktop and mobile, plus super-fast server inference.

Repository: localai · License: apache-2.0

ds-r1-qwen3-8b-arliai-rpr-v4-small-iq-imatrix
The best RP/creative model series from ArliAI yet again, this time based on DS-R1-0528-Qwen3-8B-Fast for a smaller memory footprint. Reduced repetitions and impersonation: to add to the creativity and out-of-the-box thinking of RpR v3, a more advanced filtering method was used to remove examples where the LLM repeated similar phrases or talked for the user. Any repetition or impersonation that still occurs will be due to how the base model was trained, not the RpR dataset. Increased training sequence length: the training sequence length was increased to 16K to help awareness and memory even on longer chats.

Repository: localai · License: apache-2.0

qwen3-8b-shiningvaliant3
Shining Valiant 3 is a science, AI design, and general reasoning specialist built on Qwen 3, finetuned on our newest science reasoning data generated with DeepSeek R1 0528. AI to build AI: our high-difficulty AI reasoning data makes Shining Valiant 3 your friend for building with current AI tech and discovering new innovations and improvements. It also brings improved general and creative reasoning to supplement problem-solving and general chat performance, and its small model sizes allow running on local desktop and mobile, plus super-fast server inference.

Repository: localai · License: apache-2.0

nousresearch_hermes-4-14b
Hermes 4 14B is a frontier, hybrid-mode reasoning model based on Qwen 3 14B by Nous Research that is aligned to you. Read the Hermes 4 Technical Report for details, or chat with Hermes in Nous Chat: https://chat.nousresearch.com. Training highlights include a newly synthesized post-training corpus emphasizing verified reasoning traces, with massive improvements in math, code, STEM, logic, creativity, and format-faithful outputs, while preserving general assistant quality and broadly neutral alignment. What's new vs Hermes 3:

- Post-training corpus: dataset size massively increased from 1M samples / 1.2B tokens to ~5M samples / ~60B tokens, blended across reasoning and non-reasoning data.
- Hybrid reasoning mode: explicit … segments when the model decides to deliberate, with options to make responses faster when you want.
- Top-quality, expressive reasoning that improves math, code, STEM, logic, and even creative writing and subjective responses.
- Schema adherence & structured outputs: trained to produce valid JSON for given schemas and to repair malformed objects.
- Much easier to steer and align: extreme improvements in steerability, especially reduced refusal rates.

Repository: localai · License: apache-2.0
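Since the Hermes 4 card emphasizes schema adherence and valid-JSON outputs, a structured-output request is a natural smoke test. The sketch below assumes an OpenAI-compatible local endpoint; the URL and model name are placeholders, and whether `response_format` is honored depends on the serving backend.

```python
import json
import requests

# Sketch: request a JSON object and validate it locally. The endpoint, model
# name, and response_format support are assumptions about your local server.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "nousresearch_hermes-4-14b",
        "messages": [
            {"role": "system",
             "content": "Reply only with a JSON object containing 'title' and 'tags'."},
            {"role": "user", "content": "Describe this model card as structured data."},
        ],
        "response_format": {"type": "json_object"},  # may be ignored by some backends
    },
    timeout=120,
)
content = resp.json()["choices"][0]["message"]["content"]
data = json.loads(content)  # raises if the model did not return valid JSON
print(data)
```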

genericrpv3-4b
Part of the GRP / GenericRP series; this is V3, based on Gemma 3 4B and licensed accordingly. It's a simple merge. For the intended behaviour, see the V2 card, which is more detailed. Merge recipe: allura-org/Gemma-3-Glitter-4B (weight 0.5), huihui-ai/gemma-3-4b-it-abliterated (weight 0.25), Danielbrdz/Barcenas-4b (weight 0.25). Happy chatting.

Repository: localai · License: gemma

llama-3.3-70b-instruct
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model in 70B (text in/text out). The Llama 3.3 instruction-tuned, text-only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.

Repository: localai · License: llama3.3

latitudegames_wayfarer-large-70b-llama-3.3
We’ve heard over and over from AI Dungeon players that modern AI models are too nice, never letting them fail or die. While it may be good for a chatbot to be nice and helpful, great stories and games aren’t all rainbows and unicorns. They have conflict, tension, and even death. These create real stakes and consequences for characters and the journeys they go on. Similarly, great games need opposition. You must be able to fail, die, and may even have to start over. This makes games more fun! However, the vast majority of AI models, through alignment RLHF, have been trained away from darkness, violence, or conflict, preventing them from fulfilling this role. To give our players better options, we decided to train our own model to fix these issues. The Wayfarer model series is a set of adventure role-play models specifically trained to give players a challenging and dangerous experience. We wanted to contribute back to the open source community that we’ve benefitted so much from, so we open sourced a 12B parameter version back in January. We thought people would love it, and they were even more excited than we expected. By popular request, we decided to train a larger 70B version based on Llama 3.3.

Repository: localai · License: llama3.3

nvidia_llama-3_3-nemotron-super-49b-v1
Llama-3.3-Nemotron-Super-49B-v1 is a large language model (LLM) which is a derivative of Meta Llama-3.3-70B-Instruct (AKA the reference model). It is a reasoning model that is post-trained for reasoning, human chat preferences, and tasks such as RAG and tool calling. The model supports a context length of 128K tokens. Llama-3.3-Nemotron-Super-49B-v1 offers a great tradeoff between model accuracy and efficiency. Efficiency (throughput) directly translates to savings. Using a novel Neural Architecture Search (NAS) approach, we greatly reduce the model’s memory footprint, enabling larger workloads, as well as fitting the model on a single GPU at high workloads (H200). This NAS approach enables the selection of a desired point in the accuracy-efficiency tradeoff. The model underwent a multi-phase post-training process to enhance both its reasoning and non-reasoning capabilities. This includes a supervised fine-tuning stage for Math, Code, Reasoning, and Tool Calling, as well as multiple reinforcement learning (RL) stages using REINFORCE (RLOO) and Online Reward-aware Preference Optimization (RPO) algorithms for both chat and instruction-following. The final model checkpoint is obtained after merging the final SFT and Online RPO checkpoints. For more details on how the model was trained, please see this blog.

Repository: localai · License: llama3.3

watt-ai_watt-tool-70b
watt-tool-70B is a fine-tuned language model based on LLaMa-3.3-70B-Instruct, optimized for tool usage and multi-turn dialogue. It achieves state-of-the-art performance on the Berkeley Function-Calling Leaderboard (BFCL). The model is specifically designed to excel at complex tool usage scenarios that require multi-turn interactions, making it ideal for empowering platforms like Lupan, an AI-powered workflow building tool. By leveraging a carefully curated and optimized dataset, watt-tool-70B demonstrates superior capabilities in understanding user requests, selecting appropriate tools, and effectively utilizing them across multiple turns of conversation. Target application: AI workflow building, as in https://lupan.watt.chat/ and Coze. Key features:

- Enhanced tool usage: fine-tuned for precise and efficient tool selection and execution.
- Multi-turn dialogue: optimized for maintaining context and effectively utilizing tools across multiple turns of conversation, enabling more complex task completion.
- State-of-the-art performance: achieves top results on the BFCL, demonstrating its capabilities in function calling and tool usage.
- Based on LLaMa-3.3-70B-Instruct: inherits the strong language understanding and generation capabilities of the base model.

Repository: localai · License: llama3.3
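To exercise the tool-usage focus described in the watt-tool card, an OpenAI-style function-calling request is the usual pattern. Everything in the sketch below (the endpoint, the model name, and the `get_weather` tool) is a hypothetical placeholder, not something the card itself specifies; actual tool-call support depends on the serving backend.

```python
import json
import requests

# Sketch: OpenAI-style tool definition passed to a local chat endpoint.
# The URL, model name, and the get_weather tool are all illustrative.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                  # hypothetical tool
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "watt-ai_watt-tool-70b",
        "messages": [{"role": "user", "content": "What's the weather in Berlin?"}],
        "tools": tools,
    },
    timeout=120,
)
message = resp.json()["choices"][0]["message"]
# If the model decided to call a tool, its arguments arrive as a JSON string.
for call in message.get("tool_calls") or []:
    print(call["function"]["name"], json.loads(call["function"]["arguments"]))
```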

opencoder-8b-base
The model is a quantized version of infly/OpenCoder-8B-Base created using llama.cpp. It is part of the OpenCoder LLM family which includes 1.5B and 8B base and chat models, supporting both English and Chinese languages. The original OpenCoder model was pretrained on 2.5 trillion tokens composed of 90% raw code and 10% code-related web data, and supervised finetuned on over 4.5M high-quality SFT examples. It achieves high performance across multiple language model benchmarks and is one of the most comprehensively open-sourced models available.

Repository: localai · License: inf

opencoder-1.5b-base
The model is a large language model with 1.5 billion parameters, trained on 2.5 trillion tokens of code-related data. It supports both English and Chinese languages and is part of the OpenCoder LLM family which also includes 8B base and chat models. The model achieves high performance across multiple language model benchmarks and is one of the most comprehensively open-sourced models available.

Repository: localai · License: inf

opencoder-1.5b-instruct
The model is a quantized version of [infly/OpenCoder-1.5B-Instruct](https://huggingface.co/infly/OpenCoder-1.5B-Instruct) created using llama.cpp. The original model belongs to OpenCoder, an open and reproducible code LLM family that includes 1.5B and 8B base and chat models supporting both English and Chinese. The model is pretrained on 2.5 trillion tokens composed of 90% raw code and 10% code-related web data, and supervised finetuned on over 4.5M high-quality SFT examples. It achieves high performance across multiple language model benchmarks, positioning it among the leading open-source models for code.

Repository: localai · License: inf
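Because this is a llama.cpp quantization, one direct way to run it locally is through llama-cpp-python. A minimal sketch, where the GGUF path is a placeholder for whichever quant file you actually downloaded:

```python
from llama_cpp import Llama

# Sketch: run the quantized instruct model directly with llama-cpp-python.
# The model_path is a placeholder for the GGUF quant you downloaded.
llm = Llama(model_path="./opencoder-1.5b-instruct-q4_k_m.gguf", n_ctx=4096)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```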

llama-3.2-1b-instruct:q4_k_m
The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned, text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks, and outperform many of the available open source and closed chat models on common industry benchmarks. Model developer: Meta. Model architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

Repository: localai · License: llama3.2

llama-3.2-3b-instruct:q4_k_m
The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned, text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks, and outperform many of the available open source and closed chat models on common industry benchmarks. Model developer: Meta. Model architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

Repository: localai · License: llama3.2

Page 1 of many