Model Gallery

19 models from 1 repository

llama3.1-8b-prm-deepseek-data
This is a process-supervised reward model (PRM) trained on DeepSeek-generated data from the project RLHFlow/RLHF-Reward-Modeling. The model is trained from meta-llama/Llama-3.1-8B-Instruct on RLHFlow/Deepseek-PRM-Data for 1 epoch, with a global batch size of 32 and a learning rate of 2e-6; samples are packed and split into chunks of 8192 tokens. See https://github.com/RLHFlow/Online-RLHF/blob/main/math/llama-3.1-prm.yaml for further training details.

Repository: localai
License: llama3.1
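
Process reward models such as this one score each intermediate step of a solution rather than only the final answer. Below is a minimal sketch of step-wise scoring with Hugging Face transformers; the checkpoint id and the convention of grading each step via the probability of a "+" token are assumptions drawn from RLHFlow's usual PRM format, so verify them against the upstream model card before relying on them.

```python
# Hedged sketch: step-wise scoring with an RLHFlow-style PRM.
# Assumptions (verify against the upstream card): the checkpoint id below,
# and the convention that each solution step is a user turn whose quality
# is read off as P("+") for the next assistant token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "RLHFlow/Llama3.1-8B-PRM-Deepseek-Data"  # assumed upstream repo

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
model.eval()

question = "What is 12 * 13?"
steps = ["12 * 13 = 12 * 10 + 12 * 3 = 120 + 36.", "So the answer is 156."]

plus_id = tokenizer.encode("+", add_special_tokens=False)[-1]
messages = []
for i, step in enumerate(steps):
    # The first turn carries the question; later turns carry one step each.
    content = f"{question} {step}" if i == 0 else step
    messages.append({"role": "user", "content": content})
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]
    step_score = torch.softmax(logits, dim=-1)[plus_id].item()
    print(f"step {i + 1}: P('+') = {step_score:.3f}")
    # Feed the grade back so the next step is scored in context.
    messages.append({"role": "assistant", "content": "+"})
```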

deepseek-r1-distill-llama-8b
DeepSeek-R1 is our advanced first-generation reasoning model designed to enhance performance in reasoning tasks. Building on the foundation laid by its predecessor, DeepSeek-R1-Zero, which was trained using large-scale reinforcement learning (RL) without supervised fine-tuning, DeepSeek-R1 addresses the challenges faced by R1-Zero, such as endless repetition, poor readability, and language mixing. By incorporating cold-start data prior to the RL phase, DeepSeek-R1 significantly improves reasoning capabilities and achieves performance levels comparable to OpenAI-o1 across a variety of domains, including mathematics, coding, and complex reasoning tasks.

Repository: localai
License: llama3.1
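
The R1 distills emit their chain of thought inside <think>...</think> tags before the final answer. The sketch below queries one of these models through LocalAI's OpenAI-compatible endpoint and splits the reasoning from the answer; the base URL, port, and installed model name are assumptions about your local setup.

```python
# Hedged sketch: query a DeepSeek-R1 distill served by LocalAI and split
# the <think> reasoning block from the final answer.
# Assumptions: LocalAI is listening on localhost:8080 with its
# OpenAI-compatible API enabled, and this gallery model is installed.
import re
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="deepseek-r1-distill-llama-8b",  # name as installed from this gallery
    messages=[{"role": "user", "content": "How many prime numbers are below 20?"}],
)
text = resp.choices[0].message.content

# R1-style models wrap their reasoning in <think>...</think>.
match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
reasoning = match.group(1).strip() if match else ""
answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

print("reasoning:", reasoning[:200])
print("answer:", answer)
```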

deepseek-coder-v2-lite-instruct
DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from DeepSeek-Coder-V2-Base with 6 trillion tokens sourced from a high-quality and multi-source corpus. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-Coder-V2-Base, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338 and extends the context length from 16K to 128K. In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks. The list of supported programming languages can be found in the paper.

Repository: localai
License: deepseek

cursorcore-ds-6.7b-i1
CursorCore is a series of open-source models designed for AI-assisted programming. It aims to support features such as automated editing and inline chat, replicating the core abilities of closed-source AI-assisted programming tools like Cursor. This is achieved by aligning the models on data generated through Programming-Instruct. Please read our paper to learn more.

Repository: localai
License: deepseek

deepseek-r1-distill-qwen-1.5b
DeepSeek-R1 is our advanced first-generation reasoning model designed to enhance performance in reasoning tasks. Building on the foundation laid by its predecessor, DeepSeek-R1-Zero, which was trained using large-scale reinforcement learning (RL) without supervised fine-tuning, DeepSeek-R1 addresses the challenges faced by R1-Zero, such as endless repetition, poor readability, and language mixing. By incorporating cold-start data prior to the RL phase, DeepSeek-R1 significantly improves reasoning capabilities and achieves performance levels comparable to OpenAI-o1 across a variety of domains, including mathematics, coding, and complex reasoning tasks.

Repository: localai

deepseek-r1-distill-qwen-7b
DeepSeek-R1 is our advanced first-generation reasoning model designed to enhance performance in reasoning tasks. Building on the foundation laid by its predecessor, DeepSeek-R1-Zero, which was trained using large-scale reinforcement learning (RL) without supervised fine-tuning, DeepSeek-R1 addresses the challenges faced by R1-Zero, such as endless repetition, poor readability, and language mixing. By incorporating cold-start data prior to the RL phase, DeepSeek-R1 significantly improves reasoning capabilities and achieves performance levels comparable to OpenAI-o1 across a variety of domains, including mathematics, coding, and complex reasoning tasks.

Repository: localai

deepseek-r1-distill-qwen-14b
DeepSeek-R1 is our advanced first-generation reasoning model designed to enhance performance in reasoning tasks. Building on the foundation laid by its predecessor, DeepSeek-R1-Zero, which was trained using large-scale reinforcement learning (RL) without supervised fine-tuning, DeepSeek-R1 addresses the challenges faced by R1-Zero, such as endless repetition, poor readability, and language mixing. By incorporating cold-start data prior to the RL phase, DeepSeek-R1 significantly improves reasoning capabilities and achieves performance levels comparable to OpenAI-o1 across a variety of domains, including mathematics, coding, and complex reasoning tasks.

Repository: localai

deepseek-r1-distill-qwen-32b
DeepSeek-R1 is our advanced first-generation reasoning model designed to enhance performance in reasoning tasks. Building on the foundation laid by its predecessor, DeepSeek-R1-Zero, which was trained using large-scale reinforcement learning (RL) without supervised fine-tuning, DeepSeek-R1 addresses the challenges faced by R1-Zero, such as endless repetition, poor readability, and language mixing. By incorporating cold-start data prior to the RL phase, DeepSeek-R1 significantly improves reasoning capabilities and achieves performance levels comparable to OpenAI-o1 across a variety of domains, including mathematics, coding, and complex reasoning tasks.

Repository: localai

deepseek-r1-distill-llama-8b
DeepSeek-R1 is our advanced first-generation reasoning model designed to enhance performance in reasoning tasks. Building on the foundation laid by its predecessor, DeepSeek-R1-Zero, which was trained using large-scale reinforcement learning (RL) without supervised fine-tuning, DeepSeek-R1 addresses the challenges faced by R1-Zero, such as endless repetition, poor readability, and language mixing. By incorporating cold-start data prior to the RL phase, DeepSeek-R1 significantly improves reasoning capabilities and achieves performance levels comparable to OpenAI-o1 across a variety of domains, including mathematics, coding, and complex reasoning tasks.

Repository: localai

deepseek-r1-distill-llama-70b
DeepSeek-R1 is our advanced first-generation reasoning model designed to enhance performance in reasoning tasks. Building on the foundation laid by its predecessor, DeepSeek-R1-Zero, which was trained using large-scale reinforcement learning (RL) without supervised fine-tuning, DeepSeek-R1 addresses the challenges faced by R1-Zero, such as endless repetition, poor readability, and language mixing. By incorporating cold-start data prior to the RL phase, DeepSeek-R1 significantly improves reasoning capabilities and achieves performance levels comparable to OpenAI-o1 across a variety of domains, including mathematics, coding, and complex reasoning tasks.

Repository: localai

deepseek-r1-qwen-2.5-32b-ablated
DeepSeek-R1-Distill-Qwen-32B with an ablation technique applied for a more helpful (and based) reasoning model. This means it will refuse fewer of your valid requests, for an uncensored UX. Use responsibly and use common sense. We do not take any responsibility for how you apply this intelligence, just as we do not for how you apply your own.

Repository: localai

fuseo1-deepseekr1-qwen2.5-coder-32b-preview-v0.1
FuseO1-Preview is our initial endeavor to enhance the System-II reasoning capabilities of large language models (LLMs) through innovative model fusion techniques. By employing our advanced SCE merging methodologies, we integrate multiple open-source o1-like LLMs into a unified model. Our goal is to incorporate the distinct knowledge and strengths from different reasoning LLMs into a single, unified model with strong System-II reasoning abilities, particularly in mathematics, coding, and science domains.

Repository: localai

fuseo1-deepseekr1-qwen2.5-instruct-32b-preview
FuseO1-Preview is our initial endeavor to enhance the System-II reasoning capabilities of large language models (LLMs) through innovative model fusion techniques. By employing our advanced SCE merging methodologies, we integrate multiple open-source o1-like LLMs into a unified model. Our goal is to incorporate the distinct knowledge and strengths from different reasoning LLMs into a single, unified model with strong System-II reasoning abilities, particularly in mathematics, coding, and science domains.

Repository: localai

fuseo1-deepseekr1-qwq-32b-preview
FuseO1-Preview is our initial endeavor to enhance the System-II reasoning capabilities of large language models (LLMs) through innovative model fusion techniques. By employing our advanced SCE merging methodologies, we integrate multiple open-source o1-like LLMs into a unified model. Our goal is to incorporate the distinct knowledge and strengths from different reasoning LLMs into a single, unified model with strong System-II reasoning abilities, particularly in mathematics, coding, and science domains.

Repository: localai

uncensoredai_uncensoredlm-deepseek-r1-distill-qwen-14b
An uncensored LLM with reasoning; what more could you want?

Repository: localai

huihui-ai_deepseek-r1-distill-llama-70b-abliterated
This is an uncensored version of deepseek-ai/DeepSeek-R1-Distill-Llama-70B created with abliteration (see remove-refusals-with-transformers to learn more about it). This is a crude, proof-of-concept implementation to remove refusals from an LLM without using TransformerLens.

Repository: localai
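
Abliteration works by estimating a "refusal direction" in the model's hidden-state space (typically the difference between mean activations on harmful and harmless prompts) and then removing that direction, either at inference time or by orthogonalizing weights against it. The toy sketch below shows only the linear-algebra core of that idea on random tensors; it is illustrative, not the huihui-ai pipeline, and the activation-collection step is omitted.

```python
# Toy sketch of the linear algebra behind "abliteration": estimate a refusal
# direction d in the residual stream, then orthogonalize a weight matrix
# against it so the layer can no longer write along d. Random tensors stand
# in for real activations and weights; this is illustrative only.
import torch

hidden = 512

# Stand-ins for mean residual-stream activations collected on "harmful"
# vs. "harmless" prompts at some layer (the real method harvests these
# from forward passes over two prompt sets).
mean_harmful = torch.randn(hidden)
mean_harmless = torch.randn(hidden)

d = mean_harmful - mean_harmless
d = d / d.norm()                     # unit-norm refusal direction

# Stand-in for a projection whose output is added to the residual stream.
W = torch.randn(hidden, hidden)

# Remove the component of the layer's output that lies along d:
# W' = (I - d d^T) W, so d^T W' = 0 for every input.
W_ablated = W - torch.outer(d, d) @ W

print(f"max |d . W_ablated| = {(d @ W_ablated).abs().max():.2e}")  # ~0
```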

internlm_oreal-deepseek-r1-distill-qwen-7b
We introduce OREAL-7B and OREAL-32B, a mathematical reasoning model series trained using Outcome REwArd-based reinforcement Learning, a novel RL framework designed for tasks where only binary outcome rewards are available. With OREAL, a 7B model achieves 94.0 pass@1 accuracy on MATH-500, matching the performance of previous 32B models. OREAL-32B further surpasses previous distillation-trained 32B models, reaching 95.0 pass@1 accuracy on MATH-500.

Repository: localai
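
The pass@1 figures quoted above follow the common definition: sample n solutions per problem, count c correct, and use the unbiased estimator pass@k = 1 - C(n-c, k)/C(n, k), which for k = 1 reduces to c/n averaged over problems. The sketch below computes that estimator; whether OREAL's reported numbers use exactly this protocol should be checked in the paper.

```python
# Hedged sketch: the common unbiased pass@k estimator (Chen et al., 2021),
# which for k=1 reduces to mean per-problem accuracy.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn from n (c correct) is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 3 problems, 4 samples each, with 4, 2, and 0 correct samples.
results = [(4, 4), (4, 2), (4, 0)]
pass1 = sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
print(f"pass@1 = {pass1:.3f}")  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```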

pku-ds-lab_fairyr1-14b-preview
FairyR1-14B-Preview is a highly efficient large language model (LLM) that matches or exceeds larger models on select tasks. Built atop the DeepSeek-R1-Distill-Qwen-14B base, this model continues to use the 'distill-and-merge' pipeline from TinyR1-32B-Preview and Fairy-32B, combining task-focused fine-tuning with model-merging techniques to deliver competitive performance at drastically reduced size and inference cost. This project was funded by NSFC, Grant 624B2005. As a member of the FairyR1 series, FairyR1-14B-Preview shares the same training data and process as FairyR1-32B. We strongly recommend using FairyR1-32B, which achieves performance in math and coding comparable to DeepSeek-R1-671B with only 5% of the parameters; for more details, please view the FairyR1-32B page.

The FairyR1 model represents a further exploration of our earlier work TinyR1, retaining the core 'Branch-Merge Distillation' approach while introducing refinements in data processing and model architecture. In this effort, we overhauled the distillation data pipeline: raw examples from datasets such as AIMO/NuminaMath-1.5 for mathematics and OpenThoughts-114k for code were first passed through multiple 'teacher' models to generate candidate answers. These candidates were then carefully selected, restructured, and refined, especially for the chain-of-thought (CoT). Subsequently, we applied multi-stage filtering, including automated correctness checks for math problems and length-based selection (2K-8K tokens for math samples, 4K-8K tokens for code samples). This yielded two focused training sets of roughly 6.6K math examples and 3.8K code examples.

On the modeling side, rather than training three separate specialists as before, we limited our scope to two domain experts (math and code), each trained independently under identical hyperparameters (e.g., learning rate and batch size) for about five epochs. We then fused these experts into a single 14B-parameter model using the AcreeFusion tool. By streamlining both the data distillation workflow and the specialist-model merging process, FairyR1 achieves task-competitive results with only a fraction of the parameters and computational cost of much larger models.

Repository: localai
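
The length-based selection step described above (keeping math samples of roughly 2K-8K tokens and code samples of 4K-8K tokens) is simple to reproduce in spirit. The sketch below is an illustrative token-count filter, not the FairyR1 authors' actual pipeline; the tokenizer choice, the exact bounds, and the "text" field name are assumptions made for the example.

```python
# Illustrative sketch of token-length filtering as described above, not the
# FairyR1 authors' pipeline. Tokenizer choice, exact bounds, and the "text"
# field name are assumptions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-14B")

LENGTH_BOUNDS = {"math": (2_000, 8_000), "code": (4_000, 8_000)}

def keep(example: dict, domain: str) -> bool:
    """Keep an example only if its token count lies inside the domain's window."""
    lo, hi = LENGTH_BOUNDS[domain]
    n_tokens = len(tokenizer.encode(example["text"]))
    return lo <= n_tokens <= hi

samples = [
    {"text": "Problem: ... Step-by-step solution: ... " * 200},
    {"text": "short sample"},
]
filtered = [s for s in samples if keep(s, "math")]
print(f"kept {len(filtered)} of {len(samples)} math samples")
```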

deepseek-ai_deepseek-r1-0528-qwen3-8b
The DeepSeek R1 model has undergone a minor version upgrade, with the current version being DeepSeek-R1-0528. In the latest update, DeepSeek R1 has significantly improved its depth of reasoning and inference capabilities by leveraging increased computational resources and introducing algorithmic optimization mechanisms during post-training. The model has demonstrated outstanding performance across various benchmark evaluations, including mathematics, programming, and general logic. Its overall performance is now approaching that of leading models, such as O3 and Gemini 2.5 Pro.

Repository: localai