Today, we’re excited to announce three major enhancements to model fine-tuning in Azure AI Foundry: Reinforcement Fine-Tuning (RFT) with o4-mini (coming soon), and Supervised Fine-Tuning (SFT) for the GPT-4.1-nano and Llama 4 Scout models (available now). These updates reflect our continued commitment to empowering organizations with tools to build highly customized, domain-adapted AI systems for real-world impact.
With these new models, we’re unlocking three major avenues of LLM customization: GPT-4.1-nano is a powerful small model ideal for distillation, o4-mini is the first reasoning model you can fine-tune, and Llama 4 Scout is a best-in-class open-source model.
Reinforcement Fine-Tuning with o4-mini
Reinforcement Fine-Tuning introduces a new level of control for aligning model behavior with complex business logic. By rewarding accurate reasoning and penalizing undesirable outputs, RFT improves model decision-making in dynamic or high-stakes environments.
Coming soon for the o4-mini model, RFT unlocks new possibilities for use cases requiring adaptive reasoning, contextual awareness, and domain-specific logic—all while maintaining fast inference performance.
Real world impact: DraftWise
DraftWise, a legal tech startup, used reinforcement fine-tuning (RFT) in Azure AI Foundry Models to enhance the performance of reasoning models tailored for contract generation and review. Faced with the challenge of delivering highly contextual, legally sound suggestions to lawyers, DraftWise fine-tuned Azure OpenAI models using proprietary legal data to improve response accuracy and adapt to nuanced user prompts. This led to a 30% improvement in search result quality, enabling lawyers to draft contracts faster and focus on high-value advisory work.
Reinforcement fine-tuning on reasoning models is a potential game changer for us. It’s helping our models understand the nuance of legal language and respond more intelligently to complex drafting instructions, which promises to make our product significantly more useful to lawyers in real time.
—James Ding, founder and CEO of DraftWise.
When should you use Reinforcement Fine-Tuning?
Reinforcement Fine-Tuning is best suited for use cases where adaptability, iterative learning, and domain-specific behavior are essential. You should consider RFT if your scenario involves:
- Custom Rule Implementation: RFT thrives in environments where decision logic is highly specific to your organization and cannot be easily captured through static prompts or traditional training data. It enables models to learn flexible, evolving rules that reflect real-world complexity.
- Domain-Specific Operational Standards: Ideal for scenarios where internal procedures diverge from industry norms—and where success depends on adhering to those bespoke standards. RFT can effectively encode procedural variations, such as extended timelines or modified compliance thresholds, into the model’s behavior.
- High Decision-Making Complexity: RFT excels in domains with layered logic and variable-rich decision trees. When outcomes depend on navigating numerous subcases or dynamically weighing multiple inputs, RFT helps models generalize across complexity and deliver more consistent, accurate decisions.
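To make the custom-rule idea concrete, here is a minimal sketch of what a reward (grader) function for RFT could look like when the model must emit structured output that matches organization-specific fields. The function name, fields, and partial-credit scheme are illustrative assumptions, not the Azure AI Foundry grader interface.

```python
import json

def grade_compliance_answer(model_output: str, expected: dict) -> float:
    """Toy grader: award partial credit for each expected field the
    model's JSON answer gets right. Illustrative only; this is not
    the Azure AI Foundry grader interface."""
    try:
        answer = json.loads(model_output)
    except json.JSONDecodeError:
        return 0.0  # unparseable output earns no reward
    if not isinstance(answer, dict) or not expected:
        return 0.0
    correct = sum(1 for key, value in expected.items() if answer.get(key) == value)
    return correct / len(expected)
```

Because the reward is graded rather than all-or-nothing, the model is nudged toward outputs that satisfy more of your rules on each training iteration.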
Example: Wealth advisory at Contoso Wellness
To showcase the potential of RFT, consider Contoso Wellness, a fictitious wealth advisory firm. Using RFT, the o4-mini model learned to adapt to unique business rules, such as identifying optimal client interactions based on nuanced patterns like the ratio of a client’s net worth to available funds. This enabled Contoso to streamline their onboarding processes and make more informed decisions faster.
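A rule like the net-worth-to-available-funds ratio above could be expressed as a simple reward signal. The threshold, action names, and the rule itself are invented for illustration, in keeping with the fictitious Contoso scenario.

```python
def interaction_reward(predicted_action: str, net_worth: float, available_funds: float) -> float:
    """Reward a model's recommended client interaction against a made-up
    Contoso rule: clients holding at least half their net worth as
    available funds get a consultation; everyone else gets a newsletter.
    The rule, threshold, and action names are all illustrative."""
    if net_worth <= 0:
        return 0.0  # guard against degenerate inputs
    target = "schedule_consultation" if available_funds / net_worth >= 0.5 else "send_newsletter"
    return 1.0 if predicted_action == target else 0.0
```

During RFT, rewards like this one are what the training loop optimizes, so the model internalizes the firm's bespoke logic rather than a generic heuristic.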
Supervised Fine-Tuning now available for GPT-4.1-nano
We’re also bringing Supervised Fine-Tuning (SFT) to the GPT-4.1-nano model—a small but powerful foundation model optimized for high-throughput, cost-sensitive workloads. With SFT, you can instill your model with company-specific tone, terminology, workflows, and structured outputs—all tailored to your domain. This model will be available for fine-tuning in the coming days.
Why fine-tune GPT-4.1-nano?
- Precision at Scale: Tailor the model’s responses while maintaining speed and efficiency.
- Enterprise-Grade Output: Ensure alignment with business processes and tone-of-voice.
- Lightweight and Deployable: Perfect for scenarios where latency and cost matter—such as customer service bots, on-device processing, or high-volume document parsing.
Compared to larger models, 4.1-nano delivers faster inference and lower compute costs, making it well suited for large-scale workloads like:
- Customer support automation, where models must handle thousands of tickets per hour with consistent tone and accuracy.
- Internal knowledge assistants that follow company style and protocol in summarizing documentation or responding to FAQs.
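For workloads like these, SFT training data takes the form of chat conversations the model should imitate, one JSON object per line. The sketch below shows what a tone-and-terminology dataset might look like; the example content and file name are illustrative.

```python
import json

# Two illustrative training examples in the chat fine-tuning format:
# each JSONL line holds a full conversation the model should imitate.
examples = [
    {"messages": [
        {"role": "system", "content": "You are Contoso's support assistant. Be friendly and concise."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Open Settings > Security and select 'Reset password'. You'll get a confirmation email within a minute."},
    ]},
    {"messages": [
        {"role": "system", "content": "You are Contoso's support assistant. Be friendly and concise."},
        {"role": "user", "content": "Where can I download my invoice?"},
        {"role": "assistant", "content": "Invoices live under Billing > History. Click the PDF icon next to any month to download it."},
    ]},
]

# Write one JSON object per line, as the fine-tuning service expects.
with open("training_data.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```

A few hundred consistent examples in this shape are often enough for a small model to pick up your tone and house terminology.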
As a small, fast, but highly capable model, GPT-4.1-nano makes a great candidate for distillation as well. You can use models like GPT-4.1 or o4 to generate training data—or capture production traffic with stored completions—and teach 4.1-nano to be just as smart!
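The distillation loop described above boils down to a simple conversion step: take (prompt, teacher completion) pairs, whether generated by a larger model or harvested from stored completions, and turn each into a chat-format training example. The function name, system prompt, and sample pairs below are illustrative placeholders.

```python
import json

def to_sft_example(prompt: str, teacher_response: str,
                   system_prompt: str = "You are a concise enterprise assistant.") -> str:
    """Turn one captured (prompt, teacher completion) pair into a JSONL
    line in the chat fine-tuning format. The system prompt is a
    placeholder you would replace with your own."""
    return json.dumps({"messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": teacher_response},
    ]})

# Stand-ins for traffic captured via stored completions or generated
# by a larger teacher model:
pairs = [
    ("Summarize ticket #123", "Customer reports login failures after the 2.4 update."),
    ("Draft a status update for the outage", "Service was restored at 14:05 UTC; root cause analysis is underway."),
]
jsonl_lines = [to_sft_example(p, r) for p, r in pairs]
```

The resulting lines can be written to a JSONL file and submitted as the training data for a 4.1-nano fine-tuning job.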

Llama 4 Fine-Tuning now available
We’re also excited to announce support for fine-tuning Meta’s Llama 4 Scout: a cutting-edge, 17-billion-active-parameter model that offers an industry-leading context window of 10M tokens while fitting on a single H100 GPU for inference. It’s a best-in-class model, more powerful than all previous-generation Llama models.
Llama 4 fine-tuning is available in our managed compute offering, allowing you to fine-tune and run inference using your own GPU quota. Available both in Azure AI Foundry and as Azure Machine Learning components, it gives you access to additional hyperparameters for deeper customization compared to our serverless experience.
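As a rough illustration of the kind of knobs a managed-compute fine-tuning run exposes, a hyperparameter override might look like the following. These names mirror common fine-tuning parameters (epochs, learning rate, LoRA rank) and are assumptions, not the exact Azure Machine Learning component schema.

```python
# Illustrative hyperparameter overrides for a managed-compute
# fine-tuning run; names and values are assumptions for illustration.
hyperparameters = {
    "learning_rate": 2e-5,              # lower values train more conservatively
    "num_train_epochs": 3,              # passes over the training data
    "per_device_train_batch_size": 8,   # examples per GPU per step
    "lora_r": 16,                       # LoRA adapter rank
    "lora_alpha": 32,                   # LoRA scaling factor
}
```

Serverless fine-tuning typically exposes only a subset of these, which is what makes the managed-compute path attractive for deeper customization.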
Get started with Azure AI Foundry today
Azure AI Foundry is your foundation for enterprise-grade AI tuning. These fine-tuning enhancements unlock new frontiers in model customization, helping you build intelligent systems that think and respond in ways that reflect your business DNA.
- Use Reinforcement Fine-tuning with o4-mini to build reasoning engines that learn from experience and evolve over time. Coming soon in Azure AI Foundry, with regional availability for East US2 and Sweden Central.
- Use Supervised Fine-Tuning with 4.1-nano to scale reliable, cost-efficient, and highly customized model behaviors across your organization. Available now in Azure AI Foundry in North Central US and Sweden Central.
- Try Llama 4 Scout fine-tuning to customize a best-in-class open-source model. Available now in the Azure AI Foundry model catalog and Azure Machine Learning.
With Azure AI Foundry, fine-tuning isn’t just about accuracy—it’s about trust, efficiency, and adaptability at every layer of your stack.
Explore further:
- Get started with Azure AI Foundry.
- Documentation on fine-tuning in Azure AI Foundry.
We’re just getting started. Stay tuned for more model support, advanced tuning techniques, and tools to help you build AI that’s smarter, safer, and uniquely yours.
The post Announcing new fine-tuning models and techniques in Azure AI Foundry appeared first on Microsoft Azure Blog.