The Right Model for the Job: A Developer’s Guide to Choosing LLMs in Mendix
The GenAI landscape has exploded. OpenAI grabbed the early headlines, but Google’s Gemini has surged to the front of the pack, AWS Bedrock gives you enterprise muscle, and new models drop weekly. The question isn’t if you should use GenAI. It’s which model, for what, and where.
With our new Google Gemini Connector, Mendix now supports all the major model providers. Combined with our existing OpenAI, Amazon Bedrock, Mistral, and Mendix GenAI connectors, you’ve got the full buffet.
At Mendix we’ve ensured that you can build agentic software in a model-agnostic way. Why? Because the “best” model doesn’t exist. There’s only the right model for your use case, your budget, and your infrastructure reality.
Start with the “why”: What are you actually building?
Before you choose a model, define your interaction pattern. The way you interact with an LLM shapes your requirements for latency, reasoning, and tool access:
- Single calls: One-and-done interactions like simple classification. Speed and cost per call are the primary metrics.
- Conversations: Multi-turn dialogue requiring context retention. A stock management conversational agent needs to remember that “How many do we have in stock?” refers to the product from three messages ago.
- Agents: The model reasons through multi-step problems and uses Mendix microflows as “tools” to query databases or trigger APIs. This demands advanced reasoning and reliable function calling.
- Batch processing: High-volume work (e.g., analyzing 10,000 feedback forms). Cost efficiency and throughput trump everything else.
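The four patterns above map naturally onto default model tiers. As a rough sketch (the tier names and rationale strings here are illustrative, not Mendix or provider APIs), a dispatcher might look like this:

```python
# Sketch: mapping interaction patterns to default model tiers.
# Tier names are illustrative assumptions, not real model identifiers.

PATTERN_DEFAULTS = {
    "single_call": {"tier": "small-fast", "why": "speed and cost per call dominate"},
    "conversation": {"tier": "mid-tier", "why": "context retention across turns"},
    "agent": {"tier": "reasoning", "why": "multi-step logic and tool calling"},
    "batch": {"tier": "small-fast", "why": "throughput and cost efficiency"},
}

def pick_tier(pattern: str) -> str:
    """Return a sensible default model tier for an interaction pattern."""
    if pattern not in PATTERN_DEFAULTS:
        raise ValueError(f"unknown interaction pattern: {pattern}")
    return PATTERN_DEFAULTS[pattern]["tier"]

print(pick_tier("agent"))        # agents need reasoning and tool calling
print(pick_tier("batch"))        # batch work optimizes for cost
```

The point is not the code itself but the discipline: decide the pattern first, then the tier, and only then shop for a specific model.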
Understanding the “intelligence tiers”: Reasoning vs. lightweight
Before looking at brands, you need to choose the class of model. In 2026, we categorize these by how they “think”:
Reasoning models (the “Thinkers”)
These models (like OpenAI o4 or Nova 2 Pro) use “extended thinking.” They don’t just predict the next word; they run internal simulations and verify their own logic before speaking.
- Trade-off: High latency and high cost. Use them only when accuracy on complex logic is non-negotiable.
Small language models (the “Sprinters”)
These are the “mini” or “flash” models. They are highly optimized for speed and cost.
- Trade-off: They can “hallucinate” more on complex logic but are perfect for classification, summarization, and simple RAG.
Multimodal models (the “Observers”)
Models like Gemini 3 Pro or Nova 2 Omni are built natively to process video, audio, and images alongside text. Chaining a vision model to a text model is old news; native multimodal is the 2026 standard for speed.
The model trade-off matrix
Every model choice is a trade-off. Here is the framework for your evaluation:
- Cost: Measured in tokens. Smaller models (GPT-5 mini, Gemini 3 Flash) are dramatically cheaper and often handle 80% of enterprise tasks.
- Latency: Critical for UI/UX; irrelevant for overnight batch processing.
- Reasoning and accuracy: Benchmarks are a start, but test with your actual data. “Thinking” models (like OpenAI o4) excel here but come with a “latency tax.”
- Context window: Ranges from 128K to Gemini’s 2 million+ tokens.
Pro Tip: Beware of “lost in the middle” syndrome. Even with massive windows, models can overlook data buried in the center. For pinpoint accuracy, RAG (Retrieval-Augmented Generation) or providing context via tool-use (fetching only what’s needed) remains the gold standard.
- Multimodal: Can the model “see” or “hear”? Native multimodal models (like Gemini 3) can analyze images, audio, and video directly within the same context window. If your Mendix app needs to extract data from a photo of product damage or categorize audio logs, native is faster and more accurate than chaining separate services.
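The “fetch only what’s needed” idea behind RAG can be shown with a deliberately tiny retrieval step. Real systems score chunks with embeddings; this sketch uses keyword overlap purely to illustrate why you send the model a few relevant chunks instead of the whole corpus:

```python
# Toy RAG retrieval step: score stored chunks by keyword overlap with the
# query and keep only the top-k. Production systems use embedding similarity;
# the overlap scoring here is a simplification for illustration.

def tokens(text: str) -> set[str]:
    """Lowercased words with trailing punctuation stripped."""
    return {w.strip(".,:;!?").lower() for w in text.split()}

def top_chunks(chunks: list[str], query: str, k: int = 2) -> list[str]:
    """Return the k chunks sharing the most words with the query."""
    overlap = lambda c: len(tokens(c) & tokens(query))
    return sorted(chunks, key=overlap, reverse=True)[:k]

docs = [
    "Return policy: items may be returned within 30 days.",
    "Shipping times vary by region.",
    "Warranty covers frame defects for two years.",
]
print(top_chunks(docs, "what is the return policy", k=1))
```

Only the winning chunk goes into the prompt, which sidesteps “lost in the middle” entirely: there is no middle to get lost in.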
Current model comparison (early 2026)
Most major model providers cover the full range of model types and sizes; however, there is a small degree of specialization across providers.
| Model | Best for | Context | Relative cost | Latency |
| --- | --- | --- | --- | --- |
| GPT-5.2 Thinking | Hardcore logic / verification | 128K | $$$$ | High (thinking) |
| Amazon Nova 2 Pro | Enterprise agents / math | 1M | $$$ | Medium |
| Gemini 3 Pro | Long context / native video | 2M | $$$ | Medium |
| Amazon Nova 2 Lite | High-volume multimodal | 300K | $$ | Fast |
| Gemini 3 Flash | Edge cases / ultra-speed | 1M | $ | Ultra-fast |
| Mistral Large 3 | Sovereign data / performance | 256K | $$ | Medium |
Infrastructure: Where should your models live?
1. Mendix GenAI Resource Packs (managed)
The “easy button.” Mendix handles the API keys, the scaling, and the privacy boundaries. You simply consume the model as a service within the Mendix Cloud. Best for rapid prototyping and standard enterprise apps.
2. Hyperscaler connectors (hybrid)
The “bring-your-own-cloud” (BYOC) option. Use our connectors for AWS Bedrock, Azure OpenAI, or Google Cloud. This is the most common path for enterprises already deep in a specific ecosystem who want to leverage their existing credits and security policies.
3. Private cloud / on-prem (sovereign)
The “Fort Knox” option. Host open-weight models like Llama 4 or Mistral on your own private infrastructure. Mendix connects to your local endpoint via the same standard interface. This is for maximum data sovereignty and privacy.
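Many self-hosted stacks (vLLM, Ollama, and similar servers) expose an OpenAI-compatible chat-completions endpoint, which is what makes the “same standard interface” claim work in practice. The sketch below builds such a request; the endpoint URL and model name are assumptions you would replace with your own private deployment:

```python
import json

# Sketch: talking to a self-hosted, OpenAI-compatible endpoint.
# ENDPOINT and the model name are hypothetical placeholders; point them
# at your own private infrastructure (e.g. a local vLLM server).

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # hypothetical

def build_request(model: str, user_message: str) -> dict:
    """Build a chat-completions payload; POST it to ENDPOINT with any HTTP client."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 0.2,  # low temperature for predictable enterprise output
    }

payload = build_request("mistral-large-3", "Summarize this support ticket.")
print(json.dumps(payload, indent=2))
```

Because the wire format matches the hosted providers, the calling application does not need to know whether the model lives in a hyperscaler or in your own data center.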
The provider landscape
- Mendix GenAI Resource Packs: The “turnkey” option. Delivered directly through Mendix Cloud, these packs provide pre-configured access to models like Anthropic Claude 4.5 and Cohere Embed. It’s the fastest path to production, removing the need to manage your own hyperscaler accounts or API keys.
- Google Cloud: Gemini 3 Pro and Flash lead in multimodal capabilities (processing video/audio natively) and massive context windows.
- Azure / OpenAI: The “gold standard” for logic if you are deeply embedded in the Microsoft stack, and often a relatively easy place to start.
- AWS Bedrock: A “model garden” approach, offering easy access to Claude, Llama, and Mistral under one managed service with enterprise-grade security.
- Anthropic (Claude): Known for surgical precision in coding and a more “steerable” writing style (Claude 4.5 Sonnet is a developer favorite).
- Mistral AI: The champion of efficiency and European data sovereignty. Their models (like Mistral Large 3) offer frontier-level performance with open-weight flexibility, making them ideal for on-premises or private cloud deployments.
Putting it together: Lato Bicycles
Example: Customer service routing agent
Lato Bicycles realized that not every decision needs an expensive LLM. They built a hybrid architecture:
- Deterministic logic: Order status requests are caught by simple regex in a microflow. Cost: $0. Accuracy: 100%.
- Lightweight classification: Distinguishing a “complaint” from a “technical question” is handled by Gemini 3 Flash.
- Nuanced routing: For a subsequent agentic assessment of the incoming enquiry, the system escalates to a high-reasoning model like OpenAI o4.
The results: 60% of requests never touch an LLM. Costs drop dramatically, and accuracy increases because deterministic logic doesn’t hallucinate.
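The hybrid routing idea can be sketched in a few lines. In Mendix this logic would live in a microflow; the regex pattern and the `classify_with_llm` stand-in below are illustrative assumptions, not Lato’s actual implementation:

```python
import re

# Sketch of a hybrid router: deterministic regex first, LLM classification
# only as a fallback. classify_with_llm is a hypothetical stand-in for a
# call to a lightweight model (e.g. Gemini 3 Flash via a connector).

ORDER_STATUS = re.compile(r"\border\s*#?\d+\b|\bwhere is my order\b", re.IGNORECASE)

def classify_with_llm(message: str) -> str:
    # Placeholder: in reality this calls a small, cheap model.
    return "complaint" if "broken" in message.lower() else "technical_question"

def route(message: str) -> str:
    if ORDER_STATUS.search(message):
        return "order_status"          # $0, deterministic, never hallucinates
    return classify_with_llm(message)  # only the remaining traffic pays for an LLM

print(route("Where is my order #1042?"))      # caught by regex, no LLM call
print(route("My brake lever arrived broken"))  # falls through to classification
```

The design choice is the escalation ladder itself: free deterministic checks first, a cheap model second, an expensive reasoning model only for what genuinely needs it.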

The freedom to experiment
The real power of Mendix’s model-agnostic approach is that you’re never locked in. With Mendix, you build your agentic software as the “chassis,” with the LLM as the “engine.” If a faster, cheaper engine comes out next week, you just swap the connector and keep driving.
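The chassis/engine separation boils down to programming against a small, provider-agnostic interface. The classes and model names in this sketch are illustrative stubs, not real connector APIs:

```python
from typing import Protocol

# Sketch of the "chassis vs. engine" idea: the app depends only on a
# minimal interface, so swapping providers is a one-line change.
# The provider classes below are stubs, not real SDK calls.

class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class GeminiFlash:
    def complete(self, prompt: str) -> str:
        return f"[gemini-3-flash] response to: {prompt}"  # stub, no real call

class OpenAIThinking:
    def complete(self, prompt: str) -> str:
        return f"[gpt-5.2-thinking] response to: {prompt}"  # stub

def run_app(model: ChatModel, prompt: str) -> str:
    # The "chassis" never names a vendor; only the injected engine changes.
    return model.complete(prompt)

print(run_app(GeminiFlash(), "Classify this ticket"))
print(run_app(OpenAIThinking(), "Verify this refund logic"))
```

Swapping the engine is then a configuration decision, not a rewrite, which is exactly what makes today’s model choice reversible.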
You’re making a model choice today, not a model commitment forever.
Ready to start?
Download the GenAI Showcase App from the Marketplace and try out some different LLMs today.