Control a Virtual Computer from Your Mendix App Using Gen AI

Key Takeaways
- Computer use lets your Mendix app do things, not just respond. By integrating with an LLM, your app can control a virtual computer to perform real actions—like clicking, typing, and navigating apps—based on user input.
- Mendix makes computer use setup surprisingly smooth. With building blocks like GenAI Commons, the Amazon Bedrock connector, and REST integrations, you can get a working example up and running without a ton of custom code.
- Thoughtful setup leads to better performance and safer outcomes. With the right prompts, model selection, and guardrails in place, you can guide your LLM to act efficiently and safely while unlocking the full potential of computer use, keeping costs and risks in check.
What if your app could actually use a computer—like, open files, click buttons, and get stuff done—just by asking?
Thanks to GenAI and some handy Mendix tools, that’s not just possible—it’s surprisingly doable. In this post, we’ll show you how to build a Mendix app that talks to a virtual computer and makes things happen. You’ll see how to plug in large language models (LLMs), set up the right connectors, and let your users drive actions just by interacting with your app using human language.
What is Computer Use?
Computer Use is a new and powerful idea in the world of Generative AI (GenAI) and app development. It lets developers build apps that can control a computer, usually a virtual one, to complete tasks based on what the user asks.
In short: a computer-use setup consists of an app that connects to a specialized large language model (LLM) trained to predict computer-use actions. When a user types a request into the app like “Can you search for flights to Amsterdam for the second weekend of next month?”, the LLM figures out the steps needed to make that happen while keeping chat history and context in mind.
These steps could range from opening a browser to typing in a search box to clicking a button. The LLM then sends back specific instructions, like where to move the mouse, what to type, or what to click. The app uses those instructions to control the virtual computer. Sometimes the app also sends screenshots of the computer screen to the LLM, so it can see what’s happening and decide what to do next.
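To make this concrete, here is a small sketch of what mapping such an instruction onto a command might look like. The action names follow Anthropic’s computer-use tool, but treat the exact schema and the `to_command` helper as illustrative assumptions, not the actual Mendix implementation:

```python
import json

# Hypothetical tool call as the LLM might return it; the field names
# (name, input, action, coordinate) mirror Anthropic's computer-use tool
# but the exact shape should be verified against your provider's docs.
tool_call = json.loads("""
{
  "name": "computer",
  "input": {"action": "left_click", "coordinate": [512, 300]}
}
""")

def to_command(call: dict) -> dict:
    """Map an LLM tool call onto the command sent to the virtual computer."""
    action = call["input"]["action"]
    if action in ("left_click", "mouse_move"):
        x, y = call["input"]["coordinate"]
        return {"command": action, "x": x, "y": y}
    if action == "type":
        return {"command": "type", "text": call["input"]["text"]}
    if action == "screenshot":
        return {"command": "screenshot"}
    raise ValueError(f"unsupported action: {action}")

print(to_command(tool_call))  # {'command': 'left_click', 'x': 512, 'y': 300}
```

In the Mendix app this mapping happens in a microflow rather than Python, but the translation step is the same idea.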
With Mendix, you can already build a “smart app” that helps users and supports business processes using computer-use features. It’s easy to get started by using platform-supported components from the Mendix Marketplace like GenAI Commons, Conversational UI, and an LLM connector that works with your setup. Connecting your app to a virtual computer doesn’t take long. The app can communicate with the computer using familiar tools – like REST APIs – which are commonly used in Mendix projects.
Putting computer use into action in Mendix
In our example, the Mendix app uses a chat interface where the user can type in what they want to do. The app then carries out the necessary actions on a virtual desktop. This setup is based on the GenAI Showcase app, which includes a bunch of small, working examples to help developers learn GenAI patterns and build smarter Mendix apps. It’s meant to both inspire and teach, especially for those who are new to GenAI.
Picking the LLM and setting it up
We’re using Claude 3.7 Sonnet, an Anthropic LLM hosted on Amazon Bedrock, and we’ll connect to it using the Mendix-supported Amazon Bedrock connector. When enabled for computer use (we chose tool version 20250124), the LLM responds with computer-use instructions sent as tool-call objects that can be processed in a function-calling pattern in the Mendix app.
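For reference, a request body enabling that tool version might look roughly like the following. The field names (`anthropic_beta`, `computer_20250124`, display dimensions) follow the Anthropic/Bedrock documentation as we understand it; verify them against the current API reference before relying on this sketch:

```python
import json

# Sketch of a Bedrock Messages API body enabling Anthropic's
# computer-use tool, version 20250124. Treat field names and the
# beta flag as assumptions to check against the provider docs.
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "anthropic_beta": ["computer-use-2025-01-24"],
    "max_tokens": 1024,
    "tools": [{
        "type": "computer_20250124",   # the tool version we chose
        "name": "computer",
        "display_width_px": 1024,      # resolution of the virtual desktop
        "display_height_px": 768,
    }],
    "messages": [
        {"role": "user", "content": "Search for flights to Amsterdam."}
    ],
}

payload = json.dumps(body)
# With boto3 this would be sent roughly as:
#   bedrock_runtime.invoke_model(modelId="<claude-3-7-sonnet model id>", body=payload)
```

In the Mendix app, the Amazon Bedrock connector takes care of building and sending this request for you.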
Function calling with the agent loop
The Mendix app follows a function calling pattern, which means it handles the LLM’s tool-calls using logic in microflows. It also sends screenshots of the virtual desktop back to the LLM so it can see the current screen and figure out the next step. This back-and-forth is called the agent loop.
Thanks to the GenAI Commons module, this works right out of the box using the Chat Completions action. We only needed one function microflow to map the LLM’s instructions (what to click, where to type, etc.) and send them as a command to the virtual computer.
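The agent loop described above can be sketched in plain Python, with the LLM and desktop calls stubbed out (in the real app this logic lives in microflows via the Chat Completions action; the function names here are our own):

```python
def agent_loop(user_request, call_llm, execute_command, take_screenshot, max_steps=20):
    """Run the back-and-forth between the LLM and the virtual computer.

    call_llm(messages) returns either a tool-call dict or a final text answer;
    execute_command and take_screenshot talk to the virtual desktop. All three
    are injected so the loop itself stays testable.
    """
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_steps):                 # hard cap guards against endless loops
        reply = call_llm(messages)
        if reply["type"] == "text":            # model is done: return its answer
            return reply["text"]
        execute_command(reply["input"])        # perform the requested action
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": take_screenshot()})
    return "stopped: step limit reached"

# Tiny fake LLM for demonstration: click once, then answer.
replies = iter([
    {"type": "tool_call", "input": {"action": "left_click", "coordinate": [10, 10]}},
    {"type": "text", "text": "Done."},
])
result = agent_loop("Open the browser", lambda m: next(replies),
                    lambda cmd: None, lambda: "<screenshot>")
print(result)  # Done.
```

The key point is the shape of the loop: act, observe via screenshot, ask the LLM again, until it answers in plain text or hits the step cap.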
Connecting to the virtual computer
To make the virtual machine do what the app says, we set up a REST integration with a local Docker container. The image used for our container is based on Anthropic’s Computer Use demo repository, and includes the Python code that carries out the commands.
We wrapped this code in a simple HTTP server and published it inside the container. That way, the Mendix app can just use a Call REST Service activity in the microflow to trigger the actions.
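A minimal version of such a wrapper, using only Python’s standard library, could look like this. The `/action` route and the JSON payload shape are our own simplification for illustration, not code from Anthropic’s demo repository:

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class ActionHandler(BaseHTTPRequestHandler):
    """Accepts POST requests carrying a JSON command for the desktop."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        command = json.loads(self.rfile.read(length))
        # In the real container this would drive the desktop (mouse, keyboard,
        # screenshots); here we just echo the command back as an acknowledgement.
        result = {"status": "ok", "executed": command}
        payload = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep request logging quiet
        pass

def serve(port=8000):
    # Inside the Docker container this runs forever; the Mendix app's
    # Call REST Service activity posts its commands to this endpoint.
    ThreadingHTTPServer(("0.0.0.0", port), ActionHandler).serve_forever()
```

From the Mendix side, the Call REST Service activity simply POSTs the mapped command as JSON to this server and reads back the result.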
Seeing it all in action
The virtual computer’s desktop is visible on a local port (e.g., localhost:6080), so we can actually watch what’s happening in real time in a browser. If you place the chat interface and the virtual desktop side-by-side in the Mendix app, you can clearly see how the system responds when the user enters, for example, a request for flight information.

What to keep in mind when using computer use
Computer use is powerful, but it comes with some important things to consider.
1 – Model Quality Matters
Different LLMs (and even different versions of the same LLM) will give you different results. Some are better at understanding screenshots, others at planning efficient steps. Some give shorter answers, while others are better at recovering when something goes wrong.
Keep in mind: computer use is still in beta for most major LLM providers, including OpenAI and Amazon Bedrock, so things are changing fast.
2 – It can be slow and get expensive
Depending on many factors – e.g. how the model is prompted, what model and toolset versions are used, and how good its vision capabilities are – the app may need to call the LLM many times for one request. Each of those calls uses tokens, which can add up in cost. And if the LLM keeps trying without getting the right answer, it might keep looping until it hits a token or time limit.
Also, the process can be slow. Each step takes a few seconds, and every screenshot has to be created and sent. The agent may well take longer than a human doing the same task, but it can still be worth it: automating boring, repetitive work, especially when many requests run in parallel, frees people up to focus on more important and detailed work.
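One simple safeguard against runaway cost is to enforce both a step cap and a token budget on the loop; the limits below are placeholders to tune for your own workload:

```python
class BudgetGuard:
    """Stops an agent loop once step or token limits are exceeded."""

    def __init__(self, max_steps=25, max_tokens=50_000):
        self.max_steps = max_steps
        self.max_tokens = max_tokens
        self.steps = 0
        self.tokens = 0

    def record(self, tokens_used: int) -> bool:
        """Record one LLM call; return True while the loop may continue."""
        self.steps += 1
        self.tokens += tokens_used
        return self.steps < self.max_steps and self.tokens < self.max_tokens

guard = BudgetGuard(max_steps=3, max_tokens=10_000)
print(guard.record(2_000))  # True  (1 step, 2,000 tokens so far)
print(guard.record(2_000))  # True
print(guard.record(2_000))  # False (step cap of 3 reached)
```

In a Mendix microflow the same idea translates to a counter and a running token total checked on every iteration of the agent loop.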
3 – Security and safety are crucial
Since computer use gives the app full control of a virtual machine, it’s important to set limits. Without safeguards, the system could run unwanted commands.
You should:
- Limit what software and system access is available on the virtual machine
- Set up LLM prompts and guardrails to check actions before they’re executed
- Decide what level of risk is acceptable
- Test carefully and make sure the system behaves the way you would expect
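A guardrail of the kind listed above can start as simply as an allowlist check that runs before any command reaches the virtual machine. The action names and blocked patterns here are examples only, not a complete policy:

```python
# Example policy, not a complete one: allowed actions and crude text
# patterns to block before a command reaches the virtual machine.
ALLOWED_ACTIONS = {"screenshot", "mouse_move", "left_click", "type"}
BLOCKED_TEXT = ("sudo", "rm -rf", "curl ")

def is_safe(command: dict) -> bool:
    """Reject commands outside the allowlist or typing dangerous text."""
    if command.get("action") not in ALLOWED_ACTIONS:
        return False
    text = command.get("text", "")
    return not any(bad in text for bad in BLOCKED_TEXT)

print(is_safe({"action": "left_click", "coordinate": [5, 5]}))  # True
print(is_safe({"action": "key", "text": "ctrl+alt+del"}))       # False: not allowlisted
print(is_safe({"action": "type", "text": "sudo rm -rf /"}))     # False: blocked text
```

A real deployment would layer more on top, such as prompt-level guardrails and human confirmation for risky steps, but a pre-execution check like this is a cheap first line of defense.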
Why computer use matters
Computer use is opening a whole new way to build smart, helpful apps with Mendix. By combining LLMs with a virtual desktop and some well-placed logic, you can create applications that don’t just respond to users—they act on their behalf. While there are still challenges around performance, cost, and safety, the potential is huge. Whether you’re automating repetitive tasks or exploring new ways to streamline workflows, this is a powerful pattern worth experimenting with.