AI Voice Agents: Voice agents with natural conversation flow that take action and run on infrastructure you control

AI Voice Agents from N3XTCODER

What an AI voice agent from N3XTCODER actually is

An AI voice agent is a real-time spoken interface that talks to your users, takes action for them and runs on infrastructure you control. We build voice agents on two paths: a fast cloud path (ElevenLabs / Azure AI Foundry) for prototypes that need to ship in days, and a fully self-hosted open-source path (n8n + Whisper + Ollama + Piper) when sovereignty, cost or carbon constraints rule out hyperscalers. The second path is the one we used for Mother Earth AI – the self-hosted voice agent for climate communication that won the K3-Preis 2023.

What this means in practice

Mother Earth AI is the clearest worked example. The project gives planet Earth a literal voice for climate communication. The team's constraint: the system could not drive carbon emissions by running on hyperscale AI providers. Sovereignty, autonomy and carbon independence were non-negotiable.

We collaborated with the Mother Earth team on a fully self-hosted voice agent that runs on Ollama as the LLM platform and Open WebUI as the interface, with all components on the team's own infrastructure. The voice agent now serves two surfaces: the public website at mother-earth.ai, and a physical "Mutter Erde Telefon" – a Raspberry Pi-based phone installation that travels to museums, exhibitions and climate events, where visitors can pick up the receiver and have a spoken conversation with Mother Earth without an app or screen. The project won the K3-Preis 2023 für Klimakommunikation.
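
To make the self-hosted path concrete, here is a minimal sketch of a single conversational turn through that kind of stack: Whisper transcribes the caller, a local model served by Ollama generates the reply, and Piper speaks it. The model names, the Piper voice file and the file paths are placeholder assumptions, and the real deployment streams audio and orchestrates these steps through n8n and Open WebUI rather than a script like this.

```python
# Minimal sketch of one self-hosted voice turn: Whisper (STT) -> Ollama (LLM) -> Piper (TTS).
# Assumes openai-whisper, a local Ollama server on its default port and the piper CLI are
# installed; model names and file paths below are placeholders, not the production setup.
import subprocess
import requests
import whisper

STT_MODEL = whisper.load_model("small")          # any Whisper checkpoint works
OLLAMA_URL = "http://localhost:11434/api/chat"   # default Ollama chat endpoint
PIPER_VOICE = "de_DE-thorsten-medium.onnx"       # placeholder Piper voice file

def handle_turn(audio_path: str, history: list[dict]) -> str:
    # 1. Speech-to-text on the caller's utterance
    text = STT_MODEL.transcribe(audio_path)["text"].strip()
    history.append({"role": "user", "content": text})

    # 2. Generate a reply with a local model via the Ollama chat API
    response = requests.post(
        OLLAMA_URL,
        json={"model": "llama3.1", "messages": history, "stream": False},
        timeout=120,
    )
    reply = response.json()["message"]["content"]
    history.append({"role": "assistant", "content": reply})

    # 3. Text-to-speech with the Piper CLI, writing a WAV file for playback
    subprocess.run(
        ["piper", "--model", PIPER_VOICE, "--output_file", "reply.wav"],
        input=reply.encode("utf-8"),
        check=True,
    )
    return "reply.wav"
```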

Most projects start on the cloud path – ElevenLabs for natural voice quality plus Azure OpenAI / GPT-4o via Microsoft AI Foundry for the language model. The cloud path ships an MVP in 1-2 weeks. Mother Earth AI is the counter-example we reach for when the project has a hard sovereignty, cost or carbon constraint that rules out hyperscalers – self-hosted takes longer to set up but gives you full control over data, cost and energy footprint. Learn more in our voice assistant guide.

Key components

Real-time conversation

  • Turn-taking, interruption handling and natural pause timing (see the barge-in sketch after this list)
  • Live streaming architecture with LiveKit and Fast RTC where appropriate
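
By way of illustration, here is a stripped-down sketch of the barge-in pattern behind interruption handling: the agent's playback runs in one thread, a listener watches for the caller's voice, and a shared stop flag cuts playback off mid-sentence. Audio capture, playback and voice activity detection are stubbed placeholders; in production this runs over a streaming transport such as LiveKit rather than a polling loop.

```python
# Barge-in sketch: stop the agent's TTS playback as soon as the caller starts speaking.
# Playback and voice activity detection are placeholders, not a real audio pipeline.
import threading
import time

stop_speaking = threading.Event()

def play_tts(wav_path: str) -> None:
    # Placeholder playback: "play" the reply in short chunks and check the
    # stop flag between chunks so an interruption takes effect quickly.
    for _ in range(100):                  # pretend the reply is 100 chunks long
        if stop_speaking.is_set():
            print("Caller interrupted, stopping playback")
            return
        time.sleep(0.05)

def user_is_speaking() -> bool:
    # Placeholder for voice activity detection on microphone frames
    return False

def listen_for_barge_in() -> None:
    while not stop_speaking.is_set():
        if user_is_speaking():
            stop_speaking.set()           # cut the agent off mid-sentence
        time.sleep(0.02)

speaker = threading.Thread(target=play_tts, args=("reply.wav",))
listener = threading.Thread(target=listen_for_barge_in, daemon=True)
speaker.start()
listener.start()
speaker.join()
stop_speaking.set()                        # release the listener once playback ends
```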

Two delivery paths

  • Cloud path: ElevenLabs + Azure AI Foundry for fast MVPs (1-2 weeks)
  • Self-hosted path: n8n + Whisper + Ollama + Piper for full sovereignty

Tool calling and action

  • Voice agents that can actually do things: book a call, query a database, trigger a workflow
  • Tool calling and orchestration with MCP and the LiveKit Agent Framework (see the sketch below)
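
The sketch below shows the shape of tool dispatch in plain Python: the model returns a structured tool call (a name plus arguments) and the agent routes it to a real function such as booking a call. The tool names, the ToolCall shape and the stub implementations are illustrative assumptions only; in our builds this wiring goes through MCP servers or the LiveKit Agent Framework rather than hand-rolled code.

```python
# Illustrative tool-dispatch sketch: route a structured tool call from the LLM
# to a real action. Names, argument shapes and stub bodies are placeholders,
# not the MCP or LiveKit APIs.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ToolCall:
    name: str                  # which tool the model asked for
    arguments: dict[str, Any]  # arguments the model filled in

def book_a_call(when: str, email: str) -> str:
    return f"Call booked for {when}, confirmation sent to {email}."

def query_database(question: str) -> str:
    return f"(stub) query result for: {question}"

def trigger_workflow(workflow_id: str) -> str:
    return f"(stub) workflow {workflow_id} triggered."

# Registry of tools the agent is allowed to use
TOOLS: dict[str, Callable[..., str]] = {
    "book_a_call": book_a_call,
    "query_database": query_database,
    "trigger_workflow": trigger_workflow,
}

def dispatch(call: ToolCall) -> str:
    # Refuse anything the model asks for that is not in the registry
    if call.name not in TOOLS:
        return f"Unknown tool: {call.name}"
    return TOOLS[call.name](**call.arguments)

# Example: the model decided to book a call for the user
print(dispatch(ToolCall("book_a_call", {"when": "Tuesday 10:00", "email": "user@example.org"})))
```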

Outcomes

A voice agent that delivers

Delivering self-hosted architectures for >3 years

Time to first MVP

1-2 weeks on the cloud path; longer for self-hosted, with a clear migration path from one to the other

Cutting-edge speech models

Parakeet, Whisper and Moonshine for speech-to-text; Kokoro and Piper for text-to-speech, in your language

Sovereignty by default

Self-hosted Ollama + open-source TTS / STT means audio never leaves your infrastructure if you don't want it to

Carbon-honest

Renewable-energy hosting where possible; transparent about the energy cost of voice models

Want to talk it through? Book a call: Free of charge, full of value.

How it works

1. Use case and architecture

  • Decide cloud vs self-hosted based on data sensitivity, cost over 12-24 months and compliance constraints
  • Pick the right STT, LLM and TTS components for your language and domain
  • Plan tool calling and integration points

2. Build the working agent

  • First MVP in 1-2 weeks on the cloud path
  • Test with real users in a real acoustic environment, not a lab
  • Tune voices, prompts and tool calling against actual conversations

3. Deploy and operate

  • Cloud deployment via Azure or your trusted EU provider
  • Self-hosted deployment via Docker / Kubernetes on your own infrastructure or Ionos
  • Documentation and handover so your team can operate the system

Why N3XTCODER

We bring a decade of impact-tech experience and over 160 AI projects since 2019. Through our free AI for Impact course, more than 100,000 people have learned how to use AI for the common good. We do not run inspiration days. We run scoping sessions and build engagements that ship, as we have for the organisations below:

  • Mother Earth AI – self-hosted voice agent for climate communication, K3-Preis 2023 winner, used in museums and on "Mutter Erde Telefon" Raspberry Pi installations
  • A leading member network – production retrieval-augmented generation (RAG) chatbot serving 1,000+ HumHub members on n8n + Qdrant + GPT-4 via Microsoft EU, delivered in four sprints
  • GDV (German Insurers Association) – AI Knowledge Assistant over tens of thousands of policy documents for 400+ member companies, on Azure AI Search + GPT-4o via Microsoft AI Foundry. Halved research time, prevented shadow AI use, increased internal employee satisfaction
  • A leading German association – AI Member Platform ("Association GPT") combining chat-based discovery with traditional category filters, on Microsoft AI Foundry + pgvector
  • A leading donation platform – AI email agent classifying enquiries and drafting replies with mandatory human review, currently in pilot, on n8n and Azure OpenAI
  • Default stack: n8n in Berlin, Qdrant or pgvector for vector search, Azure OpenAI / GPT-4o via Microsoft AI Foundry, plus open-source EU alternatives like Mistral, Milvus and self-hosted Ollama / Whisper / Piper for sovereign deployments.

Honest constraints

Voice agents fail when they don't allow real-time interruption. Voice-message-style request/response systems are easier to build but frustrate users. If real-time turn-taking matters, it has to be designed in from the start – not bolted on.

Voice quality is not solved. Cloud TTS providers like ElevenLabs still beat open-source TTS like Piper or Coqui on naturalness. Open source is closing the gap fast, but if voice quality is non-negotiable, the cloud path makes more sense.

Multilingual is uneven. Speech recognition and synthesis in major languages are excellent. In smaller languages and dialects they are still patchy. Test with your actual user group before committing.

Voice eats energy. Voice models are heavier than text models. We track and disclose the cost rather than hide it. For projects where carbon honesty matters – like Mother Earth AI – this shapes the architecture choice.


Build an AI voice agent with N3XTCODER

Tell us about the use case and the constraints. We will reply with a proposed architecture and a date, usually within a working day.

Simon Stegemann
Co-Founder and CEO
