On-Prem or Cloud? Two Deployment Architectures for Integrating an AI Voice Receptionist — How to Choose, How to Connect
When enterprises adopt an AI voice receptionist, IT and security's first question usually isn't 'how accurate is it' but 'will customer data leave our data center, and how does it connect to our existing PBX?' Qubby ships the same containerized service in two deployment modes: AI inference always runs in the cloud; the difference is where media, recordings, and customer data land — and who handles operations.

When an enterprise adopts an AI voice receptionist, IT and security's first question usually isn't "how accurate is it." It's two more practical ones: "Will customer data and call recordings ever leave our data center?" and "How does it connect to the phone system we've run for years?"
Qubby's answer: the same containerized service, offered two ways — deployed on-premise in the customer's own data center, or hosted on Qubby's managed cloud. In both, AI voice inference runs on the cloud-based Qubby multimodal voice model; the real difference is where media, recordings, and customer data land, and who runs operations.
Architecture 1: On-Premise (customer data never leaves your building)
The whole service is deployed via Docker in the customer's own IDC (data center). The SIP trunk terminates inside the data center, and the AI registers as an extension on the existing PBX over the internal network. Because registration and media stay inside one facility, voice latency is the lowest of the two architectures.
The key is the data flow: call recordings, call logs, and customer PII all stay on the internal network. The only outbound traffic is the single AI inference stream, which exits TLS-encrypted through a firewall or proxy. Everything else, including recordings and all other voice media, stays in-house.
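The data-flow rule above can be written down as a small egress policy. This is a minimal sketch under stated assumptions, not Qubby's actual firewall configuration: the `Flow` type, the flow names, the hostname `inference.example.invalid`, and the use of port 443 are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical placeholder for the cloud inference endpoint; not a real Qubby hostname.
INFERENCE_HOST = "inference.example.invalid"
# Assumed TLS port; the article only says the stream exits encrypted over TLS.
INFERENCE_PORT = 443


@dataclass(frozen=True)
class Flow:
    kind: str         # e.g. "ai_inference", "recording", "call_log", "pii"
    destination: str  # target hostname
    port: int
    tls: bool         # whether the connection is TLS-encrypted


def may_leave_network(flow: Flow) -> bool:
    """Allow only the TLS-encrypted AI inference stream to exit; block everything else."""
    return (
        flow.kind == "ai_inference"
        and flow.destination == INFERENCE_HOST
        and flow.port == INFERENCE_PORT
        and flow.tls
    )
```

In this sketch a recording upload to the same host is still rejected, because the policy keys on the kind of data as well as the destination.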

Best for finance, healthcare, government, and large enterprises with strict data-residency and compliance requirements — data sovereignty stays in your hands.
Architecture 2: Qubby Managed Cloud — no data center to run
The full service runs on Qubby-managed AWS across multiple regions (Taiwan, with Japan and others available). A Global Accelerator routes traffic to the nearest entry point, and an ALB with certificates handles HTTPS termination. The cloud telephony module registers with the customer's PBX as an IP phone (extension) over a VPN. Because that registration crosses networks, latency is slightly higher than on-prem.
Personas, configuration, and call logs live in cloud Firestore, recordings live in S3, and configuration is cached in Redis. Qubby handles monitoring, updates, and scaling.

Best for enterprises that want to launch fast, scale elastically, and avoid building and running a data center — open an account, connect a SIP trunk, and go live.
The integration core: the AI plugs into your PBX, it doesn't replace it
Both architectures use the exact same service modules: telephony (asterisk), call control (sidecar), the AI voice core (backend), the operations console (admin-console), the AI flow builder (ivr-builder-api), and the optional web voice agent (share-web).
Integration boils down to one move: the AI registers as a single "extension" on your existing PBX — without touching your switchboard, numbers, or existing extensions. The only difference is how that extension connects: on-prem, the SIP trunk enters the data center and the extension rides the internal network; in the cloud, the extension registers across the network over VPN.
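To make "registers as an extension" concrete, the sketch below builds the kind of SIP REGISTER request an IP phone sends to a PBX, following the general message layout of RFC 3261. It is illustrative only and is not taken from Qubby's telephony module: the function name, the extension number, the hostnames, and the IP address are hypothetical, and digest authentication is omitted.

```python
import uuid


def build_register(extension: str, pbx_host: str, local_ip: str,
                   local_port: int = 5060, expires: int = 3600) -> str:
    """Build a minimal, unauthenticated SIP REGISTER request as a string."""
    aor = f"sip:{extension}@{pbx_host}"    # address-of-record for the extension
    branch = "z9hG4bK" + uuid.uuid4().hex  # RFC 3261 branch "magic cookie" prefix
    lines = [
        f"REGISTER sip:{pbx_host} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <{aor}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <{aor}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 REGISTER",
        f"Contact: <sip:{extension}@{local_ip}:{local_port}>",
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    # Header lines are CRLF-terminated and followed by one empty line.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    # Hypothetical values: extension 8001 on a PBX reachable at pbx.example.invalid.
    print(build_register("8001", "pbx.example.invalid", "10.0.0.25"))
```

Only `local_ip` would differ between the two modes in this sketch: an internal LAN address on-prem, or an address on the VPN tunnel in the managed cloud. A real PBX will typically challenge such a request and require authentication before accepting the registration.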
One table to pick your architecture
| Dimension | On-Premise IDC | Qubby Managed Cloud |
|---|---|---|
| Data residency | Recordings / call logs / PII stay in your own data center; only AI inference goes out | Data stored in cloud Firestore / S3 (managed by Qubby) |
| Voice latency | Lowest: AI extension and PBX register in-network, same facility | Slightly higher: AI extension registers to PBX across the network (VPN) |
| Security / compliance | Best fit for in-house data-residency requirements | Depends on the cloud provider's compliance coverage; cross-border data transfer needs review |
| Ops responsibility | Customer provides facility / hardware / network; Qubby provides containers and updates | Fully managed by Qubby (monitoring / updates / scaling) |
| Scalability | Bound by physical capacity; scaling means buying hardware | Fast and elastic (multi-region / horizontal scaling) |
| Time to launch | Longer: facility setup / network provisioning / firewall allowlists | Fastest: open an account + connect a SIP trunk and go live |
| Cost model | Facility / hardware CAPEX + license; owned long-term | Subscription / usage OPEX; zero hardware investment |
Both connect to the cloud-based Qubby multimodal voice model for inference; the on-prem version sends only that single AI stream out, encrypted, and keeps everything else internal.
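The table can also be read as a short decision procedure. The sketch below is one possible encoding, not an official Qubby sizing tool: the `Requirements` fields, the order in which constraints are checked, and the `"review"` outcome for ambiguous cases are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    data_must_stay_onsite: bool   # recordings / call logs / PII may not leave the facility
    has_own_datacenter: bool      # customer can supply facility, hardware, and network
    needs_fast_launch: bool       # time to launch is the priority
    needs_elastic_scaling: bool   # capacity must grow without buying hardware


def recommend(req: Requirements) -> str:
    """Return 'on-premise', 'managed-cloud', or 'review' based on the table's dimensions."""
    if req.data_must_stay_onsite:
        # Residency is treated as a hard constraint; on-prem also needs a facility to host it.
        return "on-premise" if req.has_own_datacenter else "review"
    if req.needs_fast_launch or req.needs_elastic_scaling or not req.has_own_datacenter:
        return "managed-cloud"
    # No hard constraint either way: cost model and ops preference decide.
    return "review"
```

For example, `recommend(Requirements(True, True, False, False))` returns `"on-premise"`, while a team with no residency constraint and no data center gets `"managed-cloud"`.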
Start with a cloud PoC, then migrate smoothly to on-prem
Because both architectures share the exact same service modules, you can prove the value quickly with a managed-cloud PoC, then migrate smoothly to an on-premise production deployment — no rebuild required. If your compliance posture means even AI inference must not leave your network, an "on-prem LLM / private voice model" option can be evaluated (with separate compute planning).
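One way to picture why the migration needs no rebuild is to model each deployment as a profile over the same module list and look at which fields differ. This is a hedged sketch: the module names come from the article, but `DeploymentProfile`, its fields, and the field values are hypothetical summaries, not Qubby configuration.

```python
from dataclasses import dataclass

# Module names as listed in the article; identical for both deployment modes.
SERVICE_MODULES = (
    "asterisk", "sidecar", "backend",
    "admin-console", "ivr-builder-api", "share-web",  # share-web is optional
)


@dataclass(frozen=True)
class DeploymentProfile:
    name: str
    modules: tuple
    data_location: str        # where recordings, call logs, and PII land
    extension_transport: str  # how the AI extension reaches the PBX
    operated_by: str


MANAGED = DeploymentProfile("managed-cloud", SERVICE_MODULES,
                            "Qubby-managed cloud", "VPN", "Qubby")
ON_PREM = DeploymentProfile("on-premise", SERVICE_MODULES,
                            "customer data center", "internal network",
                            "customer, with Qubby containers and updates")


def migration_changes(src: DeploymentProfile, dst: DeploymentProfile) -> dict:
    """Return the profile fields that differ, as {field: (old, new)}."""
    fields = ("modules", "data_location", "extension_transport", "operated_by")
    return {
        f: (getattr(src, f), getattr(dst, f))
        for f in fields
        if getattr(src, f) != getattr(dst, f)
    }
```

Calling `migration_changes(MANAGED, ON_PREM)` reports changes to data location, extension transport, and operator, but nothing under `modules`, which is the property the PoC-to-production path relies on.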
Whether you prioritize data sovereignty or speed to launch, the essence of integration is the same: let the AI plug into your existing phone system as a single extension, and leave the rest to Qubby. To assess which architecture fits your facility and compliance constraints, talk to our consultants.
