
On-Prem or Cloud? Two Deployment Architectures for Integrating an AI Voice Receptionist — How to Choose, How to Connect

When enterprises adopt an AI voice receptionist, IT and security's first question usually isn't 'how accurate is it' but 'will customer data leave our data center, and how does it connect to our existing PBX?' Qubby ships the same containerized service in two deployment modes: AI inference always runs in the cloud; the difference is where media, recordings, and customer data land — and who handles operations.

June 10, 2026 · About 5 min read · By Qubby Team

Figure: Two deployment architectures for Qubby's AI voice receptionist: on-premise IDC and managed cloud, both registering as an extension on the existing PBX

When an enterprise adopts an AI voice receptionist, IT and security's first question usually isn't "how accurate is it." It's two more practical ones: "Will customer data and call recordings ever leave our data center?" and "How does it connect to the phone system we've run for years?"

Qubby's answer: the same containerized service, offered two ways — deployed on-premise in the customer's own data center, or hosted on Qubby's managed cloud. In both, AI voice inference runs on the cloud-based Qubby multimodal voice model; the real difference is where media, recordings, and customer data land, and who runs operations.

Architecture 1: On-Premise — data never leaves your building

The whole service is deployed via Docker in the customer's own IDC. The SIP trunk terminates inside the data center, and the AI registers as an extension on the existing PBX over the internal network. Because signaling and media between the AI extension and the PBX never leave the facility, this mode has the lower voice latency of the two.

The key is the data flow: call recordings, call logs, and customer PII all stay on the internal network. The only outbound traffic is the single "AI inference" stream, which exits encrypted over TLS through a firewall/proxy. All other voice media stays in-house.
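The data-flow rule above can be sketched as a small policy check. This is a minimal illustration, not Qubby's actual firewall or proxy configuration: the category names, the `Flow` type, and `egress_allowed` are all hypothetical, chosen to mirror the data types named in the text.

```python
from dataclasses import dataclass

# Hypothetical traffic categories, named after the data types in the article.
INTERNAL_ONLY = {"recording", "call_log", "customer_pii", "voice_media"}
OUTBOUND_ALLOWED = {"ai_inference"}


@dataclass(frozen=True)
class Flow:
    kind: str          # one of the category names above
    destination: str   # "internal" or "external"
    tls: bool = False  # whether the flow is TLS-encrypted


def egress_allowed(flow: Flow) -> bool:
    """Return True if the flow is permitted under the on-premise data-flow rule."""
    if flow.kind not in INTERNAL_ONLY | OUTBOUND_ALLOWED:
        raise ValueError(f"unknown traffic category: {flow.kind}")
    if flow.destination == "internal":
        return True  # anything may move within the data center
    # Leaving the building: only the AI inference stream, and only over TLS.
    return flow.kind in OUTBOUND_ALLOWED and flow.tls
```

Under this sketch, `Flow("recording", "external", tls=True)` is rejected even though it is encrypted, while `Flow("ai_inference", "external", tls=True)` is the one flow that passes.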

Figure: On-premise: data and recordings stay in your own data center; only the AI inference stream goes out encrypted

Best for finance, healthcare, government, and large enterprises with strict data-residency and compliance requirements — data sovereignty stays in your hands.

Architecture 2: Qubby Managed Cloud — no data center to run

The full service runs on Qubby-managed AWS across multiple regions (Taiwan, with Japan and others available). A Global Accelerator routes traffic to the nearest entry point, and an ALB with certificates handles HTTPS termination. The cloud telephony module registers as an IP phone (an extension) on the customer's PBX over a VPN. Because that registration crosses networks, latency is slightly higher than on-prem.

Personas, configuration, and call logs live in cloud Firestore, recordings live in S3, and configuration is cached in Redis. Qubby handles monitoring, updates, and scaling.

Figure: Managed cloud: multi-region AWS, the AI extension registers to the PBX across the network over VPN, fully managed by Qubby

Best for enterprises that want to launch fast, scale elastically, and avoid building and running a data center — open an account, connect a SIP trunk, and go live.

The integration core: the AI plugs into your PBX, it doesn't replace it

Both architectures use the exact same service modules: telephony (asterisk), call control (sidecar), the AI voice core (backend), the operations console (admin-console), the AI flow builder (ivr-builder-api), and the optional web voice agent (share-web).

Integration boils down to one move: the AI registers as a single "extension" on your existing PBX — without touching your switchboard, numbers, or existing extensions. The only difference is how that extension connects: on-prem, the SIP trunk enters the data center and the extension rides the internal network; in the cloud, the extension registers across the network over VPN.
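To make "registers as an extension" concrete, here is a sketch of the kind of SIP REGISTER request an extension sends to a PBX, following the general message shape in RFC 3261. In practice the telephony module (asterisk) handles registration itself; this is only an illustration of what goes over the wire. The extension number, host names, IP addresses, and the choice of UDP transport are assumptions, and a real PBX would normally answer the first request with an authentication challenge that this sketch does not handle.

```python
import uuid


def build_register(extension: str, pbx_host: str, local_ip: str,
                   local_port: int = 5060, expires: int = 3600) -> str:
    """Build a minimal, unauthenticated SIP REGISTER request (RFC 3261 shape)."""
    aor = f"sip:{extension}@{pbx_host}"  # address-of-record of the AI extension
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 branch magic cookie prefix
    lines = [
        f"REGISTER sip:{pbx_host} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <{aor}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <{aor}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 REGISTER",
        # Contact tells the PBX where to reach this extension.
        f"Contact: <sip:{extension}@{local_ip}:{local_port}>",
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    # SIP uses CRLF line endings and a blank line to end the header section.
    return "\r\n".join(lines) + "\r\n\r\n"
```

With hypothetical values, `build_register("1099", "pbx.example.internal", "10.0.0.5")` would describe the on-prem case, where `local_ip` is an internal address. In the managed-cloud case the request has the same shape, but `local_ip` would be the address the extension presents on the VPN side.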

One table to pick your architecture

Dimension	On-Premise IDC	Qubby Managed Cloud
Data residency	Recordings / call logs / PII stay in your own data center; only AI inference goes out	Data stored in cloud Firestore / S3 (managed by Qubby)
Voice latency	Lowest: AI extension and PBX register in-network, same facility	Slightly higher: AI extension registers to PBX across the network (VPN)
Security / compliance	Best fit for in-house data-residency requirements	Depends on the cloud provider's compliance posture; cross-border data transfer needs review
Ops responsibility	Customer provides facility / hardware / network; Qubby provides containers and updates	Fully managed by Qubby (monitoring / updates / scaling)
Scalability	Bound by physical capacity; scaling means buying hardware	Fast and elastic (multi-region / horizontal scaling)
Time to launch	Longer: facility setup / network provisioning / firewall allowlists	Fastest: open an account + connect a SIP trunk and go live
Cost model	Facility / hardware CAPEX + license; owned long-term	Subscription / usage OPEX; zero hardware investment

Both connect to the cloud-based Qubby multimodal voice model for inference; the on-prem version sends only that single AI stream out, encrypted, and keeps everything else internal.
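The table can be read as a rough decision rule. The sketch below encodes one possible reading of it. The `Requirements` fields, the `recommend` function, and the priority order (data residency first, then launch speed and scaling) are our assumptions for illustration, not an official Qubby sizing tool.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    data_must_stay_onsite: bool   # recordings / call logs / PII may not leave the data center
    can_run_own_facility: bool    # customer can supply facility, hardware, and network
    needs_fast_launch: bool       # time to launch is the top priority
    needs_elastic_scaling: bool   # capacity must grow without buying hardware


def recommend(req: Requirements) -> str:
    """Map requirements to a deployment mode, following the comparison table."""
    # Data residency is treated as a hard constraint, so it is checked first.
    if req.data_must_stay_onsite:
        return "on-premise"
    # Without that constraint, any of these points toward the managed cloud.
    if req.needs_fast_launch or req.needs_elastic_scaling or not req.can_run_own_facility:
        return "managed-cloud"
    # No hard constraint either way: cost model and ops preference decide.
    return "either"
```

For example, `recommend(Requirements(True, True, False, False))` returns `"on-premise"`, while a team with no residency constraint that wants to go live quickly gets `"managed-cloud"`.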

Start with a cloud PoC, then migrate smoothly to on-prem

Because both architectures share the exact same service modules, you can prove the value quickly with a managed-cloud PoC, then migrate smoothly to an on-premise production deployment — no rebuild required. If your compliance posture means even AI inference must not leave your network, an "on-prem LLM / private voice model" option can be evaluated (with separate compute planning).
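One way to picture why no rebuild is needed: describe each deployment as a profile and compare them. The module names below come from the article. The profile keys, the value strings, and `migration_diff` are hypothetical, a sketch of the idea rather than Qubby's real configuration format.

```python
# Service modules shared by both architectures (share-web is optional).
SHARED_MODULES = (
    "asterisk", "sidecar", "backend",
    "admin-console", "ivr-builder-api", "share-web",
)

MANAGED_CLOUD = {
    "modules": SHARED_MODULES,
    "pbx_link": "extension registers over VPN",
    "recordings": "S3",
    "call_logs": "Firestore",
    "operations": "Qubby",
}

ON_PREMISE = {
    "modules": SHARED_MODULES,
    "pbx_link": "extension registers on the internal network",
    "recordings": "customer data center",
    "call_logs": "customer data center",
    "operations": "customer facility, Qubby containers and updates",
}


def migration_diff(src: dict, dst: dict) -> dict:
    """Return only the settings that change when moving from src to dst."""
    return {key: (src[key], dst[key]) for key in src if src[key] != dst[key]}
```

Calling `migration_diff(MANAGED_CLOUD, ON_PREMISE)` leaves `modules` out of the result: in this sketch, a migration changes where data lands, how the extension reaches the PBX, and who operates the service, but not the services themselves.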

Whether you prioritize data sovereignty or speed to launch, the essence of integration is the same: let the AI plug into your existing phone system as a single extension, and leave the rest to Qubby. To assess which architecture fits your facility and compliance constraints, talk to our consultants.