What is Duplex used for?

It lets developers run and compare multiple LLMs in parallel from the browser, mixing local models (Ollama, LM Studio, vLLM) with frontier cloud APIs in one workspace.
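Mixing local and cloud backends in one workspace mostly comes down to knowing where each engine listens and which request shape it expects. The TypeScript below is a minimal sketch, not Duplex's code. It assumes each engine is reached through its own documented chat route: Ollama's native `/api/chat`, and the OpenAI-compatible `/v1/chat/completions` that LM Studio, vLLM, and many cloud providers expose. It also assumes the default ports Duplex lists (11434, 1234, 8000). `ModelTarget`, `buildChatRequest`, and the example cloud base URL are hypothetical.

```typescript
// Hypothetical request builder; not Duplex's actual implementation.
type Engine = "ollama" | "lmstudio" | "vllm" | "cloud";

interface ModelTarget {
  engine: Engine;
  model: string;
  baseUrl?: string; // override for custom REST gateways or non-default ports
  token?: string;   // user's own API key or OAuth access token (cloud only)
}

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Default local ports quoted in the Duplex listing.
const DEFAULT_BASE: Record<Exclude<Engine, "cloud">, string> = {
  ollama: "http://localhost:11434",
  lmstudio: "http://localhost:1234",
  vllm: "http://localhost:8000",
};

function baseUrlFor(target: ModelTarget): string {
  if (target.baseUrl) return target.baseUrl;
  const { engine } = target;
  if (engine === "cloud") throw new Error("cloud targets need an explicit baseUrl");
  return DEFAULT_BASE[engine];
}

function buildChatRequest(target: ModelTarget, prompt: string): ChatRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (target.engine === "cloud") {
    if (!target.token) throw new Error("cloud targets need the user's own token");
    // The user's own credential, attached to this request only.
    headers["Authorization"] = `Bearer ${target.token}`;
  }
  // Ollama's native route vs. the OpenAI-compatible route used by the others.
  const path = target.engine === "ollama" ? "/api/chat" : "/v1/chat/completions";
  const body = {
    model: target.model,
    messages: [{ role: "user", content: prompt }],
    stream: true,
  };
  return {
    url: baseUrlFor(target) + path,
    init: { method: "POST", headers, body: JSON.stringify(body) },
  };
}
```

Keeping the per-engine differences in one pure function leaves the rest of a client, such as fan-out and rendering, unaware of which backend it is talking to.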

Does Duplex run locally?

Yes. It is a client-side wrapper with zero backend, so requests to local models go straight from your browser to the model server on your machine. Local Ollama use requires setting the OLLAMA_ORIGINS environment variable so that Ollama accepts requests from the browser.
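A quick way to check that the browser can reach a local Ollama server is to call Ollama's documented `GET /api/tags` endpoint, which lists the locally installed models. By default, Ollama accepts cross-origin requests only from a fixed set of local origins and reads any additional allowed origins from OLLAMA_ORIGINS. Without that setting, the browser blocks the response. The sketch below is illustrative, not Duplex's code. `listOllamaModels` and the injected `fetchImpl` are hypothetical, and the origin in the comment is a placeholder.

```typescript
// Hypothetical helper, not part of Duplex. Assumes Ollama was started with something like:
//   OLLAMA_ORIGINS="https://duplex.example" ollama serve   (placeholder origin)
interface FetchResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (url: string) => Promise<FetchResponse>;

// Only the field this sketch reads from the /api/tags response.
interface OllamaTags {
  models: { name: string }[];
}

async function listOllamaModels(
  fetchImpl: FetchLike,
  baseUrl = "http://localhost:11434",
): Promise<string[]> {
  const res = await fetchImpl(`${baseUrl}/api/tags`).catch((err: unknown) => {
    // A CORS rejection surfaces as a generic network error, indistinguishable
    // from a stopped server, so the message names both likely causes.
    throw new Error(
      `Cannot reach Ollama at ${baseUrl} (not running, or OLLAMA_ORIGINS not set): ${String(err)}`,
    );
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = (await res.json()) as OllamaTags;
  return data.models.map((m) => m.name);
}

// In a browser: listOllamaModels((url) => fetch(url)).then(console.log);
```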

How much does Duplex cost?

Pricing is not published. Local models incur no usage cost, while cloud frontier models are billed through your own API keys.

Duplex

Overview / Description

Duplex is a client-side, browser-based LLM inference interface that lets developers run multiple language models in parallel, mixing local models (via Ollama, LM Studio, and vLLM) with frontier cloud APIs inside one unified workspace with zero backend. Because it is a decentralized client-side wrapper, prompts and model traffic go out from your browser rather than through a hosted server, which the project frames as an air-gapped, privacy-preserving setup for local models.

Its MultipleX engine issues inference requests concurrently from the browser's JavaScript (V8) runtime and distributes them across separate network channels so that simultaneous transfers do not lock up the UI. Concrete features include ParallaX inference mixing, which blends local hardware nodes and cloud APIs; browser-thread choke prevention, which throttles token streams into a 16ms render tick to avoid stutter; orphaned-flow termination, which fires AbortController.abort() across all channels when you stop generation; and a sandbox advisor failover, which injects a client-side advisor if you submit a prompt with no active model.

Duplex connects to Ollama (port 11434), LM Studio (port 1234), vLLM clusters (port 8000), custom REST gateways, and frontier cloud APIs over secure OAuth. For local Ollama use, the server must be launched with OLLAMA_ORIGINS set so the browser can query localhost. Pricing is not published; local models run at no usage cost, while cloud calls use your own API keys.
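The "browser-thread choke prevention" idea can be sketched as a batcher. Stream handlers append tokens to a per-channel buffer, and the UI is updated at most once per 16 ms tick rather than once per token. This is an assumed design, not Duplex's source; `TokenBatcher` and the `render` callback are hypothetical.

```typescript
// Hypothetical sketch of tick-batched rendering, not Duplex's code.
type Render = (updates: Map<string, string>) => void;

class TokenBatcher {
  private pending = new Map<string, string>();
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly render: Render;
  private readonly tickMs: number;

  constructor(render: Render, tickMs = 16) {
    this.render = render;
    this.tickMs = tickMs;
  }

  // Called once per streamed token: only string work, no DOM updates.
  push(channelId: string, token: string): void {
    this.pending.set(channelId, (this.pending.get(channelId) ?? "") + token);
    // Schedule at most one flush per tick, however many tokens arrive.
    if (this.timer === null) {
      this.timer = setTimeout(() => this.flush(), this.tickMs);
    }
  }

  // Deliver everything buffered since the last flush in a single render call.
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.pending.size === 0) return;
    const batch = this.pending;
    this.pending = new Map();
    this.render(batch);
  }
}
```

In a real page, `requestAnimationFrame` is the usual way to align flushes with the display's refresh. A fixed `setTimeout` is used here only to mirror the 16 ms figure and to keep the sketch runnable outside a browser.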

Used For

Developers use Duplex to send a prompt to several local and cloud LLMs at once and compare their outputs side by side in a single browser interface with no backend.
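The side-by-side comparison amounts to a concurrent fan-out. Every active model gets the same prompt at once, and each result is recorded separately so that a slow or failing backend does not hide the others. This is a minimal sketch under stated assumptions, not Duplex's implementation. Each backend is modeled as a hypothetical `Generate` function that returns the full completion. The optional `advisor` parameter is an assumed stand-in for the sandbox advisor failover used when no model is active.

```typescript
// Hypothetical fan-out, not Duplex's code. Real channels would stream tokens.
type Generate = (prompt: string) => Promise<string>;

interface ModelResult {
  model: string;
  ok: boolean;
  text: string; // completion, or the error message if the call failed
  ms: number;   // wall-clock latency for this model
}

async function compareModels(
  prompt: string,
  models: Record<string, Generate>,
  advisor?: Generate, // assumed stand-in for the "sandbox advisor" failover
): Promise<ModelResult[]> {
  const entries = Object.entries(models);
  // No active model: fall back to the client-side advisor instead of failing.
  if (entries.length === 0 && advisor) entries.push(["sandbox-advisor", advisor]);

  // All calls start at once; each catches its own error so one failing
  // backend cannot reject the whole comparison. Results keep input order.
  return Promise.all(
    entries.map(async ([model, generate]): Promise<ModelResult> => {
      const start = Date.now();
      try {
        const text = await generate(prompt);
        return { model, ok: true, text, ms: Date.now() - start };
      } catch (err) {
        return { model, ok: false, text: String(err), ms: Date.now() - start };
      }
    }),
  );
}
```

Catching errors per model gives the same isolation as `Promise.allSettled` but returns one uniform row per model, which maps directly onto a side-by-side view.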

Pricing

Plan

Free

Pricing not published

Local models run at no usage cost; cloud frontier models use your own API keys

Pros & Cons

Pros

Runs multiple local and cloud LLMs in parallel from one browser workspace
Client-side with zero backend; local model traffic stays on your machine
ParallaX inference mixing combines Ollama, LM Studio, vLLM, and cloud APIs in one run
Stop Generation fires AbortController.abort() across all channels at once
Connects to Ollama, LM Studio, vLLM, custom REST gateways, and OAuth cloud APIs
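Stopping every channel with a single AbortController.abort() call works when all in-flight requests share one signal. The sketch below shows that pattern with the standard `AbortController`/`AbortSignal` APIs. `GenerationSession` is a hypothetical name, and this is one way to get the described behavior, not Duplex's actual code.

```typescript
// Hypothetical sketch of a shared abort, not Duplex's code.
class GenerationSession {
  private controller = new AbortController();

  // Start one channel. `start` receives the shared signal, e.g.
  //   session.run((signal) => fetch(url, { ...init, signal }))
  run<T>(start: (signal: AbortSignal) => Promise<T>): Promise<T> {
    return start(this.controller.signal);
  }

  // Stop Generation: abort every channel started since the last stop,
  // then arm a fresh controller so the next prompt is not pre-aborted.
  stop(): void {
    this.controller.abort();
    this.controller = new AbortController();
  }
}
```

Replacing the controller after each stop matters: an `AbortSignal` stays aborted permanently, so reusing it would cancel the next prompt's requests immediately.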

Cons

Local Ollama use requires manually setting OLLAMA_ORIGINS so that Ollama accepts the browser's cross-origin (CORS) requests
No published pricing and limited public documentation
Cloud models still require your own paid API keys

Questions & Answers

Alternatives

Open WebUI, LibreChat, LM Studio, Jan, ChatHub
