Overview / Description
Duplex is a client-side, browser-based LLM inference interface that lets developers run multiple language models in parallel, mixing local models (via Ollama and LM Studio) with frontier cloud APIs, inside one unified workspace with no backend. Prompts and model traffic are sent from your browser rather than through a hosted server, which the project frames as an air-gapped, privacy-preserving setup for local models; calls to cloud APIs still go out over the network. Its MultipleX engine runs inference requests concurrently in the browser (the project describes this as the browser's V8 layer), distributing them across separate network channels so simultaneous transfers do not lock up the UI.

Concrete features include:
- ParallaX inference mixing, which blends local hardware nodes and cloud APIs together
- Browser-thread choke prevention, which throttles token streams into a 16ms render tick to avoid stutter
- Orphaned-flow termination, which fires AbortController.abort() across all channels when you stop generation
- Sandbox advisor failover, which injects a client-side advisor if you submit a prompt with no active model

Duplex connects to Ollama (port 11434), LM Studio (port 1234), vLLM clusters (port 8000), custom REST gateways, and frontier cloud APIs over secure OAuth. Local Ollama use requires launching Ollama with the OLLAMA_ORIGINS environment variable set so the browser can query localhost. Pricing is not published; local models run at no usage cost, while cloud calls use your own API keys.
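The 16ms render tick mentioned above can be pictured as token batching: streamed tokens are buffered and painted at most once per tick. The sketch below is only an illustration of that general technique, not Duplex's actual code; `createTokenBatcher`, `render`, and `schedule` are hypothetical names, and the scheduler is injectable so the example also runs outside a browser.

```typescript
// Hypothetical sketch: buffer streamed tokens and flush them once per tick.
type Schedule = (flush: () => void, delayMs: number) => void;

function createTokenBatcher(
  render: (chunk: string) => void,
  tickMs: number = 16, // roughly one frame at 60 Hz
  // Assumption: setTimeout is used by default; a browser could use requestAnimationFrame.
  schedule: Schedule = (fn, ms) => { setTimeout(fn, ms); },
) {
  let buffer: string[] = [];
  let scheduled = false;

  // Emit everything buffered so far as a single chunk.
  const flush = (): void => {
    scheduled = false;
    if (buffer.length === 0) return;
    const chunk = buffer.join("");
    buffer = [];
    render(chunk);
  };

  // Queue a token; only the first token in a tick schedules a flush.
  const push = (token: string): void => {
    buffer.push(token);
    if (scheduled) return;
    scheduled = true;
    schedule(flush, tickMs);
  };

  return { push, flush };
}
```

With this shape, a burst of many small tokens results in one UI update per tick instead of one update per token.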
Used For
Developers use Duplex to send a prompt to several local and cloud LLMs at once and compare their outputs side by side in a single browser interface with no backend.
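As a rough illustration of that workflow, the sketch below fans one prompt out to several targets and collects the results for comparison. It is a minimal sketch under stated assumptions, not Duplex's implementation: `ModelTarget`, `fanOut`, and the `send` callback are hypothetical, and the request format each backend expects is deliberately left to `send`.

```typescript
// Hypothetical types for a side-by-side comparison run.
interface ModelTarget { name: string; baseUrl: string; }
interface ModelResult { name: string; ok: boolean; output: string; }
type Send = (target: ModelTarget, prompt: string) => Promise<string>;

// Send the same prompt to every target concurrently.
// One failing target does not prevent the others from reporting.
async function fanOut(
  targets: ModelTarget[],
  prompt: string,
  send: Send,
): Promise<ModelResult[]> {
  return Promise.all(
    targets.map(async (target): Promise<ModelResult> => {
      try {
        const output = await send(target, prompt);
        return { name: target.name, ok: true, output };
      } catch (err) {
        return { name: target.name, ok: false, output: String(err) };
      }
    }),
  );
}
```

Results come back in the same order as `targets`, which makes it straightforward to render one column per model.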
Pricing
Pros & Cons
Pros
- Runs multiple local and cloud LLMs in parallel from one browser workspace
- Client-side with zero backend; local model traffic stays on your machine
- ParallaX inference mixing blends local nodes (Ollama, LM Studio, vLLM) with cloud APIs
- Stop Generation fires AbortController.abort() across all channels at once
- Connects to Ollama, LM Studio, vLLM, custom REST gateways, and OAuth cloud APIs
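The stop-everything behavior listed above can be sketched with the standard `AbortController` API. The snippet is a hedged illustration of the general pattern, not Duplex's code: `createChannelRegistry` and its methods are hypothetical names, and it assumes one controller per open channel.

```typescript
// Hypothetical registry: one AbortController per in-flight channel.
function createChannelRegistry() {
  const controllers = new Map<string, AbortController>();

  return {
    // Register a channel and return the signal to pass to its request.
    open(channelId: string): AbortSignal {
      const controller = new AbortController();
      controllers.set(channelId, controller);
      return controller.signal;
    },
    // Forget a channel once its stream has finished normally.
    close(channelId: string): void {
      controllers.delete(channelId);
    },
    // Abort every open channel; returns how many were stopped.
    stopAll(): number {
      const count = controllers.size;
      controllers.forEach((controller) => controller.abort());
      controllers.clear();
      return count;
    },
  };
}
```

Each signal would typically be handed to a request, for example `fetch(url, { signal })`, so that calling `stopAll()` cancels every pending stream rather than leaving some running in the background.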
Cons
- Local Ollama use requires manually setting OLLAMA_ORIGINS so Ollama accepts the browser's cross-origin (CORS) requests
- No published pricing and limited public documentation
- Cloud models still require your own paid API keys
Alternatives
Open WebUI, LibreChat, LM Studio, Jan, ChatHub