Your laptop already paid for a GPU. Your AI loops ignore it.

You've got an agent refining code. It writes, runs the tests, fails, fixes, runs again. Twenty rounds. Fifty. Every round is a call to a paid model.

None of those rounds needed the best model in the world. Most of them a small open-weights model running on the machine in front of you would have handled. But they all went to the cloud, at premium rates, because your tooling couldn't tell the difference.

That's the thing we built to fix. It's called hibrid, it's open source, and the idea is easy to say and oddly hard to find: a router that knows what your machine can run and decides, on its own, what runs locally and what goes to the cloud. Without you thinking about it.

the gap nobody looks at

The market split in two.

On one side, the cloud gateways — OpenRouter, LiteLLM and the rest. They route well between paid models. But your computer doesn't exist to them: your GPU, your 32GB of RAM, the Mac with unified memory that can hold a 70B — none of it enters the math.

On the other side, the local apps — Ollama, LM Studio. They run models on your machine beautifully. But deciding when to use local and when to use the cloud is on you, by hand, model by model.

The crossing of the two — an automatic router that looks at your hardware and your access to paid models and splits the work by task — until now only lived in research papers. Not in something you can install.

what hibrid does

Three decisions, all automatic, all transparent.

It knows what your machine runs, for real. On startup hibrid detects your RAM, your VRAM, your chip, and runs a micro-benchmark that measures the actual tokens-per-second of the models you have. It doesn't trust a table off the internet. It measures your machine. As far as we can tell, no other router does this.

It splits by the type of task, not just the question. This is the piece that changes the bill. A one-off hard task can earn the expensive model: you pay for it once. But a loop — refining code, iterative QA — is dozens or hundreds of calls. hibrid sends those local first and saves the paid model for the one moment it earns its keep: the final check. The loop runs free; the expensive model signs off. Once.

Your sensitive data doesn't leave your machine. If hibrid spots personal data in what you're about to send — an email, an ID number, a key — it forces local execution. It's not a preference you can forget to turn on. It's the rule. A cloud gateway can't promise you that: your text already traveled to a third party before anything was decided.

And all of this speaks the language you already use: an OpenAI-compatible API. Adopting hibrid is changing the URL. That's it.

why this belongs to the community, not to us

hibrid gets better when it knows what each machine runs. And we don't know that — you do, each of you with your own hardware.

So the core piece is a shared benchmark registry. You install hibrid, it measures your machine, and — if you want — you share the result: hardware and speed only, never your prompts. In return, the next person who shows up with a laptop like yours routes well from minute one, without waiting to measure.

You contribute a data point that costs you nothing and saves the next person money. The next person does the same for you. That's a community, not a user list. The code is Apache-2.0; the routing profiles are shared and improved together. If you've got an unusual machine, your benchmark is exactly what the project is missing.

how we built it — in an afternoon, with a team of agents

One detail that says something about the moment we're in. hibrid's design didn't come out of a single head. We stood up a team of three AI agents working in parallel: one swept the scientific literature on cloud-local routing, one mapped which models run on which hardware and at what real speed, and a third compared everything already on the market and crossed its findings with the other two.

The design came out of that, and the code came out of the design. Three agents researching as a team, not as three separate searches. It's also exactly the kind of work — long research and refinement loops — that hibrid is built for.

a fair warning

This is day one. The engine works and the decision logic is tested, but the benchmark registry is just being seeded and the confidence calibration will get sharper the more it's used. If you route something hard to a local model that can't handle it, the cascade is supposed to catch it and escalate — that's the part I'd push on hardest if you're kicking the tires. Tell me where it breaks.

try it

pip install hibrid
hibrid serve
curl localhost:8095/v1/node   # see what it says about your machine

Point your usual client at localhost:8095 and let it decide. If it saves you a bill, pay it back: share your machine's benchmark in the repo. It's the easiest contribution and the most useful one.

The router that knows your machine. Open source. Yours.