Experimenters ran overnight tests indicating that an OPEN SOURCE DeepSeek R1 model runs at 200 tokens per second on a Raspberry Pi with no internet connection.
This is a smaller, distilled model rather than the full OpenAI o1-class model.
Guys, I think we did it!
If the overnight tests are confirmed, we have an OPEN SOURCE DeepSeek R1 running at 200 tokens per second on a Raspberry Pi connected NOT TO THE INTERNET.
Full frontier AI, better than "OpenAI", fully owned by you, in your pocket, free to use!
I’ll make Pi… https://t.co/eSlHkQ7kQD pic.twitter.com/tVBg9oXtzB
— Brian Roemmele (@BrianRoemmele) January 23, 2025
Even though it is the smallest of the distilled models, it is still claimed to be superior to GPT-4o and Claude 3.5 Sonnet.
Models with 7B parameters crush older models in performance tests, and the 14 billion parameter model is very competitive with OpenAI o1-mini in many respects.
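How much memory do these model sizes actually need? Here is a rough back-of-envelope sketch, assuming 4-bit quantization (common for local runtimes such as llama.cpp and Ollama) and an illustrative ~20% overhead for activations and the KV cache; these figures are estimates, not measurements.

```python
# Rough memory-footprint estimate for locally run, quantized models.
# Assumes ~0.5 bytes/parameter (4-bit) plus ~20% runtime overhead --
# both are illustrative assumptions, not measured values.
def quantized_footprint_gb(params_billion: float, bits: int = 4, overhead: float = 1.2) -> float:
    bytes_per_param = bits / 8
    return params_billion * 1e9 * bytes_per_param * overhead / 1e9

for size in (1.5, 7, 14):
    print(f"{size}B model @ 4-bit: ~{quantized_footprint_gb(size):.1f} GB")
# 1.5B -> ~0.9 GB, 7B -> ~4.2 GB, 14B -> ~8.4 GB
```

At roughly 4 GB for the 7B model and 8 to 9 GB for the 14B model, these sizes fit comfortably in the RAM of consumer hardware.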
A good laptop can handle DeepSeek's distilled models with 7B or even 14B parameters.
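As a concrete illustration, the sketch below queries a locally served distilled model through Ollama's REST API and computes decode throughput from the token counts the server reports. It assumes Ollama is installed and running on its default port 11434 and that the model has been pulled beforehand (e.g. `ollama pull deepseek-r1:7b`); the exact model tag is an assumption.

```python
# Minimal sketch: generate text from a locally served DeepSeek R1 distilled
# model via Ollama's REST API and measure decode throughput.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:7b",  # assumed tag; adjust to the model you pulled
        "prompt": "Explain chain-of-thought reasoning in two sentences.",
        "stream": False,
    },
    timeout=300,
)
data = resp.json()

# Ollama reports eval_count (output tokens) and eval_duration (nanoseconds).
tokens_per_second = data["eval_count"] / (data["eval_duration"] / 1e9)
print(data["response"])
print(f"~{tokens_per_second:.1f} tokens/second")
```

Running this same measurement on a Raspberry Pi versus a laptop is the simplest way to sanity-check throughput claims like the one above.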
OpenAI o1-mini
Achieving a throughput of 200 tokens per second with the o1-mini model, which is more than twice its reported empirical throughput, would likely require:
Powerful GPUs: Multiple high-end GPUs, likely NVIDIA A100s or newer, would be required to handle the increased computational load.
Significant memory: Given the large 128K-token context window, a substantial amount of GPU memory would be required to maintain model state and process requests efficiently.
Fast CPU: A high-frequency multi-core CPU would be needed to manage data flow and coordinate GPU operations.
High-bandwidth networking: A high-speed network interface would be necessary to handle the increased data throughput.
Rate limits: OpenAI imposes rate limits on its models, which in practice may prevent reaching 200 tokens per second without special arrangements.
Cost implications: Running the model at such high speeds would incur significant costs: o1-mini costs $3 per million input tokens and $12 per million output tokens (see the quick arithmetic below).
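For a sense of scale, here is a quick back-of-envelope calculation of what sustained 200-token-per-second output would cost at the quoted $12 per million output tokens, ignoring input-token costs:

```python
# Quick arithmetic: sustained o1-mini cost at 200 output tokens/second,
# using the $12 per million output tokens figure quoted above.
tokens_per_second = 200
price_per_million_output = 12.00  # USD

tokens_per_hour = tokens_per_second * 3600          # 720,000 tokens
cost_per_hour = tokens_per_hour / 1e6 * price_per_million_output
print(f"{tokens_per_hour:,} output tokens/hour -> ${cost_per_hour:.2f}/hour")
# -> 720,000 output tokens/hour -> $8.64/hour, before input-token costs
```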
Running the o1-mini model at 200 tokens per second would require enterprise-level infrastructure and likely special arrangements with OpenAI. It is also possible that such fast processing is simply not supported, or not advisable, for this particular model.


Brian Wang is a futurist thought leader and a popular science blogger with 1 million readers per month. His blog Nextbigfuture.com is ranked #1 in the Science News Blog rankings. It covers many disruptive technologies and trends, including space, robotics, artificial intelligence, medicine, anti-aging biotechnology, and nanotechnology.
Known for identifying cutting-edge technologies, he is currently a co-founder of a startup and a fundraiser for high-potential, early-stage companies. He is Head of Research for Allocations for deep technology investments and an Angel Investor at Space Angels.
A frequent speaker at corporations, he has been a TEDx speaker, a Singularity University speaker, and a guest on numerous radio and podcast interviews. He is open to public speaking and advising engagements.