Is the DGX Spark Worth $4,700?
After running benchmarks on the DGX Spark across several models in Part 1, the obvious next question is: how does it actually compare to the alternatives, and is it worth the price tag?
Here's the lineup I put together for this comparison.
1. The Contenders
| Machine | Price | VRAM / RAM | Memory BW |
|---|---|---|---|
| DGX Spark GB10 | ~$4,700 | 128GB unified | 273 GB/s |
| RTX 5090 build | ~$3,800 | 32GB GDDR7 | 1,792 GB/s |
| RTX 4090 build | ~$2,800 | 24GB GDDR6X | 1,008 GB/s |
| Mac Studio M4 Max | $3,699 | 128GB unified | 546 GB/s |
The DGX is the most expensive machine on this list. The RTX 5090 has more than double the bandwidth for $900 less, which might make it the preferred choice for many hobbyists and AI practitioners. Even the Mac Studio, which a lot of people online treat as the obvious choice for running local models, has the same RAM and double the bandwidth for $1,000 less. Apple hardware being a value play is a relatively new phenomenon, but it really is competitive for local models right now.
Where the DGX Spark obviously beats everyone is form factor. It's small enough to fit in a bag, though most people probably don't need their AI lab to be portable.
2. Speed Comparison
| Model | Spark GB10 | RTX 5090 ✅ | RTX 4090 ⚠️ | Mac Studio ⚠️ |
|---|---|---|---|---|
| qwq:32b | 8.5 tok/s | 64.8 tok/s | ~40 tok/s | ~28 tok/s |
| qwen2.5-coder:32b | 7.3 tok/s | 62.2 tok/s | ~38 tok/s | ~25 tok/s |
| deepseek-r1:70b | 4.1 tok/s | ❌ OOM | ❌ OOM | ~10 tok/s |
| qwen3.5:122b | 15.1 tok/s | ❌ OOM | ❌ OOM | ❌ OOM |
✅ First-party data via RunPod. ⚠️ RTX 4090 and Mac Studio figures sourced from community benchmarks (r/LocalLLaMA) — RTX 4090 allocation was unavailable on RunPod during testing.
Let's get the embarrassing stat out of the way first: speed. On a 32B model, the RTX 5090 is 7.6× faster than the DGX Spark (we rented a 5090 on RunPod for a few dollars to get first-party numbers). This comes down to bandwidth. When a model fits entirely in VRAM, single-stream decode is essentially bound by memory bandwidth: every generated token has to stream the full weight set from memory. GDDR7 at 1,792 GB/s simply dominates LPDDR5x at 273 GB/s. The RTX 4090 is roughly 5× faster and the Mac Studio about 3× faster on 32B models.
The Spark is not winning on speed. There's no way to spin that differently.
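The bandwidth argument can be sanity-checked with a back-of-envelope roofline: during single-stream decode each token reads the whole weight set, so the throughput ceiling is roughly bandwidth divided by model size. A minimal sketch, assuming a ~18 GB Q4-quantized 32B model (an illustrative size, not a measured file):

```python
# Back-of-envelope decode throughput: each generated token streams the
# full weight set through memory, so tok/s ceiling ~= bandwidth / model size.
# The 18 GB figure assumes ~4-bit quantization of a 32B model.

def est_tok_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Theoretical ceiling for single-stream decode, ignoring compute and overhead."""
    return bandwidth_gb_s / model_gb

machines = {"DGX Spark": 273, "RTX 5090": 1792, "RTX 4090": 1008, "M4 Max": 546}
for name, bw in machines.items():
    print(f"{name}: ~{est_tok_per_s(bw, 18):.0f} tok/s ceiling for a 32B @ Q4")
```

The Spark's ceiling comes out around 15 tok/s against the observed 8.5, and the 5090's around 100 against 64.8, so both machines land at a broadly similar fraction of their theoretical limit, which is consistent with the gap being bandwidth-bound rather than compute-bound.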
3. The 128GB Advantage
What you also see in that table is the cliff: both GPU builds OOM on the 70B model, and nothing but the Spark loads the 122B at all. That's where the Spark's real advantage comes through.
The RTX 5090 smokes the Spark on 32B models. But once the model doesn't fit in VRAM anymore, the story changes completely. With partial CPU offload, models run 10–20× slower than full GPU processing. The VRAM cliff is a real problem.
The Spark handles large models without breaking a sweat. I loaded claude-distilled (16GB) and QwQ-32B (18GB) simultaneously and it ran both flawlessly; try doing that on a 32GB GPU. R1-70B runs at 4.1 tok/s, which is slow but usable for async tasks, research, or overnight batch jobs. The MoE model Qwen3.5:122b runs at 15.1 tok/s because only ~10B parameters are active per token. You simply can't load it on anything else on this list.
So yes, the 128GB of unified memory is the main selling point. Most other aspects are tradeoffs I'm willing to make.
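A rough fit check shows why the 70B model OOMs on a 32GB card: weights plus KV cache plus runtime headroom must fit in device memory. A sketch assuming Llama-70B-style shapes (80 layers, 8 KV heads, head dim 128, fp16 cache) and a ~40 GB Q4 weight file; all of these figures are assumptions for illustration:

```python
# Rough memory-fit check: weights + KV cache + headroom must fit in device RAM.
# KV cache per token = 2 (K and V) * layers * kv_heads * head_dim * bytes.
# Shapes below are assumed Llama-70B-style figures, for illustration only.

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                ctx_len: int, bytes_per: int = 2) -> float:
    return 2 * layers * kv_heads * head_dim * ctx_len * bytes_per / 1e9

def fits(weights_gb: float, kv_gb: float, device_gb: float,
         headroom_gb: float = 2.0) -> bool:
    return weights_gb + kv_gb + headroom_gb <= device_gb

weights_70b = 40  # ~Q4 quantized 70B, an assumed size
kv = kv_cache_gb(layers=80, kv_heads=8, head_dim=128, ctx_len=8192)
print(f"KV cache @ 8k ctx: {kv:.1f} GB")
print("Fits 32 GB 5090:", fits(weights_70b, kv, 32))
print("Fits 128 GB Spark:", fits(weights_70b, kv, 128))
```

Even before counting the KV cache, a ~40 GB quantized 70B doesn't fit in 32 GB; that is the cliff the table shows, and why partial CPU offload (and its 10–20× slowdown) is the only GPU-build fallback.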
4. The Real Competitor: Mac Studio M4 Max
In my view, the true head-to-head is the Mac Studio M4 Max. If your primary use case is inference and you're already in the Mac ecosystem, it might just be the machine to get: same RAM, double the bandwidth, roughly $1,000 cheaper.
What you get with the Spark instead is CUDA. New models, inference engines, and research repos typically run on NVIDIA before anything else. Something that appealed to me specifically was training and fine-tuning my own models. Metal is improving, but it's not on par with CUDA yet. Most cutting-edge research code runs on Linux/CUDA and requires porting to run on Metal. CUDA just gives you less friction.
Bottom line: The Mac Studio is the better inference appliance. The Spark is the better developer workstation.
5. Total Cost of Ownership
| Option | Upfront | Power/yr | 3-Year Total |
|---|---|---|---|
| DGX Spark | $4,700 | ~$80 | ~$4,940 |
| RTX 5090 build | $3,800 | ~$200 | ~$4,400 |
| RTX 4090 build | $2,800 | ~$180 | ~$3,340 |
| Mac Studio 128GB | $3,699 | ~$60 | ~$3,879 |
| Cloud (RunPod, 8h/day) | $0 | — | ~$15,768 |
The Spark idles at roughly 22–45W, generally less than a full high-end GPU desktop draws. Cloud at 8 hours a day breaks even with the Spark's price in under a year. If you're a heavy daily user, the hardware pays for itself fast; the real question is how many hours a day you actually run inference.
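The cloud comparison reduces to a simple breakeven: upfront cost divided by the daily saving versus renting. A sketch using the ~$1.80/hr rate implied by the table ($15,768 over 3 years at 8h/day); the rate is derived, not a quoted RunPod price:

```python
# Breakeven between owning the Spark and renting a cloud GPU.
# Hourly rate of $1.80 is implied by the TCO table: 15768 / (3 * 365 * 8).

def cloud_cost_per_year(hours_per_day: float, rate_per_hr: float = 1.80) -> float:
    return hours_per_day * rate_per_hr * 365

def breakeven_days(upfront: float, power_per_yr: float,
                   hours_per_day: float, rate_per_hr: float = 1.80) -> float:
    daily_cloud = hours_per_day * rate_per_hr
    daily_own = power_per_yr / 365  # electricity is the only running cost assumed
    return upfront / (daily_cloud - daily_own)

print(f"Cloud @ 8h/day: ${cloud_cost_per_year(8):,.0f}/yr")
print(f"Spark breakeven @ 8h/day: {breakeven_days(4700, 80, 8):.0f} days")
print(f"Spark breakeven @ 2h/day: {breakeven_days(4700, 80, 2):.0f} days")
```

At 8 hours a day the breakeven lands around 330 days, matching the under-a-year claim; at 2 hours a day it stretches to nearly four years, which is where renting starts to win.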
On resale value, my guess is the Spark holds its value better than a GPU, but we'll see how that plays out.
6. Why I Bought It
I wanted to run models that are too large to fit on anything else. I'll admit the speed difference on 32B models was a bit of a surprise. 8.5 tok/s vs 64.8 tok/s on the RTX 5090 is a massive gap and hard to ignore. But when I load R1-70B or run two models simultaneously, I don't regret the decision. That matters more to me than raw speed.
If you need to run large models locally and CUDA matters, the Spark is the only consumer option that gets you there. If you don't, there are cheaper ways.
7. Buy or Don't Buy?
✅ Buy it if:
- You want to run 70B+ models locally without VRAM limits
- You need CUDA for training, fine-tuning, or custom inference pipelines
- You want a quiet, compact Linux appliance with minimal driver headaches
- You're a developer who needs large models without cloud latency or cost
- Privacy matters to you. Nothing ever leaves your machine
❌ Don't buy it if:
- You mostly run 7B–32B models. The RTX 4090/5090 is 5–7× faster for less money
- You're already in the Mac ecosystem. The Mac Studio M4 Max is better value for pure inference
- Speed is your priority. 4.1 tok/s on R1-70B is real friction for chat use
- You're cost-sensitive. The premium is hard to justify unless you actually max out the 128GB
Where to buy
- DGX Spark GB10: available directly from NVIDIA or on Amazon.
- RTX 5090: on Amazon. GPU-only card, so you'll need a build to go with it. Benchmark data via RunPod.
- RTX 4090: on Amazon.
- Mac Studio M4 Max: not available on Amazon; buy direct from Apple.
Up next, Part 3: I ran two instances of the same AI model against each other in a strategy game. Same weights, same architecture. They played completely differently. What that tells us about how temperature affects decision-making and what it means for training your own models on a Spark. Coming soon.