Guides

In-depth breakdowns of local LLM hardware, benchmarks, and running AI models at home.

Publishing cadence: upcoming guides are listed below and become clickable once they go live.

Featured Guides

Forget benchmarks. I put LLMs in a real-time strategy game.
I built a real-time strategy game where AI models compete for territory and survival. What I found changed how I think about model intelligence.
Local LLM Hardware Guide: VRAM, Quantization, and What You Can Actually Run
The plain-English primer on VRAM, quantization, and what hardware tier gets you to each model size. Start here if you're new to local AI.
Is the DGX Spark Worth $4,700?
How the Spark stacks up against the RTX 5090, RTX 4090, and Mac Studio M4 Max. First-hand benchmarks, total cost of ownership, and an honest buy/don't-buy verdict.
DGX Spark GB10 Benchmarks: Real Numbers from a Real Machine
First-hand tok/s data for QwQ-32B, DeepSeek-R1-70B, Qwen2.5-Coder-32B, and Qwen3.5-122B on NVIDIA's GB10 — plus long-context degradation and memory bandwidth efficiency.
Best Hardware to Run Claude-Distilled GGUF Models Locally
The newest guide on the site: what hardware to buy to run Claude-style distilled models from 7B through 70B.
Best GPU for Running LLMs Locally in 2026
RTX 5090 vs RTX 4090 vs RX 7900 XTX — real benchmark numbers, VRAM requirements, and a clear winner for every budget.
How Much RAM Do You Need to Run Llama 3?
A practical memory-sizing guide covering 8B and 70B models, CPU offloading, and how to avoid bad hardware buys.

GPU Guides

RTX 4090 vs RTX 4080 for Local LLMs
24GB vs 16GB VRAM, 1008 vs 717 GB/s bandwidth — is the 4090's premium worth it? Real inference benchmarks and the honest answer.
RTX 4090 vs RX 7900 XTX for Local LLMs: CUDA vs ROCm
Both have 24GB VRAM at similar bandwidth. But CUDA vs ROCm maturity changes everything for LLM inference.
Best Local LLM for Coding in 2026
DeepSeek Coder, Qwen2.5, CodeLlama — benchmarks, hardware requirements, and IDE integration for each.

Apple Silicon Guides

Mac for Local LLMs: The Complete Apple Silicon Guide
Every Apple Silicon chip from M1 to M5 Max — which models fit, real performance numbers, and the best tools to get started.
MacBook Air M5 for Local LLMs: What Models Can It Run?
First Air with 32GB unified memory. Runs Qwen2.5-32B and Mixtral 8x7B on a fanless laptop.
MacBook Air M5 32GB vs MacBook Pro M5 Pro 64GB for LLMs
32GB portability vs 64GB model capacity. Which laptop for local AI?
Mac Studio M4 Max: 64GB vs 128GB for Local LLMs
Is doubling the memory worth $400? Model compatibility and performance breakdown.
How to Run Llama on Mac: Apple Silicon Guide
Step-by-step setup with Ollama, llama.cpp, and LM Studio. Which Mac handles which model sizes.

GPU vs Mac Comparisons

RTX 4090 vs Mac Studio M4 Max 128GB for Local LLMs
24GB VRAM vs 128GB unified memory — which actually wins for running 70B models locally?
RTX 4090 vs MacBook Pro M5 Max 128GB for Local LLMs
Desktop GPU power vs laptop portability. Which is the better investment?

More Guides

Homelab Security Best Practices (Reddit Intel)
A practical hardening checklist for self-hosted AI stacks, built from patterns that surface repeatedly in community incident reports.
Best Mini PC for Self-Hosting AI in 2026
Minisforum, Beelink, Mac Mini — CPU inference vs GPU inference, and which mini PC handles local AI.
How Much RAM Do You Need to Run Llama 3?
8B to 405B — exact VRAM and RAM requirements for every Llama 3 model size, with and without quantization.