Oz

Handmade Coding

Hardware Hacking May 2026

The 31B Breakthrough

Bridging the "Implementation Gap" to run Gemma 4:31B via OpenClaw on a legacy motherboard.

Status: Operational 🚀

We have successfully bridged the "Implementation Gap." By combining a decade-old Asus B85M-E with a modern RTX 5060 Ti (16GB) and a freshly optimized 32GB RAM pool, the node is now running Gemma 4:31B via OpenClaw.

What We’ve Achieved

  • Hardware Harmonization: Resolved the "PCI-e region invalid" error using pci=realloc,nocrs kernel flags, allowing the 16GB VRAM window to be mapped on a legacy BIOS.
  • Bus Optimization: Maximized data throughput by locking the GPU to Gen3 speed and enabling 128-bit Dual Channel memory mode (hitting 17 GB/s bandwidth).
  • Hybrid Inference: Configured Ollama to intelligently split the 31B model—placing the majority of layers in VRAM and utilizing the new 32GB Ballistix RAM pool for the "spillover."
  • Agent Integration: Verified a full end-to-end loop with OpenClaw, allowing the agent to utilize a high-reasoning 31B model for complex coding tasks.
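The VRAM/RAM split above can also be pinned explicitly instead of left to Ollama's auto-detection, via the `num_gpu` parameter (number of layers offloaded to the GPU). A minimal sketch, assuming a local model tagged `gemma4:31b` and that roughly 35 layers fit in 16GB of VRAM (both the tag and the layer count are illustrative):

```shell
# Create a model variant with a fixed layer split:
# num_gpu layers go to VRAM, the remainder spills to system RAM.
cat > Modelfile <<'EOF'
FROM gemma4:31b
PARAMETER num_gpu 35
EOF
ollama create gemma4-split -f Modelfile
```

If generation stalls or VRAM overflows, lowering `num_gpu` a few layers at a time is the usual tuning loop.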

The Final Technical Stack

Layer        Specification
Model        Gemma 4:31B (Quantized)
Logic        OpenClaw Agentic Workflow
VRAM         16GB (Blackwell Architecture)
System RAM   32GB DDR3-1600 CL9 (Dual Channel)
Interface    Ollama API over 128-bit Memory Bus

Lessons for the Home Lab

  • Don't Trust dmidecode blindly: tools like dmidecode report each module's 64-bit width whether or not the channels are actually interleaved; a bandwidth benchmark (mbw) is the only real proof of Dual Channel success.
  • SATA Cables are Traitors: Always double-check your drive connections after wrestling with RAM clips.
  • Restart the Service: Ollama only scans for CUDA hardware at launch. If you tweak BIOS or Kernel settings, a systemctl restart ollama is mandatory.
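As a sanity check on the mbw numbers, the theoretical ceiling is simple arithmetic: DDR3-1600 performs 1600 MT/s at 8 bytes per transfer per channel, and dual channel doubles that. A quick back-of-the-envelope:

```shell
# DDR3-1600, 8 bytes per transfer, 2 channels -> theoretical peak in MB/s
peak_mb=$((1600 * 8 * 2))
echo "theoretical peak: ${peak_mb} MB/s"   # 25600 MB/s = 25.6 GB/s
# The measured ~17 GB/s is roughly two-thirds of peak - typical for memcpy -
# and about double what a single 64-bit channel could deliver.
```

If mbw reports closer to 9 GB/s than 17 GB/s, the DIMMs are almost certainly not paired in the correct slots.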

Moving Forward: The "Agentic" Phase

Now that the hardware bottleneck is solved, the focus shifts to software performance. With a 31B model, the "Reasoning" is top-tier, but the "Latency" is the new variable.
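Latency is easy to quantify before optimizing anything: `ollama run` with `--verbose` prints token counts and durations after each response, and throughput is just generated tokens divided by eval time. With hypothetical sample figures (412 tokens generated in 51.5 s):

```shell
# ollama run gemma4:31b --verbose "Explain PCIe BARs"
# --verbose prints eval count and eval duration after the response;
# tokens/s with the sample figures above (model tag and numbers are illustrative):
awk 'BEGIN { printf "%.1f tok/s\n", 412 / 51.5 }'   # -> 8.0 tok/s
```

Tracking this number across quantization levels and `num_gpu` settings is the obvious next experiment.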


Hardware Hacking April 2026

Resurrecting a 2013 Desktop for 2026 AI Inference

How to bypass PCIe address limits to run a 16GB Blackwell GPU on an Asus B85M-E motherboard.

The "Frankenstein" Node Specs

GPU            NVIDIA RTX 5060 Ti (16GB VRAM)
Motherboard    Asus B85M-E (LGA 1150)
Primary Goal   Dedicated Headless AI Inference (Ollama/OpenClaw)

The Challenge: PCIe Region Invalid

Modern 16GB GPUs require a memory "window" larger than what old B85 chipsets were designed to handle. Without intervention, the driver fails with a PCI-e region invalid error in dmesg.
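The failure is easy to confirm before touching anything: the kernel logs the rejected BAR (Base Address Register) mapping at boot. A quick grep, noting that the exact message wording varies by kernel version:

```shell
# Look for the rejected memory window in the boot log (dmesg may require root)
sudo dmesg | grep -iE 'BAR|pci.*invalid'
```

If nothing matches, the card's window mapped cleanly and the workaround below is unnecessary.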

1. BIOS Configuration

Crucial tweaks to isolate the GPU for compute-only tasks:

  • Primary Display: Set to iGPU (Force display to onboard VGA/HDMI).
  • iGPU Multi-Monitor: Enabled (Keeps the NVIDIA card visible).
  • PCIEX16_1 Speed: Gen3 (Max throughput for model loading).
  • Launch CSM: Disabled (Pure UEFI required for CUDA 12.8).

2. The Kernel Workaround

Since the BIOS can't map the memory window, we force the Linux kernel to reallocate resources:

# Edit /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=realloc,nocrs"
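The edited file takes effect only after regenerating the GRUB config and rebooting; a sketch for Debian/Ubuntu (other distros use `grub2-mkconfig` instead):

```shell
sudo update-grub   # regenerates /boot/grub/grub.cfg from /etc/default/grub
sudo reboot
# After reboot, confirm the flags made it onto the live kernel command line:
grep -o 'pci=realloc,nocrs' /proc/cmdline
```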

3. Results

With these flags, nvidia-smi communicates successfully. By offloading the UI to the integrated graphics, we reclaim the full 16GB VRAM for high-precision models like Gemma 4 E2B.
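A quick way to confirm the headless setup is behaving as intended, with the desktop on the iGPU and the card's memory left free for models:

```shell
# Should list the RTX 5060 Ti with ~16 GB total and near-zero baseline usage
nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv
```

Any non-trivial baseline usage usually means a display server grabbed the discrete card, i.e. the BIOS primary-display setting didn't stick.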

Update: Agent Testing Success

Following the initial hardware setup, we moved to the software stack with phenomenal results:

  • Successfully ran Gemma 4 E2B on Ollama with 100% GPU utilization.
  • Connected it as the agent model for OpenClaw and successfully interacted via chat.
  • Pulled the larger Gemma 4 E4B model and reran the exact same flow successfully!
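After pulling a model, `ollama ps` shows where a loaded model landed; the PROCESSOR column reads `100% GPU` when every layer fits in VRAM. A sketch (the model tag is illustrative, assuming Gemma 4 E2B is tagged this way locally):

```shell
ollama run gemma4:e2b "hello"   # load the model and generate once
ollama ps                       # check the PROCESSOR column for "100% GPU"
```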
