Homelab AI: Exposing Ollama on an Arch Linux Mini PC with Vulkan Acceleration
Deploying Large Language Models (LLMs) locally usually calls for a heavy, expensive desktop graphics card. However, if you have an AMD-powered mini PC lying around, such as the Minisforum UM690 with its Ryzen 9 6900HX and integrated Radeon 680M graphics, you can convert it into a quiet, efficient, dedicated AI server for your local network.

Many developers try to host local models on an entry-level or older gaming laptop with a dedicated NVIDIA card (like an RTX 3050 or GTX 1650). These laptops are often limited to 4GB of VRAM, so a model that does not fit spills over into system RAM, and generation slows to an unusable crawl.

By contrast, an AMD mini PC uses a Unified Memory Architecture (UMA). By adjusting a simple BIOS setting, you can allocate 8GB or more of system RAM directly to the integrated Radeon 680M iGPU. That gives you a significantly larger pool of graphics memory, enough to hold modern 3B and 8B models (in their typical quantized form) entirely on the GPU without hitting the VRAM ceiling of a small discrete card. ...
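
To make the 4GB versus 8GB comparison concrete, here is a rough back-of-the-envelope sketch of the memory arithmetic. It is not how Ollama itself sizes models. The function names, the figure of about 4.5 bits per weight for a 4-bit quantized model, and the flat 1 GiB allowance for context cache and runtime buffers are all assumptions chosen for illustration; real usage depends on the specific model, quantization, and context length.

```python
# Rough estimate of whether a model's weights fit in a given graphics memory budget.
# All constants here are illustrative assumptions, not values measured on the UM690.

GIB = 2**30  # bytes per GiB


def estimate_weights_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / GIB


def fits(params_billions: float, bits_per_weight: float,
         budget_gib: float, overhead_gib: float = 1.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in the budget.

    overhead_gib is a hypothetical flat allowance for the context cache
    and runtime buffers; actual overhead varies with context length.
    """
    needed = estimate_weights_gib(params_billions, bits_per_weight) + overhead_gib
    return needed <= budget_gib


if __name__ == "__main__":
    # Compare a 4 GB laptop GPU against an 8 GB iGPU allocation,
    # assuming roughly 4.5 bits per weight for 4-bit quantized models.
    for size in (3, 8):
        for budget in (4.0, 8.0):
            weights = estimate_weights_gib(size, 4.5)
            verdict = "fits" if fits(size, 4.5, budget) else "does not fit"
            print(f"{size}B model, ~{weights:.1f} GiB weights, "
                  f"{budget:.0f} GiB budget: {verdict}")
```

Under these assumptions, a quantized 8B model overflows a 4GB budget but sits comfortably inside an 8GB allocation, while the same model at 16 bits per weight would not fit in either.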