I've been running a fully local AI stack on my M4 Mac Mini — Ollama for text generation, ComfyUI for images, and n8n to wire it all together. Here's the complete setup guide so you can replicate it.
| TOOL | PURPOSE | HOW IT RUNS | PORT |
|---|---|---|---|
| Ollama | Local LLM text generation | NATIVE | 11434 |
| ComfyUI | Image generation (Flux/SDXL) | NATIVE | 8188 |
| n8n | Workflow automation | DOCKER | 5678 |
[01] Prerequisites
Install these three system-level tools in order before anything else.
Homebrew
# Check if already installed
brew --version
# If not:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Follow the printed instructions to add brew to your PATH
nvm + Node.js v22
brew install nvm
# Add to ~/.zshrc
echo 'export NVM_DIR="$HOME/.nvm"' >> ~/.zshrc
echo '[ -s "/opt/homebrew/opt/nvm/nvm.sh" ] && \. "/opt/homebrew/opt/nvm/nvm.sh"' >> ~/.zshrc
source ~/.zshrc
nvm install 22
nvm use 22
nvm alias default 22
node --version # v22.x.x
Docker Desktop
Download the Apple Silicon version from docker.com. After install:
docker --version
docker compose version
In Docker Desktop → Resources, set Memory to at least 4GB.
Python 3.12
brew install python@3.12
python3.12 --version
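Before moving on, it's worth confirming that all four prerequisites actually resolve from your shell. A quick sanity-check loop (plain POSIX shell, nothing beyond the tools installed above):

```shell
# Report whether each prerequisite is on PATH
for tool in brew node docker python3.12; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: OK"
  else
    echo "$tool: MISSING"
  fi
done
```

Anything reported MISSING needs attention before continuing; nvm-managed `node` in particular only shows up after sourcing `~/.zshrc`.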
[02] External Drive Setup
We'll assume your drive is mounted at /Volumes/AIStudio. (You can rename any external drive in Finder by clicking its name.)
# Create folder structure
mkdir -p /Volumes/AIStudio/{ollama-models,comfyui,n8n-docker/data,outputs/{images,posts,drafts}}
ls /Volumes/AIStudio/
# ollama-models/ comfyui/ n8n-docker/ outputs/
Tell Ollama to store models on the external drive before installing it:
echo 'export OLLAMA_MODELS="/Volumes/AIStudio/ollama-models"' >> ~/.zshrc
source ~/.zshrc
echo $OLLAMA_MODELS
# /Volumes/AIStudio/ollama-models
[03] Ollama — Local LLM
brew install ollama
brew services start ollama
# Verify
curl http://localhost:11434
# Ollama is running
Pull a model. qwen2.5:14b is excellent for creative writing and runs fast on 24GB RAM (~8.5GB download):
ollama pull qwen2.5:14b
# Optional: smaller/faster model for quick tasks
ollama pull llama3.2:3b
ollama list
Test the API (this is what n8n calls):
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:14b",
  "prompt": "Write a LinkedIn post about productivity.",
  "stream": false
}'
If the reply JSON contains a "response" field, Ollama is working correctly.
[04] ComfyUI — Image Generation
cd /Volumes/AIStudio/comfyui
git clone https://github.com/comfyanonymous/ComfyUI.git .
python3.12 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install torch torchvision torchaudio
Download Flux.1-schnell
Note: see the UPDATE at the end of this post before downloading. The full-size model below turned out to be too large for a 24GB Mac.
The best free image model right now. ~5–8 second generations on M4. You need a free HuggingFace account.
pip install huggingface_hub
huggingface-cli login
# Model (~9GB)
hf download black-forest-labs/FLUX.1-schnell \
flux1-schnell.safetensors \
--local-dir /Volumes/AIStudio/comfyui/models/unet/
# VAE (~335MB)
hf download black-forest-labs/FLUX.1-schnell \
ae.safetensors \
--local-dir /Volumes/AIStudio/comfyui/models/vae/
# Text encoders
hf download comfyanonymous/flux_text_encoders \
clip_l.safetensors \
--local-dir /Volumes/AIStudio/comfyui/models/clip/
hf download comfyanonymous/flux_text_encoders \
t5xxl_fp8_e4m3fn.safetensors \
--local-dir /Volumes/AIStudio/comfyui/models/clip/
Launch script
Save as /Volumes/AIStudio/comfyui/start.sh:
#!/bin/bash
cd /Volumes/AIStudio/comfyui
source venv/bin/activate
python main.py --listen 127.0.0.1 --port 8188
chmod +x /Volumes/AIStudio/comfyui/start.sh
/Volumes/AIStudio/comfyui/start.sh
Open http://127.0.0.1:8188 — you should see the ComfyUI node editor. Check Activity Monitor → GPU while generating to confirm Metal is being used.
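ComfyUI also answers plain HTTP requests on that port, which is exactly what n8n will rely on later. For a scriptable health check, the /system_stats endpoint reports device and memory info (pretty-printed here with Python's stdlib):

```shell
# Quick health check: dump ComfyUI's device/memory report
curl -s http://127.0.0.1:8188/system_stats | python3 -m json.tool
```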
[05] n8n — Workflow Automation
n8n runs in Docker and reaches Ollama/ComfyUI via host.docker.internal (Docker Desktop's bridge to your Mac host).
Save this to /Volumes/AIStudio/n8n-docker/docker-compose.yml:
services:
  n8n:
    image: docker.n8n.io/n8nio/n8n:latest
    restart: unless-stopped
    ports:
      - "5678:5678"
    environment:
      - N8N_HOST=localhost
      - N8N_PORT=5678
      - N8N_PROTOCOL=http
      - WEBHOOK_URL=http://localhost:5678/
      - GENERIC_TIMEZONE=America/Denver
      - N8N_SECURE_COOKIE=false
    volumes:
      - /Volumes/AIStudio/n8n-docker/data:/home/node/.n8n
cd /Volumes/AIStudio/n8n-docker
docker compose up -d
docker compose logs -f
# Watch for: "Editor is now accessible via: http://localhost:5678"
Open http://localhost:5678 and create a local account.
[06] Wire Everything Together
n8n → Ollama
Settings → Credentials → New → search Ollama. Base URL:
http://host.docker.internal:11434
n8n → ComfyUI
No native node — use HTTP Request. Base URL: http://host.docker.internal:8188
POST /prompt # submit a generation job
GET /history/{id} # check status
GET /view?filename= # download finished image
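As a sketch of how those endpoints fit together from the Mac itself (inside n8n, swap localhost for host.docker.internal). It assumes you've exported a workflow via ComfyUI's "Save (API Format)" as workflow_api.json; /prompt expects that graph wrapped in a top-level "prompt" key:

```shell
# Wrap the exported graph in {"prompt": ...} and submit it, capturing the job id
PROMPT_ID=$(python3 -c 'import json; print(json.dumps({"prompt": json.load(open("workflow_api.json"))}))' \
  | curl -s -X POST http://localhost:8188/prompt \
      -H "Content-Type: application/json" -d @- \
  | python3 -c 'import sys, json; print(json.load(sys.stdin)["prompt_id"])')

# Check status; once finished, the history entry lists the output filename to pass to /view
curl -s "http://localhost:8188/history/$PROMPT_ID"
```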
[07] Startup Script
Save as ~/start-ai-studio.sh to bring the whole stack up at once:
#!/bin/bash
brew services start ollama
docker compose -f /Volumes/AIStudio/n8n-docker/docker-compose.yml up -d
echo "Ollama: http://localhost:11434"
echo "n8n: http://localhost:5678"
echo "ComfyUI: /Volumes/AIStudio/comfyui/start.sh"
[08] Troubleshooting
- Drive not mounted — always confirm with `ls /Volumes/AIStudio` before starting anything.
- n8n can't reach Ollama/ComfyUI — use `host.docker.internal`, not `localhost`, in n8n HTTP Request nodes.
- ComfyUI not using GPU — confirm the venv is active, then run `python -c "import torch; print(torch.backends.mps.is_available())"`; it must print `True`.
- Model quality vs speed — `qwen2.5:14b` for best writing (~3–5s/response), `llama3.2:3b` for quick tasks, Flux.1-schnell for images (~5–8s).
UPDATE:
Upon executing this full tutorial I realized that FLUX.1-schnell's full-precision weights are about 24GB, and that is not going to work on a 24GB Mac. With the OS needing ~4–6GB just to run, there is basically no headroom.
The fix: download the GGUF version.
| Model version | Size | Works on 24GB Mac? |
|---|---|---|
| Flux schnell BF16 | ~23.8GB | ❌ No — fills RAM, forces disk swap |
| Flux schnell FP8 | ~11.9GB | ✅ Yes — good quality |
| Flux schnell GGUF Q8 | ~12GB | ✅ Yes — best option for this setup |
| Flux schnell GGUF Q4_K_S | ~6.7GB | ✅ Yes — fastest, slightly lower quality |
Switching to GGUF Q8
GGUF is a quantized format designed exactly for running large models in constrained memory. Q8 gives you near-identical quality to the full model at half the size. This is genuinely the right format for Apple Silicon.
Step 1 - Delete the current model (saves room & we won't be using it):
rm /Volumes/AIStudio/comfyui/models/unet/flux1-schnell.safetensors
# This frees up 23GB
Step 2 - Install the ComfyUI-GGUF custom node:
cd /Volumes/AIStudio/comfyui/custom_nodes
git clone https://github.com/city96/ComfyUI-GGUF.git
cd ComfyUI-GGUF
source /Volumes/AIStudio/comfyui/venv/bin/activate
pip install -r requirements.txt
Step 3 - Download the GGUF Q8 model (~12GB):
hf download city96/FLUX.1-schnell-gguf \
flux1-schnell-Q8_0.gguf \
--local-dir /Volumes/AIStudio/comfyui/models/unet/
Step 4 - Restart ComfyUI
After restarting ComfyUI, swap the UNETLoader node in your workflow for a UnetLoaderGGUF node and point it at flux1-schnell-Q8_0.gguf.
Once you do this, generation should drop from 45 minutes to under 60 seconds. That's the difference between running on CPU with a swapped-out model vs running properly on your M4 GPU.
Also worth updating the n8n workflow JSON: once you confirm image generation works, swap the UNETLoader node there for the GGUF version as well.
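If your n8n workflow drives ComfyUI with an API-format export, the node swap is just a JSON edit. A sketch (assumptions: the export is saved as workflow_api.json, and the UnetLoaderGGUF node takes a single unet_name input, as it does in current ComfyUI-GGUF versions):

```shell
# Retarget every UNETLoader node in the export at the GGUF loader and model
python3 - <<'EOF'
import json

with open("workflow_api.json") as f:
    wf = json.load(f)

for node in wf.values():
    if node.get("class_type") == "UNETLoader":
        node["class_type"] = "UnetLoaderGGUF"
        node["inputs"] = {"unet_name": "flux1-schnell-Q8_0.gguf"}

with open("workflow_api.json", "w") as f:
    json.dump(wf, f, indent=2)
EOF
```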