I've been running a fully local AI stack on my M4 Mac Mini — Ollama for text generation, ComfyUI for images, and n8n to wire it all together. Here's the complete setup guide so you can replicate it.

// STACK OVERVIEW
TOOL      PURPOSE                        HOW IT RUNS   PORT
Ollama    Local LLM text generation      NATIVE        11434
ComfyUI   Image generation (Flux/SDXL)   NATIVE        8188
n8n       Workflow automation            DOCKER        5678
Why Ollama and ComfyUI can't run in Docker on Mac: Docker containers can't access the Apple Silicon GPU. Running them in Docker forces CPU-only mode — roughly 5–10x slower. Run natively to get Metal GPU acceleration.

[01] Prerequisites

Install these four system-level tools in order before anything else.

Homebrew

# Check if already installed
brew --version

# If not:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Follow the printed instructions to add brew to your PATH

nvm + Node.js v22

brew install nvm

# Add to ~/.zshrc
echo 'export NVM_DIR="$HOME/.nvm"' >> ~/.zshrc
echo '[ -s "/opt/homebrew/opt/nvm/nvm.sh" ] && \. "/opt/homebrew/opt/nvm/nvm.sh"' >> ~/.zshrc
source ~/.zshrc

nvm install 22
nvm use 22
nvm alias default 22
node --version   # v22.x.x

Docker Desktop

Download the Apple Silicon version from docker.com. After install:

docker --version
docker compose version

In Docker Desktop → Resources, set Memory to at least 4GB.

Python 3.12

brew install python@3.12
python3.12 --version

[02] External Drive Setup

Drive speed matters. Use a Thunderbolt 3/4 NVMe SSD. The M4 Mac Mini has two Thunderbolt 4 ports. A Samsung T9 or SanDisk Extreme Pro (~$80–100) is ideal. USB-A will be too slow.

We'll assume your drive is mounted at /Volumes/AIStudio. (You can rename any external drive in Finder by clicking its name.)

# Create folder structure
mkdir -p /Volumes/AIStudio/{ollama-models,comfyui,n8n-docker/data,outputs/{images,posts,drafts}}

ls /Volumes/AIStudio/
# ollama-models/   comfyui/   n8n-docker/   outputs/

Tell Ollama to store models on the external drive before installing it:

echo 'export OLLAMA_MODELS="/Volumes/AIStudio/ollama-models"' >> ~/.zshrc
source ~/.zshrc
echo $OLLAMA_MODELS
# /Volumes/AIStudio/ollama-models

[03] Ollama — Local LLM

brew install ollama
brew services start ollama

# Verify
curl http://localhost:11434
# Ollama is running

Pull a model. qwen2.5:14b is excellent for creative writing and runs fast on 24GB RAM (~8.5GB download):

ollama pull qwen2.5:14b

# Optional: smaller/faster model for quick tasks
ollama pull llama3.2:3b

ollama list

Test the API (this is what n8n calls):

curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:14b",
  "prompt": "Write a LinkedIn post about productivity.",
  "stream": false
}'
If you get back JSON with a "response" field, Ollama is working correctly.
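The same call from Python, roughly what an n8n HTTP Request node does under the hood. This is a minimal sketch using only the standard library; the endpoint and JSON fields match the curl call above:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_payload(model: str, prompt: str, stream: bool = False) -> bytes:
    """Serialize the JSON body that /api/generate expects."""
    return json.dumps({"model": model, "prompt": prompt, "stream": stream}).encode("utf-8")

def generate(model: str, prompt: str) -> str:
    """POST to a running Ollama instance and return its 'response' text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_generate_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires Ollama running locally:
# print(generate("qwen2.5:14b", "Write a LinkedIn post about productivity."))
```

With `"stream": false`, you get one JSON object back instead of a stream of chunks, which is much easier to handle in a workflow node.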

[04] ComfyUI — Image Generation

cd /Volumes/AIStudio/comfyui
git clone https://github.com/comfyanonymous/ComfyUI.git .

python3.12 -m venv venv
source venv/bin/activate

pip install -r requirements.txt
pip install torch torchvision torchaudio

Download Flux.1-schnell

The best free image model right now, with ~5–8 second generations on M4. You need a free HuggingFace account. (Heads up: the full-size download below turned out to be too large for 24GB of RAM; see the update at the end of this post for the GGUF fix.)

pip install huggingface_hub
huggingface-cli login

# Model (full BF16 weights, ~24GB; see the update at the end of this post)
hf download black-forest-labs/FLUX.1-schnell \
  flux1-schnell.safetensors \
  --local-dir /Volumes/AIStudio/comfyui/models/unet/

# VAE (~335MB)
hf download black-forest-labs/FLUX.1-schnell \
  ae.safetensors \
  --local-dir /Volumes/AIStudio/comfyui/models/vae/

# Text encoders
hf download comfyanonymous/flux_text_encoders \
  clip_l.safetensors \
  --local-dir /Volumes/AIStudio/comfyui/models/clip/

hf download comfyanonymous/flux_text_encoders \
  t5xxl_fp8_e4m3fn.safetensors \
  --local-dir /Volumes/AIStudio/comfyui/models/clip/
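Before moving on, it's worth sanity-checking that all four files landed where ComfyUI expects them. A small helper for that (the paths mirror the downloads above; adjust the base path if your drive is named differently):

```python
from pathlib import Path

# Expected layout after the downloads above (relative to the models/ folder):
EXPECTED = {
    "unet": ["flux1-schnell.safetensors"],
    "vae": ["ae.safetensors"],
    "clip": ["clip_l.safetensors", "t5xxl_fp8_e4m3fn.safetensors"],
}

def missing_model_files(base, expected=EXPECTED):
    """Return relative paths of expected model files that are not on disk."""
    root = Path(base)
    return [
        f"{subdir}/{name}"
        for subdir, names in expected.items()
        for name in names
        if not (root / subdir / name).exists()
    ]

# Example:
# print(missing_model_files("/Volumes/AIStudio/comfyui/models"))
# An empty list means every file is where ComfyUI expects it.
```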

Launch script

Save as /Volumes/AIStudio/comfyui/start.sh:

#!/bin/bash
cd /Volumes/AIStudio/comfyui
source venv/bin/activate
python main.py --listen 127.0.0.1 --port 8188

Then make it executable and launch:

chmod +x /Volumes/AIStudio/comfyui/start.sh
/Volumes/AIStudio/comfyui/start.sh

Open http://127.0.0.1:8188 — you should see the ComfyUI node editor. Check Activity Monitor → GPU while generating to confirm Metal is being used.

[05] n8n — Workflow Automation

n8n runs in Docker and reaches Ollama/ComfyUI via host.docker.internal (Docker Desktop's bridge to your Mac host).

Save this to /Volumes/AIStudio/n8n-docker/docker-compose.yml:

services:
  n8n:
    image: docker.n8n.io/n8nio/n8n:latest
    restart: unless-stopped
    ports:
      - "5678:5678"
    environment:
      - N8N_HOST=localhost
      - N8N_PORT=5678
      - N8N_PROTOCOL=http
      - WEBHOOK_URL=http://localhost:5678/
      - GENERIC_TIMEZONE=America/Denver
      - N8N_SECURE_COOKIE=false
    volumes:
      - /Volumes/AIStudio/n8n-docker/data:/home/node/.n8n
Then bring the container up:

cd /Volumes/AIStudio/n8n-docker
docker compose up -d
docker compose logs -f
# Watch for: "Editor is now accessible via: http://localhost:5678"

Open http://localhost:5678 and create a local account.

[06] Wire Everything Together

n8n → Ollama

Settings → Credentials → New → search Ollama. Base URL:

http://host.docker.internal:11434

n8n → ComfyUI

No native node — use HTTP Request. Base URL: http://host.docker.internal:8188

POST /prompt          # submit a generation job
GET  /history/{id}    # check status
GET  /view?filename=  # download finished image
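Stringing those three endpoints together looks roughly like this. A sketch, not a drop-in n8n node: the shape of the /prompt body and the polling pattern follow ComfyUI's API format, but the workflow graph itself is whatever you export from the editor:

```python
import json
import urllib.parse
import urllib.request

BASE = "http://host.docker.internal:8188"  # ComfyUI as seen from inside the n8n container

def submit_prompt(workflow: dict) -> str:
    """POST a workflow graph to /prompt; ComfyUI returns a prompt_id for polling."""
    req = urllib.request.Request(
        f"{BASE}/prompt",
        data=json.dumps({"prompt": workflow}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["prompt_id"]

def history_url(prompt_id: str) -> str:
    """Poll this URL until the job appears in the history with outputs."""
    return f"{BASE}/history/{prompt_id}"

def view_url(filename: str) -> str:
    """Download a finished image by filename."""
    return f"{BASE}/view?filename={urllib.parse.quote(filename)}"
```

In n8n, each function maps to one HTTP Request node, with a Wait node between the submit and the history poll.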

[07] Startup Script

Save as ~/start-ai-studio.sh to bring the whole stack up at once:

#!/bin/bash
brew services start ollama
docker compose -f /Volumes/AIStudio/n8n-docker/docker-compose.yml up -d
echo "Ollama:   http://localhost:11434"
echo "n8n:      http://localhost:5678"
echo "ComfyUI:  /Volumes/AIStudio/comfyui/start.sh"

Troubleshooting

  • Always confirm the drive is mounted first: run ls /Volumes/AIStudio before starting anything.
  • n8n can't reach Ollama/ComfyUI — use host.docker.internal, not localhost, in n8n HTTP Request nodes.
  • ComfyUI not using GPU — confirm venv is active, then run: python -c "import torch; print(torch.backends.mps.is_available())" — must print True.
  • Model quality vs speed: qwen2.5:14b for best writing (~3–5s/response), llama3.2:3b for quick tasks, Flux.1-schnell for images (~5–8s).

UPDATE:

Upon running through this full tutorial, I realized that FLUX.1-schnell's full weights are about 24GB, and that is not going to work on a 24GB Mac: with macOS itself needing ~4–6GB just to run, there is basically no headroom.
The fix: download the GGUF version.


MODEL VERSION              SIZE      WORKS ON 24GB MAC?
Flux schnell BF16          ~23.8GB   ❌ No (fills RAM, forces disk swap)
Flux schnell FP8           ~11.9GB   ✅ Yes (good quality)
Flux schnell GGUF Q8       ~12GB     ✅ Yes (best option for this setup)
Flux schnell GGUF Q4_K_S   ~6.7GB    ✅ Yes (fastest, slightly lower quality)

Switching to GGUF Q8

GGUF is a quantized format designed exactly for running large models in constrained memory. Q8 gives you near-identical quality to the full model at half the size. This is genuinely the right format for Apple Silicon.
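The size difference follows directly from bits per weight. A quick back-of-the-envelope check, assuming Flux schnell has roughly 12 billion parameters and Q4_K_S averages ~4.5 effective bits per weight (both are my assumptions, not stated above):

```python
def approx_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate model size in decimal GB, ignoring quantization metadata overhead."""
    return n_params * bits_per_weight / 8 / 1e9

FLUX_PARAMS = 12e9  # assumed parameter count for Flux.1-schnell

for label, bits in [("BF16", 16), ("FP8 / Q8", 8), ("Q4_K_S (~4.5 effective bits)", 4.5)]:
    print(f"{label}: ~{approx_size_gb(FLUX_PARAMS, bits):.1f} GB")
```

The numbers line up with the table above: ~24GB at 16 bits, ~12GB at 8 bits, and ~6.7GB at ~4.5 bits.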

Step 1 - Delete the current model (frees space; we won't be using it):

rm /Volumes/AIStudio/comfyui/models/unet/flux1-schnell.safetensors
# This frees up 23GB

Step 2 - Install the ComfyUI-GGUF custom node:

cd /Volumes/AIStudio/comfyui/custom_nodes
git clone https://github.com/city96/ComfyUI-GGUF.git
cd ComfyUI-GGUF
source /Volumes/AIStudio/comfyui/venv/bin/activate
pip install -r requirements.txt

Step 3 - Download the GGUF Q8 model (~12GB):

hf download city96/FLUX.1-schnell-gguf \
  flux1-schnell-Q8_0.gguf \
  --local-dir /Volumes/AIStudio/comfyui/models/unet/ 

Step 4 - Restart ComfyUI

After restarting ComfyUI, swap the UNETLoader node in your workflow for a UnetLoaderGGUF node and point it at flux1-schnell-Q8_0.gguf.
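In the exported API-format workflow JSON, that swap is a one-node change. A sketch of before/after (the node id "12" is arbitrary and purely illustrative; ids come from your own exported workflow, and UnetLoaderGGUF takes only a unet_name input):

```python
# UNETLoader node as it appears for the original safetensors model:
before = {
    "12": {
        "class_type": "UNETLoader",
        "inputs": {"unet_name": "flux1-schnell.safetensors", "weight_dtype": "default"},
    }
}

# Same node id, swapped to the loader provided by ComfyUI-GGUF:
after = {
    "12": {
        "class_type": "UnetLoaderGGUF",
        "inputs": {"unet_name": "flux1-schnell-Q8_0.gguf"},
    }
}
```

Everything downstream (the nodes wired to the model output) stays untouched; only the loader node's class and inputs change.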

Once you do this, generation should drop from 45 minutes to under 60 seconds. That's the difference between running on CPU with a swapped-out model vs running properly on your M4 GPU.

Also worth updating the n8n workflow JSON: once you've confirmed image generation is working, swap the UNETLoader node there to the GGUF version as well.