
Deploying Ollama for HG Content

This guide covers deploying a persistent Ollama service suitable for the free local/open‑source provider option.

Local Development

  • Install Ollama: https://ollama.com/
  • Run locally: ollama run mistral
  • Configure env in .env:
      • OLLAMA_ENABLED=true
      • OLLAMA_URL=http://localhost:11434
      • OLLAMA_MODEL=mistral:latest
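
Once the local server is running, a quick way to confirm the app can reach it is to hit Ollama's HTTP API directly. The sketch below uses the /api/tags and /api/generate endpoints against the default local URL; the prompt is only illustrative.

    # List locally available models (confirms the server is up)
    curl http://localhost:11434/api/tags

    # One-off, non-streaming generation request as a smoke test
    curl http://localhost:11434/api/generate -d '{
      "model": "mistral:latest",
      "prompt": "Say hello in one sentence.",
      "stream": false
    }'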

Railway (CPU) — Sample Setup

1) Files
   • deployment/ollama/Dockerfile
   • deployment/ollama/entrypoint.sh (pulls the model if missing)
   • deployment/ollama/railway.sample.toml (volume for models)
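
The entrypoint script is described as pulling the model when it is missing. Below is a minimal sketch of that behaviour, not the actual contents of deployment/ollama/entrypoint.sh (which may differ); it assumes curl is available in the image and that OLLAMA_MODEL is set in the environment.

    #!/bin/sh
    set -e

    # Start the Ollama server in the background so its API is reachable
    ollama serve &
    SERVER_PID=$!

    # Wait until the API answers before pulling anything
    until curl -sf http://localhost:11434/api/tags > /dev/null; do
      sleep 1
    done

    # Pull the configured model only if it is not already on the volume
    MODEL="${OLLAMA_MODEL:-mistral:latest}"
    if ! ollama list | grep -q "$MODEL"; then
      ollama pull "$MODEL"
    fi

    # Keep the server in the foreground so the container stays alive
    wait "$SERVER_PID"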

2) Environment
   • OLLAMA_MODEL=mistral:latest (or a quantized variant such as mistral:7b-instruct-q4_K_M)
   • Ensure a volume is attached at /root/.ollama to persist models
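
Ollama stores pulled models under /root/.ollama, which is why the volume must be mounted there. A quick check from a shell inside the running container (illustrative only):

    # Models should be listed and should survive a redeploy
    ollama list
    du -sh /root/.ollama/models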

3) Deploy
   • Create a new Railway service from the Dockerfile
   • Attach a volume (20–50 GiB or more, depending on the models you keep)
   • Start the service; the first boot pulls the model if needed
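
After the service starts, you can confirm the initial pull finished by querying the deployed endpoint. The hostname below is a placeholder for your Railway service URL.

    # Should list the pulled model once the first boot completes
    curl https://ollama-service.up.railway.app/api/tags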

4) Configure app
   • Set OLLAMA_URL to the Railway service URL (e.g. https://ollama-service.up.railway.app)
   • Optionally set OLLAMA_ENABLED=true for direct frontend use
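
With OLLAMA_URL pointing at the deployed service, the app talks to the same HTTP API as in local development. A hedged example request (adjust the URL, model, and message to your deployment):

    # Non-streaming chat request against the deployed service
    curl "$OLLAMA_URL/api/chat" -d '{
      "model": "mistral:latest",
      "messages": [{"role": "user", "content": "Summarize this deployment guide."}],
      "stream": false
    }'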

Notes

  • CPU inference is slower; prefer quantized models (q4_K_M) and a lower maximum token count per response (see the example after these notes).
  • For GPU, select an appropriate machine and a non‑quantized model; adjust concurrency as needed.
  • Never expose the Ollama service publicly without access controls; prefer private networking.
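
On CPU, limiting generation length has a large effect on latency. Ollama exposes this per request via the num_predict option; the model tag and value below are illustrative, not recommendations.

    # Cap the response length to keep CPU latency manageable
    curl "$OLLAMA_URL/api/generate" -d '{
      "model": "mistral:7b-instruct-q4_K_M",
      "prompt": "Give a one-paragraph product description.",
      "stream": false,
      "options": { "num_predict": 256 }
    }'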