
No OpenAI Account, No Problem!

If you want the OpenAI developer experience without the OpenAI account (or tokens), Ollama now exposes an OpenAI-style /v1 endpoint. That means you can point existing clients and frameworks at your local models and ship.


Condensed mini-blog from my piece on crafting your own OpenAI-compatible API with Ollama.

Why this rocks

  • Drop‑in compatibility: Keep using the OpenAI SDKs and patterns.
  • Local first: Your data and prompts stay on your machine.
  • Cost: $0 per token; the only bill is your hardware.

1) Spin up Ollama with Docker

CPU only

docker run -d -v /data/ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

GPU (NVIDIA)

  1. Install NVIDIA Container Toolkit.
  2. Run with GPU access:
docker run -d --gpus=all -v /data/ollama:/root/.ollama --restart always -p 11434:11434 --name ollama ollama/ollama

Pro tip: Want specific GPUs? Use --gpus "device=0,1".

Where are models stored? Thanks to the -v /data/ollama:/root/.ollama mount above, they’ll live on your host at /data/ollama.
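
To see which models are in that store at any time, ask Ollama’s own CLI:

docker exec -it ollama ollama list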

2) Sanity‑check Ollama

Run a model inside the container:

docker exec -it ollama ollama run llama2
# >>> Send a message (/? for help)
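
Prefer to download the model up front, without an interactive session? Pull it first:

docker exec -it ollama ollama pull llama2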

Hit the OpenAI‑style chat completions endpoint:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama2",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user",   "content": "Hello!"}
        ]
      }'

If you get a response with choices[0].message.content, you’re golden.
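
The same sanity check works from Python with the official OpenAI SDK. A minimal sketch, assuming openai>=1.x is installed and the container from step 1 is running:

from openai import OpenAI

# Point the SDK at the local Ollama endpoint; a key is required
# by the client but ignored by Ollama
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="llama2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)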

3) Build a local “OpenAI” chatbot

Requirements (only streamlit and openai are strictly needed for the minimal app below)

streamlit>=1.28
langchain>=0.0.217
openai>=1.2
duckduckgo-search
anthropic>=0.3.0
trubrics>=1.4.3
streamlit-feedback

Minimal app: Chatbot.py

from openai import OpenAI
import streamlit as st

# This key is required by the SDK but unused by Ollama; leave any string
OPENAI_API_KEY = "ollama-baby"

st.title("Chatbot")
st.caption("A Streamlit chatbot powered by OpenAI API... I mean Ollama!!!")

if "messages" not in st.session_state:
    st.session_state["messages"] = [
        {"role": "assistant", "content": "How can I help you?"}
    ]

# Show history
for msg in st.session_state["messages"]:
    st.chat_message(msg["role"]).write(msg["content"])

# Input
prompt = st.chat_input("Say something…")
if prompt:
    client = OpenAI(
        api_key=OPENAI_API_KEY,
        base_url="http://localhost:11434/v1",  # ← point at Ollama
    )

    st.session_state["messages"].append({"role": "user", "content": prompt})
    st.chat_message("user").write(prompt)

    response = client.chat.completions.create(
        model="llama2",
        messages=st.session_state["messages"],
    )
    msg = response.choices[0].message.content

    st.session_state["messages"].append({"role": "assistant", "content": msg})
    st.chat_message("assistant").write(msg)

Run it:

streamlit run Chatbot.py

Screenshot: the chatbot in action. Still not a standup comedian 😬
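
Want tokens to appear as they’re generated? The same client supports streaming, and Streamlit can render it live. A sketch of the assistant turn only, assuming streamlit>=1.31 for st.write_stream:

# Replace the non-streaming completion call in Chatbot.py with a streamed one
stream = client.chat.completions.create(
    model="llama2",
    messages=st.session_state["messages"],
    stream=True,
)

with st.chat_message("assistant"):
    # st.write_stream renders chunks as they arrive and returns the full text
    msg = st.write_stream(
        chunk.choices[0].delta.content or "" for chunk in stream
    )

st.session_state["messages"].append({"role": "assistant", "content": msg})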

Notes

  • You can swap llama2 for any local model you’ve pulled with Ollama.
  • Keep an eye on VRAM/CPU footprints; bigger models need beefier hardware.
  • LangChain, agents, and retrieval components can ride along since the client looks like OpenAI (see the sketch below).
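
As a taste of that last point, here’s a sketch using the langchain-openai package (an assumption on my part; it isn’t in the requirements above, so pip install langchain-openai first):

from langchain_openai import ChatOpenAI

# Any OpenAI-compatible client works once base_url points at Ollama
llm = ChatOpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required by the client, ignored by Ollama
    model="llama2",
)
print(llm.invoke("Say hi in five words.").content)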

Wrap‑up

With Ollama’s /v1 endpoint, you can prototype and ship OpenAI‑compatible apps locally. The DX you know, the privacy you want, and zero token anxiety.


📖 Read the Full Article

The full article, “No OpenAI Account, No Problem! Crafting Your Own OpenAI API with Ollama 🦙”, explores alternatives to OpenAI’s services and how to leverage open-source models for your AI projects. It’s available on Medium.