How to Build an AI Chatbot on Your Website with Ollama

What Is Ollama and Why Use It for a Web Chatbot

Ollama lets you run LLMs locally on your machine or server. Instead of calling a cloud API, you can run the model via Ollama’s local server or API, giving you privacy, lower cost, and possibly faster responses. You can integrate an LLM running on Ollama with your website by building a simple backend service that takes user messages, forwards them to Ollama, and returns the LLM’s response to the frontend chat interface.

High-Level Architecture

  • Ollama runtime: Running a local LLM (e.g., Mistral, Llama) via ollama serve or similar.

  • Backend server: A web server (e.g., Python + FastAPI) that accepts chat messages, calls your Ollama LLM, and returns responses.

  • Frontend chat widget: A simple HTML/CSS/JS chat UI embedded in your website.

  • Optional: Retrieval layer (vector store) for RAG if you want your chatbot to use custom documents/knowledge base.

Step 1: Install and Run Ollama

  1. Install Ollama on your machine following the official installation instructions for your operating system.

  2. Use the CLI to pull a model:

    ollama pull mistral:latest
  3. Run the model as a local server:

    ollama serve

    By default, Ollama serves its HTTP API on http://localhost:11434.
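Once the server is running, you can sanity-check the API with a quick request. The snippet below is a minimal sketch; it assumes the default port 11434 and that mistral:latest has already been pulled.

import requests

# Quick check that the local Ollama server is answering
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mistral:latest", "prompt": "Say hello", "stream": False},
)
print(resp.json().get("response", ""))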

Step 2: Build the Backend Service

Use Python and FastAPI (or another backend framework) to act as a bridge between your website and Ollama.

Example with FastAPI

from fastapi import FastAPI
from pydantic import BaseModel
import requests

app = FastAPI()

# Default address of the local Ollama generate endpoint
OLLAMA_API_URL = "http://localhost:11434/api/generate"

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")
def chat(req: ChatRequest):
    payload = {
        "model": "mistral:latest",
        "prompt": req.message,
        "stream": False,
    }
    resp = requests.post(OLLAMA_API_URL, json=payload)
    resp.raise_for_status()
    data = resp.json()
    # /api/generate returns the completion text in the "response" field
    answer = data.get("response", "")
    return {"reply": answer}

  • This service accepts a POST request at /chat with a JSON body { "message": "..." }.

  • It sends the prompt to your running Ollama instance and returns the LLM’s response.
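To test the backend locally, you can run it with uvicorn and call the endpoint directly. This is only a sketch: the file name app.py and port 8000 are assumptions, not part of the code above.

# Start the API (assuming the code above is saved as app.py):
#   uvicorn app:app --reload --port 8000
import requests

resp = requests.post(
    "http://localhost:8000/chat",
    json={"message": "What can you help me with?"},
)
print(resp.json()["reply"])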

Step 3: Create the Website Chat Frontend

You can build a simple HTML + JavaScript chat widget that sends user messages to your FastAPI backend and displays responses.

Example HTML + JS

<!DOCTYPE html>
<html>
<head>
<style>
.chat-container { width: 400px; height: 500px; border: 1px solid #ccc; display: flex; flex-direction: column; }
#chat-box { flex: 1; overflow-y: auto; padding: 10px; }
.message { margin: 5px 0; }
.user { text-align: right; color: blue; }
.bot { text-align: left; color: green; }
#user-input { padding: 10px; }
#send-btn { padding: 10px; }
</style>
</head>
<body>
<div class="chat-container">
<div id="chat-box"></div>
<input type="text" id="user-input" placeholder="Type a message..." />
<button id="send-btn">Send</button>
</div>
<script>
const sendBtn = document.getElementById("send-btn");
const input = document.getElementById("user-input");
const chatBox = document.getElementById("chat-box");

sendBtn.onclick = async () => {
  const msg = input.value;
  if (!msg) return;
  appendMessage(msg, "user");
  input.value = "";

  const res = await fetch("/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message: msg })
  });
  const data = await res.json();
  appendMessage(data.reply, "bot");
};

function appendMessage(text, sender) {
  const div = document.createElement("div");
  div.className = "message " + sender;
  div.textContent = text;
  chatBox.appendChild(div);
  chatBox.scrollTop = chatBox.scrollHeight;
}
</script>
</body>
</html>

This UI is minimal but works: when you click Send, it calls your backend /chat, then shows the response from Ollama.

Step 4: (Optional) Add Retrieval-Augmented Generation (RAG)

If you want the chatbot to answer based on your own documents (like FAQs or manuals), you can add a RAG layer:

  1. Embed your documents: Use an embedding model to convert them into vectors.

  2. Store embeddings: Use a vector database (e.g., Pinecone, Weaviate) or even a simple in-memory store.

  3. On user query:

    • Embed the user’s question

    • Find top‑k similar document chunks

    • Build a prompt that includes those chunks and the user’s question

    • Pass to Ollama to generate an answer

You can integrate this into your FastAPI backend: run the vector search first, then call Ollama with the retrieved context included in the prompt, as sketched below.
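The following is a minimal sketch of that flow with a plain in-memory store and cosine similarity. It assumes an embedding model such as nomic-embed-text has been pulled and that your Ollama version exposes the /api/embeddings endpoint; the sample documents are placeholders for your own chunks.

import requests

OLLAMA = "http://localhost:11434"
EMBED_MODEL = "nomic-embed-text"   # assumption: an embedding model you have pulled
LLM_MODEL = "mistral:latest"

def embed(text: str) -> list[float]:
    # Ollama's embeddings endpoint (field names may differ in newer versions)
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": EMBED_MODEL, "prompt": text})
    return r.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

# Tiny in-memory "vector store": (chunk, embedding) pairs built at startup
documents = [
    "Placeholder chunk from your FAQ or manual.",
    "Another placeholder chunk from your documentation.",
]
store = [(doc, embed(doc)) for doc in documents]

def answer_with_context(question: str, top_k: int = 2) -> str:
    q_emb = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q_emb, item[1]), reverse=True)
    context = "\n".join(doc for doc, _ in ranked[:top_k])
    prompt = ("Answer the question using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    r = requests.post(f"{OLLAMA}/api/generate",
                      json={"model": LLM_MODEL, "prompt": prompt, "stream": False})
    return r.json().get("response", "")

In your /chat handler you would then call answer_with_context(req.message) instead of sending the raw message straight to Ollama.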

Step 5: Deployment Considerations

  • Where to host:

    • For local development, simply run ollama serve on your own machine.

    • For production, host the backend and the Ollama serving machine on a VPS or dedicated hardware—ensure you have enough RAM / CPU to run your chosen model.

  • Security:

    • Protect your /chat endpoint (CORS, authentication) so that arbitrary clients cannot consume your model; see the sketch after this list.

    • Rate-limit requests if needed.

  • Scaling:

    • For high load, run multiple Ollama instances (if your hardware allows) or containerise the stack with Docker.

    • Cache frequent prompts or responses to reduce latency.
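As one example of locking the endpoint down, FastAPI's CORSMiddleware can restrict which origins may call /chat. This is a minimal sketch applied to the app from Step 2 (recreated here so the snippet is self-contained): the domain is a placeholder, and you would still add authentication and rate limiting on top.

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Only allow the chat widget served from your own site to call the API.
# "https://www.example.com" is a placeholder for your real domain.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://www.example.com"],
    allow_methods=["POST"],
    allow_headers=["Content-Type"],
)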

Step 6: Improving the Chatbot

  • Prompt engineering: Customise the system prompt or instruction to guide the LLM’s personality, style, and behaviour.

  • Conversation history: Keep a short memory of previous messages in your backend and pass them in the prompt so the chatbot can carry context (see the sketch after this list).

  • Streaming responses: Ollama supports streaming (set "stream": true), so you can stream partial responses to the frontend for a more dynamic experience.

  • UI polish: Add typing indicators, better styling, and mobile responsiveness.

  • Logging & analytics: Log all conversations to analyse usage and common user queries and improve your prompt and knowledge base.
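For conversation history, one simple approach is to keep a list of messages per session and send it to Ollama's chat endpoint on each turn. The sketch below assumes the /api/chat endpoint available in recent Ollama versions and a hypothetical session_id supplied by your frontend; the in-memory dictionary would not survive a server restart.

import requests

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"

# session_id -> list of {"role": ..., "content": ...} messages (in memory only)
histories: dict[str, list[dict]] = {}

def chat_with_history(session_id: str, user_message: str) -> str:
    history = histories.setdefault(session_id, [
        {"role": "system", "content": "You are a helpful assistant for our website."},
    ])
    history.append({"role": "user", "content": user_message})

    resp = requests.post(OLLAMA_CHAT_URL, json={
        "model": "mistral:latest",
        "messages": history,
        "stream": False,
    })
    reply = resp.json().get("message", {}).get("content", "")
    history.append({"role": "assistant", "content": reply})

    # Trim old turns (keeping the system prompt) so prompts stay short
    if len(history) > 21:
        del history[1:3]
    return reply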

Example Use Case

Imagine you run a documentation website and want a chatbot that answers questions about your product’s guide:

  1. Use RAG to pull relevant sections from your guide.

  2. Use FastAPI + Ollama to serve the LLM.

  3. Embed the chat widget into your documentation site.

  4. Visitors ask questions, and the chatbot answers using your own documentation plus the LLM’s general knowledge.

Summary

  • Install Ollama and run a local LLM.

  • Build a backend (e.g., Python + FastAPI) that takes user input and calls Ollama.

  • Create a frontend chat widget in HTML/JS that calls your backend and displays responses.

  • Optionally add a RAG layer using vector embeddings so the chatbot answers from custom documents.

  • Deploy securely, optimise prompts and UI, and review conversation logs to improve over time.