Ollama Provider¶
Local LLMs using Ollama. Best for development and privacy-sensitive workloads.
Installation¶
```bash
# Install the ollama Python package extra
# (quotes prevent the shell from expanding the brackets)
pip install "dynabots-core[ollama]"

# Download and install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Or download from https://ollama.ai
```
Setup¶
Start Ollama¶
In a terminal:
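The standard Ollama CLI command starts the server in the foreground:

```bash
ollama serve
```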
Ollama listens on http://localhost:11434 by default.
Pull a Model¶
In another terminal:
```bash
# Recommended: Qwen for best reasoning and tool use
ollama pull qwen2.5:7b

# Or other models
ollama pull llama3.1:70b
ollama pull mixtral:8x22b
```
List Models¶
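Verify which models are available locally with the Ollama CLI:

```bash
ollama list
```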
Usage¶
Basic¶
```python
from dynabots_core.providers import OllamaProvider
from dynabots_core import LLMMessage

llm = OllamaProvider(model="qwen2.5:7b")

response = await llm.complete([
    LLMMessage(role="system", content="You are helpful."),
    LLMMessage(role="user", content="What is 2+2?"),
])
print(response.content)  # "4"
```
Custom Server¶
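Point the provider at a remote Ollama instance via the `host` parameter (the same pattern shown in the protocol definition below):

```python
from dynabots_core.providers import OllamaProvider

# Connect to an Ollama server running on another machine
llm = OllamaProvider(
    model="llama3.1:70b",
    host="http://gpu-server:11434",
)
```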
With Options¶
```python
llm = OllamaProvider(
    model="qwen2.5:72b",
    options={
        "num_gpu": 2,        # Use 2 GPUs
        "num_ctx": 8192,     # Context window
        "temperature": 0.7,
    },
)
```
Features¶
Temperature¶
Control randomness:
```python
# Deterministic
response = await llm.complete(messages, temperature=0.0)

# Creative
response = await llm.complete(messages, temperature=0.9)
```
Max Tokens¶
Limit response length:
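`complete()` accepts a `max_tokens` argument (default 2000, per the signature below):

```python
# Cap the response at 500 tokens
response = await llm.complete(messages, max_tokens=500)
```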
JSON Mode¶
Request JSON output:
```python
import json

response = await llm.complete(
    messages=[
        LLMMessage(role="user", content="Extract name and age: ...")
    ],
    json_mode=True,
)
data = json.loads(response.content)
```
Tool Calling¶
Enable function calling (model-dependent):
```python
from dynabots_core.protocols.llm import ToolDefinition

tools = [
    ToolDefinition(
        name="search",
        description="Search the knowledge base",
        parameters={
            "type": "object",
            "properties": {
                "query": {"type": "string"}
            },
            "required": ["query"],
        },
    )
]

response = await llm.complete(
    messages=[
        LLMMessage(role="user", content="Search for Python")
    ],
    tools=tools,
)

if response.tool_calls:
    for call in response.tool_calls:
        print(f"Tool: {call['function']['name']}")
```
Recommended Models¶
For Agents (Best Overall)¶
- qwen2.5:72b - Excellent reasoning, tool use, code
- qwen2.5:7b - Smaller, faster (testing)
- llama3.1:70b - Strong reasoning, good tool use
For Code¶
- codellama:70b - Best for code generation
- qwen2.5:72b - Also very good for code
Fast & Cheap¶
- mixtral:8x22b - Fast, good quality
- neural-chat:7b - Very fast, decent quality
Long Context¶
- qwen2.5:72b - Handles 8K+ context
- llama3.1:70b - 8K context
Performance Tuning¶
GPU Usage¶
```python
llm = OllamaProvider(
    model="qwen2.5:72b",
    options={
        "num_gpu": -1,    # Use all GPUs
        "num_ctx": 8192,
    },
)
```
CPU Fallback¶
If no GPU:
```python
llm = OllamaProvider(
    model="qwen2.5:7b",  # Smaller model
    options={
        "num_thread": 8,  # CPU threads (Ollama's option name is num_thread)
    },
)
```
Batch Processing¶
For multiple requests:
```python
# Ollama keeps the model in memory after first use
response1 = await llm.complete(messages1)  # Slow (loads model)
response2 = await llm.complete(messages2)  # Fast (model cached)
response3 = await llm.complete(messages3)  # Fast
```
Protocol Definition¶
dynabots_core.providers.ollama.OllamaProvider
¶
LLMProvider implementation for Ollama (local LLMs).
Supports:

- All Ollama models (Llama, Qwen, Mixtral, CodeLlama, etc.)
- JSON mode for structured output
- Tool calling (model-dependent)
- Custom Ollama server URLs
Example
Basic usage¶
```python
llm = OllamaProvider(model="qwen2.5:72b")
```
Custom server¶
```python
llm = OllamaProvider(
    model="llama3.1:70b",
    host="http://gpu-server:11434",
)
```
With options¶
```python
llm = OllamaProvider(
    model="qwen2.5:72b",
    options={"num_gpu": 2, "num_ctx": 8192},
)
```
Source code in packages/core/dynabots_core/providers/ollama.py
model property¶
Get the current model name.
__init__(model='qwen2.5:72b', host=None, options=None)¶
Initialize the Ollama provider.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model | str | Ollama model name (e.g., "qwen2.5:72b", "llama3.1:70b"). | 'qwen2.5:72b' |
| host | Optional[str] | Ollama server URL. Defaults to http://localhost:11434. | None |
| options | Optional[Dict[str, Any]] | Additional Ollama options (num_gpu, num_ctx, etc.). | None |
Source code in packages/core/dynabots_core/providers/ollama.py
complete(messages, temperature=0.1, max_tokens=2000, json_mode=False, tools=None) async¶
Send messages to Ollama and get a response.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| messages | List[LLMMessage] | Conversation messages. | required |
| temperature | float | Sampling temperature (0.0-1.0). | 0.1 |
| max_tokens | int | Maximum response tokens. | 2000 |
| json_mode | bool | If True, request JSON-formatted output. | False |
| tools | Optional[List[ToolDefinition]] | Optional list of tools (requires compatible model). | None |

Returns:

| Type | Description |
|---|---|
| LLMResponse | LLMResponse with the model's output. |
Source code in packages/core/dynabots_core/providers/ollama.py
list_models() async¶
List available models on the Ollama server.
Source code in packages/core/dynabots_core/providers/ollama.py
Helper Methods¶
List Models¶
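A sketch using the provider's async `list_models()` method documented above (the exact return shape depends on your installed version):

```python
# Query the Ollama server for locally available models
models = await llm.list_models()
print(models)
```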
Pull Model¶
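Models are pulled with the Ollama CLI (a provider-side pull helper is not documented here):

```bash
ollama pull qwen2.5:72b
```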
Common Issues¶
Connection Refused¶
Solution: Start Ollama service:
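```bash
ollama serve
```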
Model Not Found¶
Solution: Pull the model:
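```bash
ollama pull qwen2.5:7b
```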
Out of Memory¶
Solution: Use a smaller model or reduce the context window (num_ctx):
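For example (illustrative values; tune to your hardware):

```python
# A smaller model plus a reduced context window lowers memory use
llm = OllamaProvider(
    model="qwen2.5:7b",
    options={"num_ctx": 4096},
)
```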
Slow Responses¶
Cause: The model is running on the CPU.
Solution: Check GPU usage:
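On NVIDIA hardware, for example:

```bash
# Show GPU utilization and memory
nvidia-smi
```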
If the GPU is idle, install GPU support or switch to a smaller model.
Docker¶
Run Ollama in Docker:
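The standard invocation from the official `ollama/ollama` image:

```bash
# CPU-only
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# With all NVIDIA GPUs (requires the NVIDIA Container Toolkit)
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```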
Then pull models:
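```bash
docker exec -it ollama ollama pull qwen2.5:7b
```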
Use with custom host:
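The container maps port 11434 to the host, so the provider can connect as usual; adjust `host` if the container runs on another machine:

```python
from dynabots_core.providers import OllamaProvider

# Point at the containerized Ollama server
llm = OllamaProvider(
    model="qwen2.5:7b",
    host="http://localhost:11434",
)
```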