Ollama Provider

Local LLMs using Ollama. Best for development and privacy-sensitive workloads.


Installation

# Install ollama Python package
pip install dynabots-core[ollama]

# Download and install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Or download from https://ollama.ai

Setup

Start Ollama

In a terminal:

ollama serve

Ollama listens on http://localhost:11434 by default.
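From Python, you can sanity-check that the server is reachable before constructing the provider. A minimal sketch using only the standard library (`ollama_is_up` is a helper for illustration, not part of the package):

```python
import urllib.request
import urllib.error

def ollama_is_up(host: str = "http://localhost:11434") -> bool:
    """Return True if an Ollama server answers at `host`."""
    try:
        with urllib.request.urlopen(host, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```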

Pull a Model

In another terminal:

# Recommended: Qwen for best reasoning and tool use
ollama pull qwen2.5:7b

# Or other models
ollama pull llama3.1:70b
ollama pull mixtral:8x22b

List Models

ollama list

Usage

Basic

from dynabots_core.providers import OllamaProvider
from dynabots_core import LLMMessage

llm = OllamaProvider(model="qwen2.5:7b")

response = await llm.complete([
    LLMMessage(role="system", content="You are helpful."),
    LLMMessage(role="user", content="What is 2+2?"),
])

print(response.content)  # "4"

Custom Server

llm = OllamaProvider(
    model="qwen2.5:72b",
    host="http://gpu-server:11434"  # Custom host
)

With Options

llm = OllamaProvider(
    model="qwen2.5:72b",
    options={
        "num_gpu": 2,      # Use 2 GPUs
        "num_ctx": 8192,   # Context window
        "temperature": 0.7,
    }
)

Features

Temperature

Control randomness:

# Deterministic
response = await llm.complete(messages, temperature=0.0)

# Creative
response = await llm.complete(messages, temperature=0.9)

Max Tokens

Limit response length:

response = await llm.complete(
    messages,
    max_tokens=500
)

JSON Mode

Request JSON output:

response = await llm.complete(
    messages=[
        LLMMessage(role="user", content="Extract name and age: ...")
    ],
    json_mode=True
)

import json
data = json.loads(response.content)
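Even with `json_mode`, smaller local models occasionally wrap the JSON object in extra prose. A tolerant parse can help; this is a sketch, not part of the provider API:

```python
import json
import re

def parse_json_response(text: str) -> dict:
    """Parse a JSON-mode response, falling back to the first
    {...} span if the model wrapped the object in extra text."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", text, re.DOTALL)
        if match is None:
            raise
        return json.loads(match.group(0))
```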

Tool Calling

Enable function calling (model-dependent):

from dynabots_core.protocols.llm import ToolDefinition

tools = [
    ToolDefinition(
        name="search",
        description="Search the knowledge base",
        parameters={
            "type": "object",
            "properties": {
                "query": {"type": "string"}
            },
            "required": ["query"]
        }
    )
]

response = await llm.complete(
    messages=[
        LLMMessage(role="user", content="Search for Python")
    ],
    tools=tools
)

if response.tool_calls:
    for call in response.tool_calls:
        print(f"Tool: {call['function']['name']}")
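The entries in `response.tool_calls` follow the function-call shape used in the request above. A minimal dispatch loop might look like this (the local `search` function and `TOOL_REGISTRY` are hypothetical stand-ins for your real tools):

```python
import json

def search(query: str) -> str:
    # Hypothetical stand-in for a real knowledge-base lookup.
    return f"Results for: {query}"

TOOL_REGISTRY = {"search": search}

def dispatch_tool_call(call: dict) -> str:
    """Run one tool call and return its result as a string."""
    fn = call["function"]
    args = fn.get("arguments", {})
    if isinstance(args, str):  # some models return a JSON string
        args = json.loads(args)
    return TOOL_REGISTRY[fn["name"]](**args)
```

Each result would then go back to the model as a follow-up message, using whatever role your message schema reserves for tool output.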

Model Recommendations

For Agents (Best Overall)

  • qwen2.5:72b - Excellent reasoning, tool use, code
  • qwen2.5:7b - Smaller, faster (testing)
  • llama3.1:70b - Strong reasoning, good tool use

For Code

  • codellama:70b - Best for code generation
  • qwen2.5:72b - Also very good for code

Fast & Cheap

  • mixtral:8x22b - Fast, good quality
  • neural-chat:7b - Very fast, decent quality

Long Context

  • qwen2.5:72b - Handles 8K+ context
  • llama3.1:70b - 8K context

Performance Tuning

GPU Usage

llm = OllamaProvider(
    model="qwen2.5:72b",
    options={
        "num_gpu": -1,  # Use all GPUs
        "num_ctx": 8192,
    }
)

CPU Fallback

If no GPU:

llm = OllamaProvider(
    model="qwen2.5:7b",  # Smaller model
    options={
        "num_threads": 8,  # CPU threads
    }
)

Batch Processing

For multiple requests:

# Ollama keeps model in memory after first use
response1 = await llm.complete(messages1)  # Slow (load model)
response2 = await llm.complete(messages2)  # Fast (model cached)
response3 = await llm.complete(messages3)  # Fast
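When fanning out many requests at once, it helps to cap concurrency so a single Ollama server is not overloaded. A small sketch; `complete_many` and the concurrency limit are assumptions, not part of the provider API:

```python
import asyncio

async def complete_many(complete, batches, max_concurrency=4):
    """Run `complete(messages)` for each batch, with at most
    `max_concurrency` requests in flight at a time."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(messages):
        async with sem:
            return await complete(messages)

    # gather preserves input order in its results
    return await asyncio.gather(*(one(m) for m in batches))

# Usage with the provider above:
# responses = await complete_many(llm.complete, [messages1, messages2, messages3])
```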

Protocol Definition

dynabots_core.providers.ollama.OllamaProvider

LLMProvider implementation for Ollama (local LLMs).

Supports:

  • All Ollama models (Llama, Qwen, Mixtral, CodeLlama, etc.)
  • JSON mode for structured output
  • Tool calling (model-dependent)
  • Custom Ollama server URLs

Example

Basic usage

llm = OllamaProvider(model="qwen2.5:72b")

Custom server

llm = OllamaProvider(
    model="llama3.1:70b",
    host="http://gpu-server:11434"
)

With options

llm = OllamaProvider(
    model="qwen2.5:72b",
    options={"num_gpu": 2, "num_ctx": 8192}
)

Source code in packages/core/dynabots_core/providers/ollama.py
class OllamaProvider:
    """
    LLMProvider implementation for Ollama (local LLMs).

    Supports:
    - All Ollama models (Llama, Qwen, Mixtral, CodeLlama, etc.)
    - JSON mode for structured output
    - Tool calling (model-dependent)
    - Custom Ollama server URLs

    Example:
        # Basic usage
        llm = OllamaProvider(model="qwen2.5:72b")

        # Custom server
        llm = OllamaProvider(
            model="llama3.1:70b",
            host="http://gpu-server:11434"
        )

        # With options
        llm = OllamaProvider(
            model="qwen2.5:72b",
            options={"num_gpu": 2, "num_ctx": 8192}
        )
    """

    def __init__(
        self,
        model: str = "qwen2.5:72b",
        host: Optional[str] = None,
        options: Optional[Dict[str, Any]] = None,
    ) -> None:
        """
        Initialize the Ollama provider.

        Args:
            model: Ollama model name (e.g., "qwen2.5:72b", "llama3.1:70b").
            host: Ollama server URL. Defaults to http://localhost:11434.
            options: Additional Ollama options (num_gpu, num_ctx, etc.).
        """
        try:
            import ollama
        except ImportError:
            raise ImportError(
                "ollama package not installed. Install with: pip install ollama"
            )

        self._model = model
        self._host = host
        self._options = options or {}
        self._client = ollama.AsyncClient(host=host) if host else ollama.AsyncClient()

    async def complete(
        self,
        messages: List[LLMMessage],
        temperature: float = 0.1,
        max_tokens: int = 2000,
        json_mode: bool = False,
        tools: Optional[List[ToolDefinition]] = None,
    ) -> LLMResponse:
        """
        Send messages to Ollama and get a response.

        Args:
            messages: Conversation messages.
            temperature: Sampling temperature (0.0-1.0).
            max_tokens: Maximum response tokens.
            json_mode: If True, request JSON-formatted output.
            tools: Optional list of tools (requires compatible model).

        Returns:
            LLMResponse with the model's output.
        """
        # Convert messages to Ollama format
        ollama_messages = [
            {"role": m.role, "content": m.content}
            for m in messages
        ]

        # Build options
        options = {
            **self._options,
            "temperature": temperature,
            "num_predict": max_tokens,
        }

        # Build request kwargs
        kwargs: Dict[str, Any] = {
            "model": self._model,
            "messages": ollama_messages,
            "options": options,
        }

        # JSON mode
        if json_mode:
            kwargs["format"] = "json"

        # Tool calling (if supported by model)
        if tools:
            kwargs["tools"] = [
                {
                    "type": "function",
                    "function": {
                        "name": t.name,
                        "description": t.description,
                        "parameters": t.parameters,
                    },
                }
                for t in tools
            ]

        # Make the request
        response = await self._client.chat(**kwargs)

        # Handle both dict and Pydantic response objects
        msg = (
            response.message if hasattr(response, "message")
            else response.get("message", {})
        )
        content = (
            msg.content if hasattr(msg, "content")
            else msg.get("content", "")
        )

        # Extract tool calls if present
        tool_calls = None
        msg_tools = (
            getattr(msg, "tool_calls", None) or
            (msg.get("tool_calls") if isinstance(msg, dict) else None)
        )
        if msg_tools:
            tool_calls = msg_tools

        # Extract usage
        prompt_tokens = (
            getattr(response, "prompt_eval_count", 0) or
            (response.get("prompt_eval_count", 0)
             if isinstance(response, dict) else 0)
        )
        completion_tokens = (
            getattr(response, "eval_count", 0) or
            (response.get("eval_count", 0)
             if isinstance(response, dict) else 0)
        )

        return LLMResponse(
            content=content,
            model=self._model,
            tool_calls=tool_calls,
            usage={
                "prompt_tokens": prompt_tokens,
                "completion_tokens": completion_tokens,
                "total_tokens": prompt_tokens + completion_tokens,
            },
        )

    async def list_models(self) -> List[str]:
        """List available models on the Ollama server."""
        response = await self._client.list()
        models = response.models if hasattr(response, "models") else response.get("models", [])
        return [
            m.model if hasattr(m, "model") else m.get("name", m.get("model", str(m)))
            for m in models
        ]

    async def pull_model(self, model: str) -> None:
        """Pull a model from the Ollama library."""
        await self._client.pull(model)

    @property
    def model(self) -> str:
        """Get the current model name."""
        return self._model

model property

Get the current model name.

__init__(model='qwen2.5:72b', host=None, options=None)

Initialize the Ollama provider.

Parameters:

| Name    | Type                     | Description                                              | Default       |
| ------- | ------------------------ | -------------------------------------------------------- | ------------- |
| model   | str                      | Ollama model name (e.g., "qwen2.5:72b", "llama3.1:70b"). | 'qwen2.5:72b' |
| host    | Optional[str]            | Ollama server URL. Defaults to http://localhost:11434.   | None          |
| options | Optional[Dict[str, Any]] | Additional Ollama options (num_gpu, num_ctx, etc.).      | None          |

complete(messages, temperature=0.1, max_tokens=2000, json_mode=False, tools=None) async

Send messages to Ollama and get a response.

Parameters:

| Name        | Type                           | Description                                         | Default  |
| ----------- | ------------------------------ | --------------------------------------------------- | -------- |
| messages    | List[LLMMessage]               | Conversation messages.                              | required |
| temperature | float                          | Sampling temperature (0.0-1.0).                     | 0.1      |
| max_tokens  | int                            | Maximum response tokens.                            | 2000     |
| json_mode   | bool                           | If True, request JSON-formatted output.             | False    |
| tools       | Optional[List[ToolDefinition]] | Optional list of tools (requires compatible model). | None     |

Returns:

| Type        | Description                          |
| ----------- | ------------------------------------ |
| LLMResponse | LLMResponse with the model's output. |

list_models() async

List available models on the Ollama server.

pull_model(model) async

Pull a model from the Ollama library.

Helper Methods

List Models

models = await llm.list_models()
print(models)  # ["qwen2.5:7b", "llama3.1:70b", ...]

Pull Model

await llm.pull_model("qwen2.5:72b")

Common Issues

Connection Refused

Error: Failed to connect to Ollama

Solution: Start Ollama service:

ollama serve

Model Not Found

Error: model qwen2.5:7b not found

Solution: Pull the model:

ollama pull qwen2.5:7b

Out of Memory

Error: CUDA out of memory

Solution: Use a smaller model or offload fewer layers to the GPU:

llm = OllamaProvider(
    model="qwen2.5:7b",  # Smaller model
    options={"num_gpu": 1}  # Offload fewer layers to GPU
)

Slow Responses

Cause: Model running on CPU

Solution: Check GPU usage:

ollama ps  # See running models

Install GPU support or use a smaller model.


Docker

Run Ollama in Docker:

docker run -d --gpus all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama

Then pull models:

docker exec <container> ollama pull qwen2.5:7b

Use with custom host:

llm = OllamaProvider(
    model="qwen2.5:7b",
    host="http://docker-host:11434"
)

See Also