All AI models run locally on your hardware. No chat data is sent to external servers unless you enable optional web search.
Process everything on your machine with complete privacy.
Use it in the browser as a web app or as a native Electron desktop application.
Seamless experience across all your devices.
Download models once, use them offline indefinitely.
WebGPU mode enables true offline operation.
Integrate real-time web search to enhance your AI's knowledge and provide up-to-date answers.
Optional web search via Tavily or DuckDuckGo.
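As a hedged sketch of how an optional search call could be wired up: the snippet below builds a request body for a search provider. The field names follow Tavily's public search API, but treat the exact shape (and the endpoint) as assumptions to verify against the provider's docs.

```typescript
// Sketch: build a JSON body for a web-search request (Tavily-style fields).
// Field names are assumptions based on Tavily's public docs; verify before use.
interface SearchRequest {
  query: string;
  max_results: number;
}

function buildSearchRequest(query: string, maxResults = 5): SearchRequest {
  const q = query.trim();
  if (!q) throw new Error("query must be non-empty");
  return { query: q, max_results: maxResults };
}

// The app would POST this JSON to the provider's search endpoint, then feed
// the returned snippets into the model's context, e.g.:
// fetch("https://api.tavily.com/search", { method: "POST", body: JSON.stringify(req), ... })
```

Because search is optional, the app can skip this step entirely and stay fully offline.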
Load and ask questions about documents (PDF, MD, DOCX, TXT, CSV, RTF).
Fully local document processing and analysis.
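A minimal sketch of how local document handling could route uploads by file type. The parser names here are hypothetical placeholders, not the app's actual parser modules; only the supported extensions come from the feature list above.

```typescript
// Sketch: route an uploaded file to a parser based on its extension.
// Parser names are hypothetical; the app's real parsers may differ.
const parserByExt: Record<string, string> = {
  pdf: "pdf-parser",
  md: "plain-text",
  txt: "plain-text",
  csv: "csv-parser",
  docx: "docx-parser",
  rtf: "rtf-parser",
};

function pickParser(filename: string): string {
  const ext = filename.split(".").pop()?.toLowerCase() ?? "";
  const parser = parserByExt[ext];
  if (!parser) throw new Error(`Unsupported file type: .${ext}`);
  return parser;
}
```

All of this runs in-process, so document content never leaves the machine.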
Interact with the AI using voice messages.
Communicate naturally and hands-free.
Quickly regenerate AI responses without retyping prompts.
Refine and iterate easily on your conversations.
Persistent, organized conversation history across sessions.
Never lose your important chats and conversations.
Add custom system prompts and memory to personalize AI behavior.
Make the AI truly yours with custom instructions.
Run models directly in your browser with GPU acceleration. Native in-browser inference, no installation needed.
Browser • GPU • No setup required
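Before attempting in-browser inference, the app needs to know whether the browser exposes WebGPU at all. A minimal feature check is sketched below; it takes a navigator-like object as a parameter so the logic is testable outside a browser, where the real call site would pass the global `navigator`.

```typescript
// Sketch: detect whether the WebGPU API is exposed.
// `nav` stands in for the browser's global `navigator` object.
type NavigatorLike = { gpu?: unknown };

function hasWebGPUApi(nav: NavigatorLike): boolean {
  // WebGPU-capable browsers expose `navigator.gpu`; older ones do not.
  return typeof nav.gpu !== "undefined";
}

// In the browser, a fuller check would also request an adapter:
// const adapter = await navigator.gpu.requestAdapter();
// adapter === null means no usable GPU, so fall back to another backend.
```

When the check fails, the app can fall back to a desktop backend instead of browser inference.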
Easy model management with the Ollama backend. Manage and run models with simple commands across Windows, Mac, and Linux.
Desktop • CPU/GPU • Easy management
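To illustrate the Ollama integration, the sketch below builds a request for Ollama's local HTTP API. The `/api/generate` endpoint, its default port 11434, and the `{model, prompt, stream}` fields follow Ollama's documented API; the model name is just an example, and how this app actually talks to Ollama is an assumption.

```typescript
// Sketch: build a request body for Ollama's local /api/generate endpoint.
// Endpoint and fields follow Ollama's documented HTTP API; the model name
// ("llama3.2") is only an example.
interface GenerateRequest {
  model: string;
  prompt: string;
  stream: boolean;
}

function buildGenerateRequest(model: string, prompt: string): GenerateRequest {
  // stream: false asks for one complete JSON response instead of chunks.
  return { model, prompt, stream: false };
}

// Usage: POST this as JSON to http://localhost:11434/api/generate
const req = buildGenerateRequest("llama3.2", "Summarize this document.");
```

Since Ollama listens only on localhost by default, requests never leave the machine.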
Direct integration with optimized performance: CPU/GPU-accelerated inference running natively on desktop.
Desktop • CPU/GPU • Optimized