Continue.dev

open-source AI IDE extension · by Continue · official site

What it does

Continue is an open-source AI code assistant extension that runs inside VS Code and JetBrains IDEs. Unlike proprietary alternatives, it does not bundle a dedicated LLM. Instead, it exposes a configurable interface to attach any language model backend — local (e.g., llama.cpp, ollama, vLLM), cloud (OpenAI, Anthropic, Google, Groq), or self-hosted. The core functionality covers tab-autocomplete (infill), inline code editing, multi-turn chat with code context, and slash-commands for actions like refactoring, explaining, or generating tests. A key feature is the ability to define custom "context providers" (e.g., current file, git diff, terminal output, lint errors) that are injected into prompts, enabling more relevant responses without manual copy-pasting.

As of mid-2026, the extension also includes a local agent mode that can run simple terminal commands (within a sandbox) and edit multiple files across a workspace, though this remains experimental and model-dependent. The JetBrains build supports IntelliJ, PyCharm, and GoLand, but lags behind the VS Code version in feature parity.

Who it's for

Continue targets developers who want control over their AI stack. Typical users include:

Teams that require data privacy: all prompts and code can stay on-premise via local models or a self-hosted endpoint.
Enthusiasts running open-weight models (Mistral, Llama 3, DeepSeek Coder) who want an IDE-native interface.
Power users who need custom context pipelines (e.g., injecting your own documentation, architecture diagrams, or Jira tickets) beyond what Copilot or Cursor expose.
Cost-sensitive individuals or small teams who already have API keys from a provider (e.g., OpenAI, Anthropic) and want to avoid per-seat subscriptions by paying only for token usage.

It is less suited for users who expect a zero-configuration, polished out-of-box experience, or who work exclusively in languages with weak model support (e.g., VB6, COBOL).

What works

Flexibility of backends is the standout win. You can switch between a cheap local model for autocomplete and a powerful cloud model for complex chat on the fly. The extension respects your chosen context limits, temperature, and system prompts. Tab-autocomplete with modern open-weight models (e.g., DeepSeek Coder V2, CodeGemma, Stable Code 3B) is competitive with Copilot in Python, JavaScript, TypeScript, and Go. Latency is acceptable (300–800ms) on a decent GPU or with a well-optimized endpoint. For local models, the Continue dev team maintains a recommended list, and the setup wizard helps with ollama and LM Studio. Chat with context works reliably: you can highlight code, ask questions, and get meaningful edits. The @ symbol in the chat input lets you reference files, issues, or web search results if you enable a context provider. Custom slash commands are straightforward to write in TypeScript and can automate repetitive tasks (e.g., "add JSDoc to this function"). Offline mode is fully functional (provided you have a local model). No internet connection required once the model is downloaded. This makes Continue a strong choice for air-gapped environments or developers with intermittent connectivity.

What breaks

Setup friction is the biggest downside. First-time users must install the extension, choose a model (or API key), and often tweak config.json to get reasonable autocomplete latency. JetBrains users in particular report that the initial model detection and path configuration can be brittle — the extension sometimes fails to detect an existing ollama installation. Autocomplete quality varies wildly by model and system hardware. Small models (under 2B parameters) produce too many garbage suggestions. Large local models (7B+) require a good GPU or will feel laggy. The "best" default open-weight model changes every few months, and the documentation can become stale. Context window management is less polished than Cursor. Long conversations accumulate tokens without automatic summarization; you must manually clear history or set a limit. The extension does not yet handle cross-file refactoring with full confidence — multi-edit suggestions often require manual adjustment. JetBrains support is still second-class. Features such as inline diffs, chat history persistence, and custom context providers are missing or buggier than their VS Code counterparts. The plugin is officially supported but receives fewer updates.

Pricing reality

The extension itself is open-source (Apache 2.0) and free for any use — personal, commercial, or enterprise. You pay only for the models you run.

Local models: cost = hardware (GPU rental or purchase). Monthly inference costs can range from $0 (CPU-only with small models) to hundreds of dollars for cloud GPU instances.
Cloud API keys: cost = per-token pricing from your chosen provider. OpenAI GPT-4o (~$5/1M in, $15/1M out), Claude 3.5 Sonnet (~$3/1M in, $15/1M out), Google Gemini 1.5 Pro (~$3.50/1M in, $10.50/1M out). For a moderately active developer, expect $20–$50/month in API costs.
Continue Hub (optional hosted service): free tier includes 500 autocomplete calls/day and 100 chat messages/day using Continue's managed models (DeepSeek Coder V2, Qwen2.5-Coder). Paid tiers start at $20/user/month for higher limits, priority queue, and team management. Exact pricing varies by region and contract length.
Enterprise deployments: self-hosted hub pricing is available upon request (varies). No public price list as of June 2026.

There are no hidden fees, no mandatory subscriptions. You can run the extension entirely offline and pay nothing.

Honest comparison

| Feature | Continue (OSS) | GitHub Copilot | Cursor | |---|---|---|---| | Base cost | Free extension + API/hardware costs | $10–$39/user/month | $20–$40/user/month | | Model choice | Full control (local, cloud, custom) | Microsoft/OpenAI models only | Proprietary + limited bring-your-own via API | | Autocomplete | Good-to-excellent (model-dependent) | Excellent out-of-box, consistent | Very good (tuned for Cursor fork) | | Inline editing | Good (context-aware chats) | Limited (only suggest/fix) | Superior (agent mode, multi-edit) | | Privacy | Max (local or private endpoint) | Requires data sent to GitHub cloud | Data sent to Cursor cloud | | IDE support | VS Code (great), JetBrains (good) | VS Code, JetBrains, many others | VS Code fork only | | Learning curve | Moderate (config setup) | Low | Low | | Multi-edit / agentic | Experimental, brittle | Not available (as of mid-2026) | Mature, but limited by model |

If you value freedom over ease-of-use, Continue wins. If you want a polished product that "just works" with minimal tinkering, Copilot or Cursor will save you hours of configuration.

When to use

Use Continue when:

You need to run AI assistance fully offline or behind a firewall.
You want to leverage a specific open-weight model (e.g., DeepSeek Coder, StarCoder, Qwen) for your domain or language.
You already have API access to multiple providers and want to mix or fallback between them (e.g., cheap local autocomplete + expensive cloud chat).
You are a team that wants to avoid per-seat licensing and can manage your own hosted model endpoint.
You enjoy tweaking context, system prompts, and slash commands to align with your workflow.

Avoid Continue when:

You want a "set it and forget it" experience with consistently high autocomplete across all languages.
You primarily work in JetBrains and need feature parity with VS Code.
You rely heavily on agentic multi-file refactoring that Cursor's Agent mode does best.
You are not willing to configure and occasionally debug model connections and performance.

Last verified: 2026-06-08 by kernel.