Turnkey AI coding assistant that runs entirely on-premises — tab completions, chat, and inline diffs powered by open-source models with zero internet dependency.
Cloud-based AI coding tools are off-limits in secure environments. Your developers watch the productivity revolution from the sidelines.
Developers in air-gapped environments miss out on 30–40% productivity gains that cloud-connected teams enjoy from AI coding tools.
78% of developers in secure environments say they want AI coding assistance but are blocked by data exfiltration policies.
No turnkey AI coding assistant today runs fully on-premises with zero internet dependency. Until M8.
A self-contained AI coding platform that deploys behind your firewall. Open-source models, zero external dependencies, full organizational control.
Nine core capabilities that bring the full AI coding experience to your air-gapped network.
Context-aware code suggestions as you type, powered by a purpose-built fill-in-the-middle model.
Ask questions about your codebase, generate tests, refactor code — all inside your IDE.
Review AI-suggested changes inline with accept/reject controls. No context switching.
Monitor usage, manage seats, configure models, and view analytics across your organization.
Single sign-on via your existing directory service. No separate accounts to manage.
All communication between IDE plugins and the M8 server encrypted with TLS 1.3.
Per-user and org-level rate limits prevent abuse and ensure fair resource allocation.
Prometheus-compatible metrics endpoint. Grafana dashboards included out of the box.
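Per-user rate limiting of the kind described above is commonly implemented as a token bucket. The sketch below is illustrative only — M8's actual limiter, its parameters, and its class names are not published; everything here is an assumption:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative, not M8's implementation)."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self, cost=1.0):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# One bucket per user: e.g. 5 completion requests/sec, burst of 10
per_user = {"alice": TokenBucket(rate=5, capacity=10)}
print(per_user["alice"].allow())  # → True
```

An org-level limit works the same way with a single shared bucket checked before the per-user one.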
Docker bundle with pre-baked model weights. No internet required at any point during deployment.
Download the M8 Docker bundle with pre-baked model weights on a connected machine. Transfer to a portable drive.
Carry the bundle across the air gap. Load Docker images on your GPU server inside the secure network.
Run one command. M8 configures vLLM, starts the API server, and connects to your LDAP directory. 30 minutes to a working AI coding assistant.
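For a sense of what the one-command bring-up wires together, a compose file might look roughly like this. Service names, image tags, ports, and environment variables here are illustrative assumptions, not M8's actual configuration:

```yaml
services:
  vllm:
    image: m8/vllm-bundled:latest        # hypothetical pre-baked image with model weights
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  api:
    image: m8/api-server:latest          # hypothetical
    environment:
      LDAP_URL: ldaps://ldap.internal.example:636   # your existing directory service
      TLS_MIN_VERSION: "1.3"
    ports:
      - "443:8443"
    depends_on:
      - vllm
```

Nothing in this stack references a registry or endpoint outside your network.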
Both models are Apache 2.0 licensed with no usage restrictions. Developed by organizations in NATO-allied countries.
Purpose-built for inline code completion. Trained on permissively-licensed code. Optimized for sub-200ms suggestion latency.
State-of-the-art open-weight model for code understanding, generation, refactoring, and natural language interaction with large context windows.
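"Fill-in-the-middle" means the completion model sees both the code before and after your cursor, then generates what belongs in between. The sentinel tokens below are illustrative — the exact tokens differ between model families, and M8's completion model is not named here:

```python
def build_fim_prompt(prefix, suffix,
                     pre="<|fim_prefix|>", suf="<|fim_suffix|>",
                     mid="<|fim_middle|>"):
    """Assemble a fill-in-the-middle prompt from the text around the cursor.

    Sentinel tokens vary by model family; the defaults are illustrative.
    The model generates the code that belongs between prefix and suffix.
    """
    return f"{pre}{prefix}{suf}{suffix}{mid}"

# Editor snapshot: the cursor sits between the two fragments
prompt = build_fim_prompt("def add(a, b):\n    return ",
                          "\n\nprint(add(1, 2))")
```

This is why FIM models complete more accurately inside a function body than plain left-to-right models: the suffix constrains what fits.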
M8 scales from a single RTX 4090 supporting a small team to NVIDIA B300 clusters serving hundreds of developers.
| GPU | VRAM | User Capacity | Approx. Cost (purchase or cloud rate) |
|---|---|---|---|
| RTX 4090 | 24 GB | 5–15 users | ~$2K (purchase) |
| L4 | 24 GB | 5–15 users | ~$0.85/hr |
| 2–4x L4 | 48–96 GB | 30–80 users | ~$3–5/hr |
| L40S | 48 GB | 30–60 users | ~$1/hr |
| A100 40GB | 40 GB | 30–50 users | ~$2/hr |
| A100 80 GB (recommended) | 80 GB | 50–100 users | ~$3/hr |
| H100 | 80 GB | 100–200 users | ~$4/hr |
| B300 | 192 GB | 300–500 users | Enterprise |
Every plan includes the air-gap deployment bundle, IDE plugins, and a 2-week free trial.
For small teams getting started with AI coding assistance.
Full AI coding suite with chat and team management.
Maximum performance and compliance for large organizations.
See how M8 compares to cloud-based alternatives on the dimensions that matter most to secure environments.
M8 is the only turnkey AI coding assistant that deploys behind an air gap with open-source models.
| Feature | M8 | GitHub Copilot | Cursor | Tabnine Enterprise | Continue.dev |
|---|---|---|---|---|---|
| Air-gapped deployment | Turnkey | Cloud only | Cloud only | Proprietary | DIY |
| Open-source models | Apache 2.0 | Proprietary | Proprietary | Proprietary | Yes |
| On-premises | Full stack | No | No | Limited | Model only |
| Tab completion | Yes | Yes | Yes | Yes | Yes |
| Chat assistant | Yes | Yes | Yes | No | Yes |
| Inline diffs | Yes | Yes | Yes | No | Limited |
| Admin dashboard | Full | No | No | Yes | No |
| LDAP/AD SSO | Yes | No | No | Yes | No |
| GPU profiles | 8 pre-configured | N/A | N/A | N/A | No |
| Monitoring (Grafana) | Pre-built | N/A | N/A | Yes | No |
| Cost | $20–60/seat/mo | $19–39/seat/mo | $20–40/seat/mo | $39+/seat/mo | Free DIY |
M8 does not just promise data protection — it makes data exfiltration architecturally impossible. No internet connection means no data can leave your network, period.
Everything you need to know about M8.
All models run locally via vLLM on your GPU server. Docker images include pre-baked model weights. Zero external API calls are made at any point — during installation, operation, or updates.
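vLLM exposes an OpenAI-compatible HTTP API, so IDE plugins and scripts can talk to the on-prem server with standard tooling. The sketch below builds (but does not send) such a request; the hostname, port, and model name are placeholders, not M8 defaults:

```python
import json
from urllib import request

def local_chat_request(prompt, base_url="http://m8.internal:8000/v1",
                       model="m8-chat"):
    """Build a chat request for the on-prem vLLM server's OpenAI-compatible
    endpoint. URL and model name are placeholders -- nothing here resolves
    outside your own network."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = local_chat_request("Explain this regex: ^\\d{3}-\\d{4}$")
# urllib.request.urlopen(req) would send it -- only inside your perimeter
```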
M8 uses purpose-built open-source models — one optimized for fast tab completion (fill-in-the-middle), another for chat and code generation. Both are Apache 2.0 licensed, sourced from allied-nation research institutions.
30 minutes from bundle transfer to a working AI coding assistant. One command installs and configures everything — vLLM, the API server, LDAP integration, and IDE plugin distribution.
M8 supports 8 GPU profiles ranging from RTX 4090 (5–15 users) to NVIDIA B300 (300–500 users). See the GPU compatibility table above for full details.
Yes. M8 achieves zero data exfiltration by design — no network calls leave your perimeter. It supports deployment requirements for CMMC 2.0, ITAR, HIPAA, and PCI-DSS. TLS 1.3 encryption and LDAP/AD SSO are built in.
GitHub Copilot requires internet connectivity and sends code to Microsoft servers for processing. M8 runs entirely on-premises with open-source models — similar tab completion and chat features, but zero data ever leaves your network.
M8 uses Apache 2.0 licensed models with no usage restrictions. The M8 software itself is BSL 1.1, which automatically converts to Apache 2.0 in 2030. Full source code is available for customer audit.
Yes. We offer a 2-week free trial with the full air-gap bundle. The trial includes defined success criteria, an evaluation framework, and direct engineering support during the evaluation period.