A recent communication from GitHub signals a notable shift in how user data may be leveraged to enhance its AI tooling, specifically GitHub Copilot. From 24 April 2026, interactions with Copilot, covering prompts, generated code, snippets, and contextual metadata, may be used to train and refine AI models unless users explicitly opt out.
At first glance, this reads as a standard product improvement notice. In reality, it introduces a set of material risks that warrant closer scrutiny, particularly for developers working within commercial, regulated, or security-sensitive environments.
TL;DR – For organisations, this is not merely a developer preference setting. It is a governance issue.
This update is not unusual. It is, however, consequential.
AI-assisted development is rapidly becoming standard practice. But convenience should not quietly erode control. The default settings of tools rarely align with the risk tolerance of serious organisations.
A small configuration choice, left unchecked, can have unintended consequences.
Act accordingly.
Contents
- The Core Issue: Your Code as Training Data
- Opt-Out by Default: A Significant Subtlety
- “Industry Practice” – But Not Without Debate
- Practical Risks for Development Teams
- What Should Be Done Now
- Alternatives and Mitigation Strategies
- Opting Out
- Allow GitHub to use my data for AI model training
- A Practical Alternative: Self-Hosted Git Infrastructure
The Core Issue: Your Code as Training Data
The central concern is straightforward. Code, prompts, and contextual interactions—potentially including proprietary logic, credentials, or sensitive architectural patterns—may be ingested into training pipelines.
Even if anonymised or processed in aggregate, the implications are not trivial.
Short snippets can still be revealing.
Context can be reconstructive.
Patterns, once learned, can resurface.
This creates a non-zero risk of:
- Intellectual property leakage
- Exposure of confidential business logic
- Unintentional data retention beyond intended scope
For organisations, this is not merely a developer preference setting. It is a governance issue.
Opt-Out by Default: A Significant Subtlety
The policy operates on an opt-out basis. That matters.
Opt-in models place control firmly with the user. Opt-out models assume consent unless action is taken. In practice, this leads to:
- Lower awareness among teams
- Inconsistent configuration across developers
- Silent policy drift over time
Even with assurances that prior opt-out preferences are preserved, the burden remains on users and organisations to verify, monitor, and enforce compliance.
“Industry Practice” – But Not Without Debate
The email positions this approach as aligned with broader industry norms. That is partially true. Many AI providers, including Microsoft, have adopted similar strategies to improve model performance through real-world usage data.
However, “common” does not mean “unproblematic”.
There is an ongoing tension between:
- Model improvement through data aggregation
- User expectations of privacy and control
For enterprise environments, especially in legal, financial, healthcare, or other regulated sectors, this tension remains unresolved and should be treated as unacceptable in the absence of strict safeguards.
Practical Risks for Development Teams
In day-to-day development workflows, the risks are subtle but pervasive:
- Accidental Exposure
  Developers frequently paste code containing API keys, internal endpoints, or proprietary algorithms into Copilot prompts.
- Contextual Leakage
  Even without explicit secrets, surrounding context—file names, comments, structure—can reveal sensitive information.
- Compliance Breach
  Use of AI tools that process data externally may violate contractual or regulatory obligations (e.g. client confidentiality clauses, GDPR considerations).
- Loss of Control
  Once data enters a training pipeline, visibility and control diminish significantly.
What Should Be Done Now
At a minimum, teams should not treat this as a passive update.
Concrete steps include:
- Audit current Copilot settings across all developer accounts
- Enforce opt-out where appropriate via organisational policy
- Establish clear usage guidelines for AI-assisted coding
- Train developers on what should never be shared with AI tools
- Review contractual obligations regarding data handling and third-party processing
This is basic operational hygiene. It should not be optional.
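As a starting point for the audit step, seat assignments for an organisation can be fetched from GitHub's REST API (for example via `gh api /orgs/YOUR_ORG/copilot/billing/seats --paginate`) and then summarised. The sketch below shows only the summarising half; the payload shape is an assumption based on the public REST documentation, so verify it against the current API reference before relying on it.

```python
# Minimal audit helper: given the JSON payload from GitHub's
# "List all Copilot seat assignments for an organization" endpoint
# (GET /orgs/{org}/copilot/billing/seats), list who holds a seat.
# The payload shape used here is an assumption from the public docs.

def list_copilot_seat_holders(payload: dict) -> list[str]:
    """Return the login of every user holding a Copilot seat."""
    return [seat["assignee"]["login"] for seat in payload.get("seats", [])]

# Example payload, trimmed to the fields used above (hypothetical data).
sample = {
    "total_seats": 2,
    "seats": [
        {"assignee": {"login": "alice"}, "last_activity_at": "2026-04-01T09:00:00Z"},
        {"assignee": {"login": "bob"}, "last_activity_at": None},
    ],
}

print(list_copilot_seat_holders(sample))  # → ['alice', 'bob']
```

From here, each seat holder's account can be checked against your organisational policy on training-data opt-out.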
Alternatives and Mitigation Strategies
There are viable approaches for reducing exposure while retaining productivity gains.
- Enterprise Controls
  GitHub Enterprise and similar offerings provide tighter policy management. These should be evaluated properly, not assumed sufficient by default.
- Self-Hosted or Private Models
  Running AI models within a controlled environment—whether on-premise or within a private cloud—keeps data within defined boundaries.
- Tool Segmentation
  Use AI tools for non-sensitive code only. Keep critical or proprietary work outside these systems.
- Prompt Hygiene
  Adopt strict conventions: no secrets, no credentials, no identifiable client data. Ever.
- Alternative Tooling
  Evaluate competing or self-hosted solutions with clearer data boundaries or explicit no-training guarantees.
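Prompt hygiene can be partially automated. The sketch below is an illustrative pre-paste check, not a substitute for a dedicated secret scanner (gitleaks and trufflehog are established options); the pattern set is a hypothetical example and would need tuning for your environment.

```python
import re

# Illustrative prompt-hygiene check: flag obvious secret-like patterns
# before text is pasted into an AI assistant. Patterns are examples only;
# a real deployment should use a maintained secret-scanning tool.
SECRET_PATTERNS = {
    "GitHub token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
    "AWS access key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Private key header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "Generic assignment": re.compile(r"(?i)\b(api[_-]?key|secret|password)\s*[:=]\s*\S+"),
}

def flag_secrets(text: str) -> list[str]:
    """Return the names of any secret patterns found in `text`."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]

print(flag_secrets("api_key = 'sk-123'"))           # → ['Generic assignment']
print(flag_secrets("def add(a, b): return a + b"))  # → []
```

A check like this can be wired into a clipboard hook, an editor action, or a pre-commit stage so the convention is enforced rather than remembered.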
Opting Out
To opt out or adjust your settings:
- Go to GitHub Account Settings
- Select Copilot
- Choose whether to allow your data to be used for AI model training
Allow GitHub to use my data for AI model training
Allow GitHub to collect and use my Inputs, Outputs, and associated context to train and improve AI models. Read more in the Privacy Statement.
A Practical Alternative: Self-Hosted Git Infrastructure
For organisations unwilling to accept these trade-offs, moving to a self-hosted platform such as Gitea is a sensible and credible option. It provides full control over source code, access policies, and data residency without external data processing by default. The benefits are tangible: complete data sovereignty, simplified compliance with UK and EU regulatory frameworks, reduced third-party exposure, and the flexibility to integrate AI tooling on your own terms, whether locally hosted or selectively enabled.
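To give a sense of how little is involved, a minimal Docker Compose definition for Gitea might look like the following. The image name and internal ports follow Gitea's official Docker documentation; everything else (host ports, volume path, UID/GID) is an assumption to adapt to your environment.

```yaml
# Minimal, illustrative docker-compose.yml for a self-hosted Gitea instance.
# Adjust ports, volumes, and user IDs for your environment, and place TLS
# termination (e.g. a reverse proxy) in front before production use.
services:
  gitea:
    image: gitea/gitea:latest
    restart: always
    environment:
      - USER_UID=1000
      - USER_GID=1000
    volumes:
      - ./gitea-data:/data
    ports:
      - "3000:3000"   # web UI
      - "2222:22"     # SSH for git operations
```

The operational cost is real but modest; the trade is external convenience for data residency and auditability.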
For teams operating in sensitive sectors, this is not over-engineering. It is proportionate risk management.
If this shift raises concerns within your organisation, now is the time to act. Assess your exposure, define your policy, and consider whether your current tooling aligns with it. If you need support designing or implementing a self-hosted Git and AI strategy, get in touch.
See also: Running Gitea with Let’s Encrypt on macOS via Homebrew