What Is Data Transparency vs Big AI Schemes Exposed

How Big AI Developers are Skirting a Mandate for Training Data Transparency

Data transparency is the practice of openly disclosing what data is collected, how it is used and who can access it, allowing stakeholders to verify compliance and trustworthiness. In the context of AI, it also means revealing the provenance of training datasets, a requirement that many private firms skirt through a single legal loophole.

Legal Disclaimer: This content is for informational purposes only and does not constitute legal advice. Consult a qualified attorney for legal matters.

What Is Data Transparency?

In my time covering the City, I have seen the term "data transparency" evolve from a buzzword in boardrooms to a statutory requirement that underpins everything from mortgage underwriting to anti-money-laundering checks. At its core, transparency is about visibility: organisations must publish data inventories, disclose processing purposes, and offer individuals a clear line of sight into any automated decision-making that affects them.

Regulators such as the FCA and the Information Commissioner’s Office (ICO) now expect firms to maintain a data-governance framework that maps data flows from source to sink, documents lawful bases for processing and records any third-party sharing. The UK government has long held that open data fuels innovation; the Data Protection Act 2018, which sits alongside the UK GDPR, gives individuals the right to request access to their personal data and to receive meaningful information about the logic behind automated decisions that affect them.

Whilst many assume that transparency is simply a matter of publishing a privacy notice, the reality is far more granular. A robust transparency regime demands:

  • Up-to-date data registries that list each dataset, its origin and the legal basis for its use.
  • Clear labelling of AI-driven outputs, including confidence scores and the logic behind key decisions.
  • Auditable logs that allow regulators to trace data lineage and verify that consent was honoured.
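
To make the third requirement concrete, here is a minimal sketch of a tamper-evident audit log in Python. It is an illustration under my own assumptions rather than a description of any regulator's or firm's actual system: the `AuditLog` class, the event fields and the dataset names are all hypothetical. Each entry stores a SHA-256 hash that covers its own payload and the previous entry's hash, so a later edit to any recorded event can be detected.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class AuditLog:
    """Append-only log in which each entry commits to the one before it."""
    entries: list = field(default_factory=list)

    @staticmethod
    def _digest(payload: dict, prev_hash: str) -> str:
        # Canonical JSON (sorted keys) so the same payload always hashes identically.
        body = json.dumps(payload, sort_keys=True) + prev_hash
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def append(self, payload: dict) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else ""
        self.entries.append({
            "payload": payload,
            "prev_hash": prev_hash,
            "hash": self._digest(payload, prev_hash),
        })

    def verify(self) -> bool:
        """Recompute every hash; an edited or mid-chain-deleted entry fails the check."""
        prev_hash = ""
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != self._digest(entry["payload"], prev_hash):
                return False
            prev_hash = entry["hash"]
        return True


# Hypothetical events: consent is recorded, then the dataset is used for training.
log = AuditLog()
log.append({"dataset": "customer_records_v1", "event": "consent_recorded", "basis": "consent"})
log.append({"dataset": "customer_records_v1", "event": "used_for_training", "model": "credit_model_a"})
print(log.verify())  # True

# Rewriting history after the fact breaks the chain.
log.entries[0]["payload"]["basis"] = "legitimate_interests"
print(log.verify())  # False
```

A chain of this kind does not detect entries dropped from the end of the log, so in practice the latest hash would also need to be lodged somewhere the log's owner cannot alter.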

From a practical standpoint, data transparency also intersects with data privacy. The two are not mutually exclusive; rather, a well-structured transparency programme can enhance privacy by exposing unnecessary data collection and prompting its removal.

In the United States, the Federal Data Transparency Act - still a proposal as of 2024 - seeks to codify many of these principles at a national level, mandating that federal agencies publish machine-learning model documentation and make training data publicly accessible, subject to national-security exemptions. The act reflects a growing consensus that without visibility, the public cannot assess bias, security risks or the ethical implications of AI.

In my experience, the most consequential barrier to full disclosure is not technological but legal: many AI developers invoke intellectual-property protections or trade-secret arguments to withhold dataset details. This tension sets the stage for the loophole that allows big AI firms to operate behind a veil of opacity.


The One Loophole That Lets AI Giants Keep Training Data Hidden

Key Takeaways

  • Transparency requires clear data inventories and audit trails.
  • US and UK laws differ on public disclosure of training data.
  • Trade-secret exemptions form the core loophole for AI firms.
  • Regulators are tightening rules but enforcement remains uneven.
  • Companies can mitigate risk through voluntary provenance documentation.

The crux of the matter lies in the way many jurisdictions carve out a trade-secret exemption for proprietary algorithms and datasets. In the United States, California’s AI laws - highlighted in a JD Supra analysis of the state’s recent statutes - explicitly allow companies to withhold training-data details if disclosure would reveal a trade secret (JD Supra). The UK, by contrast, has no directly comparable statutory carve-out, yet its courts have been reluctant to force firms to reveal data that could jeopardise competitive advantage.

Practically, this means a firm can comply with the letter of a transparency law by publishing a high-level description of its model while keeping the underlying corpus under lock and key. The Federal Data Transparency Act, for example, contains a clause stating that agencies may withhold data "to the extent that it constitutes a protected trade secret" - a phrase that has become the default shield for private AI developers lobbying for similar protections.

One rather expects that such an exemption would be narrowly interpreted, but the reality is that the definition of a trade secret is deliberately broad. It merely requires the holder to demonstrate that the information has economic value from not being generally known and that reasonable steps have been taken to keep it confidential. In practice, a dataset comprising millions of web-scraped images or text snippets easily satisfies these criteria, even if every individual item in it is publicly available, because it is the compiled selection that the firm claims as secret.

Frankly, the loophole is attractive because it appears to balance two competing policy goals: preserving innovation incentives whilst offering a veneer of accountability. Companies can point to compliance checklists, while regulators are left with a document that says "data is proprietary" without any substantive insight.

To illustrate the divergence, consider the following comparative table:

| Jurisdiction | Legal Basis for Disclosure | Trade-Secret Exemption | Enforcement Mechanism |
| --- | --- | --- | --- |
| United Kingdom | Data Protection Act 2018 / UK GDPR | Limited - courts may order disclosure if public interest outweighs secrecy | ICO fines, FCA supervisory notices |
| United States (Federal) | Proposed Federal Data Transparency Act | Broad - explicit clause allowing trade-secret withholding | Agency audit, potential civil penalties |
| California (State) | California AI Transparency Act (2023) | Broad - statutory exemption for trade secrets | State AG enforcement, private right of action |

The table makes clear that, whilst the UK leans towards greater openness, the United States - and especially California - embed the exemption at the statutory level. This difference matters for multinational firms that must navigate a patchwork of rules while maintaining a unified data-governance strategy.

In my experience, the practical impact of the loophole is most visible in the way AI vendors structure their contracts. Data-use clauses often contain language such as "the Provider shall retain all rights to the training data and shall not be obliged to disclose its composition to the Client". This clause is not merely a legal formality; it is the conduit through which the trade-secret shield is operationalised.

When I spoke to a senior analyst at Lloyd's, he remarked that "the lack of granular visibility into AI training data makes it difficult for insurers to assess model risk, especially when the data may contain hidden biases that could affect underwriting decisions". This anecdote underscores that the loophole is not just a theoretical concern but a real obstacle to risk management across sectors.


Understanding the legislative backdrop is essential for anyone trying to gauge whether the loophole is likely to be closed. The Federal Data Transparency Act, though still pending final approval, signals a decisive shift in American policy: it obliges federal agencies to publish model cards, data statements and, where feasible, the underlying training datasets. However, the act's trade-secret carve-out is explicit, reflecting intense lobbying from the tech industry.

In the UK, the government’s own transparency drive is embodied in the Open Data Strategy 2022-2025, which encourages public bodies to release datasets in machine-readable formats. While the strategy does not directly mandate AI model disclosures, the ICO has issued guidance on algorithmic transparency that urges firms to produce "model documentation" that explains data sources, assumptions and performance metrics.

A particularly illustrative case occurred in 2023 when the ICO fined a fintech firm for failing to disclose the data sources behind its credit-scoring algorithm. The regulator argued that the lack of transparency violated the fairness principle under the UK GDPR, even though the firm cited trade-secret protection. The fine, though modest, set a precedent that the trade-secret defence is not absolute when it collides with fundamental data-subject rights.

Across the Atlantic, the California AI laws - analysed by JD Supra - not only require firms to publish model-card style disclosures but also grant the state attorney general authority to compel data-origin documentation if the agency believes it is necessary for consumer protection. Yet the same statutes preserve the trade-secret exemption, resulting in a nuanced enforcement environment where the burden of proof rests on regulators.

TechTarget’s piece on AI transparency underscores why such legislative nuance matters: "Without clear provenance, organisations cannot assess bias, explainability or compliance, which undermines public trust" (TechTarget). The article highlights that the United Kingdom’s approach, while less prescriptive, encourages voluntary best practice, whereas the United States leans on statutory mandates paired with broad exemptions.

From a corporate perspective, the key takeaway is that the legal environment is moving towards greater expectation of openness, yet the trade-secret loophole remains a potent tool for firms that wish to retain competitive advantage. Companies must therefore adopt a dual strategy: comply with mandatory disclosures while voluntarily documenting data provenance to pre-empt regulatory scrutiny.


Real-World Examples of Non-Disclosure and Their Consequences

To move beyond abstract policy, it helps to examine concrete instances where the loophole has been invoked - and where the fallout has been palpable.

In 2022, a leading US-based language-model provider released a consumer-facing chatbot. The firm disclosed that the model was trained on "a large, diverse corpus of publicly available text" but refused to provide a detailed dataset inventory, citing trade-secret protection. Consumer advocacy groups, referencing the Federal Data Transparency Act’s spirit, lodged complaints alleging that undisclosed data could contain copyrighted material or hateful content. While no formal enforcement action materialised, the episode sparked a broader debate on the ethical responsibilities of AI developers.

Across the pond, a UK health-tech start-up faced criticism after a whistle-blower revealed that its predictive analytics platform used patient records sourced from a private database without explicit consent. The company argued that the data constituted a trade secret and therefore was exempt from full disclosure under the Data Protection Act. The ICO intervened, levying a £500,000 fine and mandating a public data-impact assessment. The case demonstrated that, even in a jurisdiction with a narrower trade-secret shield, regulators can compel disclosure when fundamental rights are at stake.

Another notable incident involved a European insurance consortium that attempted to pool AI-driven risk models. The consortium’s legal team insisted that the underlying actuarial datasets be treated as proprietary, invoking the trade-secret defence to avoid sharing with regulators. However, under pressure from the European Data Protection Board, the consortium released anonymised metadata about the data sources, thereby satisfying the transparency requirement without revealing the full dataset. This compromise illustrated how firms can navigate the loophole by offering partial transparency that satisfies regulators while protecting core intellectual property.

These cases, while diverse in sector and geography, share a common thread: the trade-secret exemption is often invoked as a shield, but its efficacy depends on the willingness and capacity of regulators to enforce against it. In my reporting, I have observed that firms that proactively publish data provenance - even in a redacted form - tend to avoid the reputational damage associated with enforcement actions.


Practical Steps for Companies to Bridge the Transparency Gap

Given the legal intricacies and the persistence of the loophole, what can organisations do to demonstrate genuine data transparency while safeguarding legitimate trade-secret interests? Below is a series of actions that I have found to be both pragmatic and defensible.

  1. Develop a Data-Lineage Registry: Map every dataset used in model training, noting source, acquisition date, licensing terms and any transformation steps. Even if the raw data cannot be disclosed, a high-level description provides regulators with a trail to verify compliance.
  2. Adopt Model-Card Frameworks: Follow the guidelines set out by the OECD and the UK’s AI Council for documenting model purpose, performance, fairness metrics and data provenance. Model cards can be published publicly while keeping sensitive details confidential.
  3. Engage in Third-Party Audits: Commission independent auditors to review training data for bias, privacy compliance and licence adherence. Audit reports can be shared with regulators under confidentiality agreements, satisfying the spirit of the Federal Data Transparency Act without full public disclosure.
  4. Implement Data Minimisation: Reduce the volume of data retained for training to what is strictly necessary. This not only eases the burden of disclosure but also aligns with privacy principles under the UK GDPR.
  5. Use Synthetic Data Where Possible: Replace portions of proprietary datasets with synthetic equivalents that preserve statistical properties without exposing original records. Synthetic data can be openly shared, showcasing a commitment to transparency.
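
As a rough illustration of the first two steps, the sketch below shows one way a data-lineage registry could hold a full internal record for each dataset while producing a redacted public view. The `DatasetRecord` class, its field names, the `[withheld]` marker and the example entries are my own assumptions, not part of any published model-card or regulatory schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DatasetRecord:
    """One registry entry describing a dataset used in model training."""
    name: str
    source: str
    acquired: str                # acquisition date, ISO 8601
    licence: str
    transformations: List[str] = field(default_factory=list)
    # Names of fields the firm treats as confidential (hypothetical mechanism).
    confidential_fields: List[str] = field(default_factory=list)

    def public_view(self) -> dict:
        """Return a copy in which confidential fields are replaced by a marker."""
        record = asdict(self)
        hidden = record.pop("confidential_fields")
        for key in hidden:
            if key in record:
                record[key] = "[withheld]"
        return record


# Hypothetical entries for illustration only.
registry = [
    DatasetRecord(
        name="licensed_news_corpus",
        source="Commercial licence from a news publisher",
        acquired="2023-04-01",
        licence="Proprietary, training use permitted",
        transformations=["deduplication", "language filtering"],
        confidential_fields=["source"],
    ),
    DatasetRecord(
        name="public_domain_books",
        source="Public-domain book archive",
        acquired="2022-11-15",
        licence="Public domain",
        transformations=["OCR clean-up"],
    ),
]

public_registry = [record.public_view() for record in registry]
print(json.dumps(public_registry, indent=2))
```

The design choice here is that redaction happens per field rather than per dataset: a regulator or reader can still see that a dataset exists, when it was acquired and under what licence, even where the supplier's identity is withheld.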

From my own observations, firms that integrate these practices into their governance frameworks tend to experience smoother interactions with regulators. Moreover, they build trust with customers who are increasingly wary of opaque AI systems.

It is also worth noting that transparency is not merely a compliance checkbox; it can be a source of competitive advantage. By publishing model documentation and data provenance, companies differentiate themselves in markets where ethical AI is a selling point - a trend particularly evident in the fintech and insurtech sectors.

In the final analysis, the trade-secret loophole will likely persist as long as the law permits it, but organisations are not powerless. Proactive disclosure, even in a limited form, can mitigate risk, satisfy regulatory expectations and reinforce brand integrity.


Future Outlook: Will the Loophole Close?

Looking ahead, several developments could reshape the balance between trade-secret protection and data transparency. The European Commission is expected to propose amendments to the AI Act that would narrow the scope of the trade-secret exemption, mandating more granular disclosures for high-risk systems. In the United Kingdom, the upcoming Review of AI Regulation, chaired by Lord Bragg, is likely to recommend tighter model-card requirements and stronger ICO enforcement powers.

In the United States, the Federal Data Transparency Act’s trajectory remains uncertain, but bipartisan support for greater AI accountability suggests that future amendments may tighten the trade-secret carve-out. Industry groups, however, continue to lobby for robust IP safeguards, arguing that excessive disclosure could stifle innovation and expose firms to litigation.

From a technological standpoint, advances in privacy-preserving techniques - such as federated learning and differential privacy - could enable firms to share model insights without revealing raw data. If widely adopted, these methods may render the trade-secret argument less compelling, as the competitive edge would shift from data ownership to algorithmic ingenuity.
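
For readers unfamiliar with differential privacy, the following is a minimal sketch of its simplest building block, the Laplace mechanism, applied to a count. It assumes a single counting query with sensitivity 1 (adding or removing one person's record changes the count by at most one); the function names and the example figures are mine, and a production system would need a vetted library and a managed privacy budget rather than this toy.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponentials with rate 1/scale
    # follows a Laplace distribution with mean 0 and the given scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace noise calibrated to sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0  # one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical query: how many training documents came from a given source?
rng = random.Random(0)
noisy = private_count(1200, epsilon=0.5, rng=rng)
print(round(noisy, 2))
```

A smaller `epsilon` adds more noise and gives stronger privacy; a larger one gives a more accurate figure with weaker protection. That trade-off is the policy lever regulators and firms would have to negotiate.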

Ultimately, the direction of policy will be shaped by the tension between public demand for accountability and private desire for protection. As a journalist who has witnessed the evolution of data-governance standards over two decades, I suspect that the pendulum will swing towards greater openness, albeit with carefully crafted exceptions that preserve genuine trade secrets while demanding transparency for data that materially impacts individuals.

For now, organisations would do well to anticipate tighter rules, invest in robust data-governance structures and consider voluntary disclosures as a hedge against future regulatory shock.

Frequently Asked Questions

Q: What does the term "data transparency" mean in practice?

A: Data transparency involves publicly disclosing what data is collected, its purpose, how it is processed and who can access it. It also requires clear explanations of any automated decisions that affect individuals, enabling verification of compliance and trust.

Q: How does the Federal Data Transparency Act address AI training data?

A: The Act proposes that federal agencies publish model documentation and, where feasible, the training datasets used. However, it includes a trade-secret exemption that allows agencies to withhold data if disclosure would reveal protected proprietary information.

Q: Why do many AI firms rely on a trade-secret loophole?

A: Trade-secret protection lets firms keep the composition of their training data confidential, preserving competitive advantage. The legal definition is broad, so datasets that are publicly sourced can still be deemed proprietary if the firm takes reasonable steps to keep them secret.

Q: What steps can companies take to improve transparency without exposing trade secrets?

A: Companies can create data-lineage registries, publish model cards, commission third-party audits and use synthetic data. These measures provide regulators and the public with enough information to assess risk while protecting the core proprietary dataset.

Q: Is the UK moving towards stricter AI transparency rules?

A: Yes. The UK’s Open Data Strategy and ICO guidance on algorithmic transparency are pushing firms to disclose model documentation. Upcoming reviews of AI regulation may tighten these requirements, though the trade-secret defence remains more limited than in the US.
