
Years after the launch of consumer AI, companies are racing to build governance structures, appointing chief AI officers, drafting policies and formalizing oversight processes. The goal is to ensure that AI adoption delivers measurable value while minimizing operational, legal and reputational risks. But in trying to create new frameworks, organizations are overlooking something crucial: the state of their data.
Most companies have been collecting transactional, operational and customer data for decades. In the age of AI, how this existing data is managed will determine whether AI systems deliver meaningful returns or amplify existing weaknesses.
The data blind spot in AI governance
The public debate around AI training data has largely focused on model developers harvesting the open internet, including social media, books and journalism, to train generative AI systems. These practices have provoked backlash on privacy and copyright grounds, exposing unresolved questions about what constitutes fair use in the digital age.
Less attention has been paid to how enterprises themselves use data. Stanford's AI Index Report 2025 found that 88 percent of organizations are adopting AI in some capacity. These companies feed internal data into AI models to improve operations and generate insights. But much of this data was not collected with AI deployment in mind. From a governance perspective, enterprise data is often incomplete, inconsistently labeled, poorly documented or inadequately protected where it contains personal and sensitive information.
This creates a structural gap. As companies invest in AI governance, many neglect the data foundations on which those systems depend. Based on our experience advising organizations on responsible AI and data management programs, the bottom line is clear: AI governance starts with data governance.
Risks of poorly managed data
When AI systems are built on weak data foundations, risk is inevitable. Start with credibility. AI systems fed incomplete or unrepresentative data will produce flawed results. Starbucks' rollout of an AI-powered inventory tool illustrates the point: designed to automate stock counts and replenishment, the system was fed inaccurate data. The result was lost inventory and product shortages, culminating in decreased sales. Instead of driving efficiency, the system introduced new costs.
Bias presents a second, more complex risk. AI models trained on datasets that favor certain groups will produce biased results. A 2025 Nature study of large language models trained on emergency department data found that the tools were more likely to recommend invasive medical treatments for Black, LGBTQ+ and homeless patients than for other groups, replicating the biases embedded in the training data. Similar concerns are emerging in employment, lending, insurance and law enforcement applications, where biased data can directly affect access to jobs, credit and public services. For businesses, adopting AI tools that produce biased results carries legal, financial and reputational consequences that are difficult and expensive to reverse.
Poor data governance also erodes transparency and accountability. Where training data, validation processes, and model performance are poorly documented, organizations accumulate “documentation debt.” This debt limits their ability to explain how decisions are made, with negative effects on regulatory compliance, incident investigations and audits.
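One lightweight way to start paying down this debt is to record a datasheet-style summary alongside every deployed model. The sketch below is illustrative only; the field names are hypothetical, loosely inspired by the "datasheets for datasets" idea rather than any mandated format:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ModelDatasheet:
    """Hypothetical minimal documentation captured per deployed model."""
    model_name: str
    training_data_sources: list   # provenance of every training dataset
    validation_method: str        # how performance was measured
    known_limitations: list       # gaps, biases, excluded populations
    last_reviewed: date

# Illustrative entry; all values are invented for the example.
sheet = ModelDatasheet(
    model_name="inventory-forecast-v2",
    training_data_sources=["pos_transactions_2019_2024"],
    validation_method="backtest on held-out 2024 data",
    known_limitations=["no coverage of seasonal pop-up stores"],
    last_reviewed=date(2025, 6, 1),
)
```

Even a record this small gives auditors and incident investigators somewhere to start, which is precisely what documentation debt takes away.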
The dangers extend even further. Reuse of data without a clear legal basis may violate data protection laws. Weak data provenance controls increase the likelihood of inadvertently using protected intellectual property. Biased or incomplete data can create downstream human rights impacts, particularly when automated systems affect employment, health care, financial access or housing.
These risks are not isolated compliance failures. Rather, they are the structural consequences of treating data governance as secondary to AI deployment.
Making your data ready for AI
Unlike model development, which is generally dependent on external vendors, data governance remains firmly within an organization’s control. Companies looking to extract value from AI should start there.
The first step is to build a comprehensive data inventory. Organizations need a clear record of what data they hold, where it comes from and the legal basis for its use. This includes identifying the additional assessments – privacy, legal or risk-related – needed before data can be reused for AI. A well-executed inventory not only supports compliance but enables faster, safer deployment of AI systems while reducing uncertainty around data quality and risk exposure.
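In practice, an inventory entry can start as a simple structured record per dataset. A minimal Python sketch follows, assuming hypothetical field names; real inventories typically live in a dedicated data catalog, not in code:

```python
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    """One entry in a hypothetical enterprise data inventory."""
    name: str
    source: str                   # system or process the data comes from
    legal_basis: str              # e.g. consent, contract, legitimate interest
    contains_personal_data: bool
    pending_assessments: list = field(default_factory=list)

    def ready_for_ai_reuse(self) -> bool:
        # A dataset is cleared for AI reuse only once every
        # required assessment has been completed.
        return not self.pending_assessments

# Illustrative entry; the dataset and assessment names are invented.
crm = DatasetRecord(
    name="crm_contacts",
    source="CRM export",
    legal_basis="contract",
    contains_personal_data=True,
    pending_assessments=["privacy impact assessment"],
)
print(crm.ready_for_ai_reuse())  # False: an assessment is still outstanding
```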
Second, organizations should establish a data classification policy. Data assets should be categorized according to sensitivity, value and regulatory obligations. The aim is to protect the confidentiality, integrity and availability of data used in AI systems, while ensuring those systems meet legal requirements and operational standards. Developing such a policy requires answering some deceptively simple but often overlooked questions: What data do we keep? How sensitive is it? What rules govern its use?
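One way to make such a policy operational is to encode the tiers and their handling rules so they can be checked before data reaches a model. The tier names and rules below are illustrative assumptions, not a recommended scheme:

```python
from enum import Enum

class Sensitivity(Enum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4

# Hypothetical handling rules per tier; a real policy would be driven
# by the organization's own regulatory obligations.
HANDLING_RULES = {
    Sensitivity.PUBLIC:       {"ai_training_allowed": True,  "needs_masking": False},
    Sensitivity.INTERNAL:     {"ai_training_allowed": True,  "needs_masking": False},
    Sensitivity.CONFIDENTIAL: {"ai_training_allowed": True,  "needs_masking": True},
    Sensitivity.RESTRICTED:   {"ai_training_allowed": False, "needs_masking": True},
}

def check_ai_use(tier: Sensitivity) -> None:
    """Block or flag a dataset before it is fed to an AI system."""
    rules = HANDLING_RULES[tier]
    if not rules["ai_training_allowed"]:
        raise PermissionError(f"{tier.name} data may not be used for AI training")
    if rules["needs_masking"]:
        print(f"{tier.name}: mask or pseudonymize before use")

check_ai_use(Sensitivity.CONFIDENTIAL)  # prints a masking reminder
```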
Third, roles and responsibilities must be clearly defined. Effective data governance depends on accountability. Data owners should be responsible for accuracy and classification, data custodians for safe storage and handling, and data users for appropriate application. Establishing these roles gives organizations concrete safeguards when passing data to AI systems.
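These role boundaries can also be made machine-checkable. A minimal sketch, assuming a simple and entirely hypothetical role-to-permission mapping:

```python
# Hypothetical role-based gate on data handoffs to AI systems.
ROLE_PERMISSIONS = {
    "data_owner":     {"classify", "approve_ai_use"},
    "data_custodian": {"store", "transfer"},
    "data_user":      {"read", "analyze"},
}

def can_perform(role: str, action: str) -> bool:
    """Return True only if this role is accountable for this action."""
    return action in ROLE_PERMISSIONS.get(role, set())

# A custodian may move data, but only an owner may clear it for AI use.
assert can_perform("data_custodian", "transfer")
assert not can_perform("data_custodian", "approve_ai_use")
```

Keeping approval authority with the data owner, rather than with whoever happens to hold the data, is what turns the role definitions into a working safeguard.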
Existing standards and legislation provide practical guidance. The EU AI Act defines baseline requirements for data quality and governance in AI systems. International standards such as ISO/IEC 42001 provide management system guidelines for AI applications, while ISO/IEC 27001 and ISO/IEC 38500 set broader information security and IT governance requirements. Even in less regulated markets, these frameworks provide a practical starting point for building internal governance maturity.
Data readiness is AI readiness
Business leaders should not only ask whether their organizations are ready for AI; they should also ask whether their data is. AI systems cannot compensate for weak data foundations. Without coherent, well-governed data, organizations risk investing in tools that amplify inefficiencies, introduce new liabilities and fail to deliver returns.
Policymakers, experts and enterprises are still debating where responsibility for AI governance should lie. But when it comes to internal data quality, there is no ambiguity: accountability belongs to the organization. Businesses that treat data governance as a prerequisite rather than an afterthought will be best positioned to turn their AI investments into competitive advantage and to defend their decisions when scrutiny arrives.
Amelia Williams is a Senior Research Impact Officer at Trilateral Research with expertise in science communication at the intersection of emerging technologies, environmental issues, ethics and policy. At Trilateral, she supports the development and implementation of research projects alongside engagement in policy, media and industry.





