Has AI changed the value of our personal data?

This guest post is written by Paolo Fornasini, co-founder of Keye.

From Excel to SQL, technologies that help process and analyze information have led to ever greater innovation in the digital economy. Yet none of these technologies fundamentally changed the nature of the data they sought to organize. With the latest AI breakthroughs, things might finally be different.

A 2021 opinion piece by Tim O’Reilly references a widespread quote in tech that “data is the new oil”. Its original author aimed to illustrate how data needed to be refined to unlock its potential, but many took it to mean that data inherently held tremendous value, and that individuals should somehow be tapping into this value. O’Reilly reasons that data's value is actually better represented by sand: ubiquitous and only truly valuable after industrial-scale processing. As Benedict Evans points out in his piece “There’s no such thing as data”, the kind of data we are talking about matters too. You can’t use restaurant order data to design a new missile guidance system. Evans is, of course, right: not all data is the same. But that doesn’t mean there isn’t some value to data, whether in isolation or in aggregate. Referring to past data-processing technologies such as SQL, he argues: “These technologies are not national strategic assets - anyone can have them, but what for?”

To be sure, these are writers and thinkers I respect greatly. But are these arguments still valid? 

The Large Language Models (LLMs) that power modern AI applications feed on user data to “train” and become more effective. In other words, because the algorithms that power AI are not fully pre-programmed, but instead rely on learning from large datasets, the underlying data itself becomes more valuable. 

There has been a recent narrative shift at the national level too: governments view their regulatory decisions as a balance between too little restraint enabling AI companies to gobble up every piece of content online, and too much regulation hamstringing companies to the point where they are no longer competitive. But is that narrative shift due to the takeoff of the “AI revolution”, or because of the larger investments that governments have made in tech regulation, foreign competition and cybersecurity?

Some in the media have been quick to link the recent lawsuits surrounding the use of copyrighted materials in generative AI with the broader needs of LLMs to access vast amounts of data. However, the two need not necessarily conflict. For example, journalist Sheera Frenkel highlighted that both the US and China consider AI a strategic asset, where every regulatory measure potentially hampers their competitiveness.

“The US government sees itself in an arms race, at the moment, against China when it comes to AI. Both China and the United States have a lot of scientists that are invested in this. They have a lot of interest in being the world leaders in artificial intelligence. And so they know that every bit of regulation they put in place potentially holds back those US companies, as opposed to China, where there’s very little regulation on data and where there’s a ton of data online that the Chinese government can easily access and even give to Chinese AI companies if they want to speed ahead in what’s considered the AI arms race between the US and China.”

In reality, these are interconnected yet distinct issues. We shouldn't assume a black-and-white world in which nations can't devise meaningful consumer- and creator-first solutions for data ownership while sustaining competitive AI sectors. Copyrighted material represents a small fraction (likely less than 1%) of all data used by LLMs. As for why the tradeoff between data access and international competitiveness has become more salient, the answer is likely all of the above.

Whether the correct metaphor is oil, sand or something entirely different, you would be hard-pressed today to argue that data does not matter. More voices, from consumers and scientists to artists and individual creators, are asserting its importance. It's time we explored innovative solutions to empower people to control their data.

Indeed, commentators are correct that thinking about data in terms of individual user value is pointless. But precisely because the value of data changes depending on its context, we should not use existing profit models to determine its value in a new paradigm. $5 worth of data in an ad-funded model may seem like a drop in the bucket, but companies, governments and consumers are all willing to go to battle over it. Clearly, something about today is different. In a world where computer-generated content is cheaper than ever before, isn’t human-generated data everything?

Paolo Fornasini is co-founder of the premium content platform Keye. After close to a decade in strategic partnership roles at Google and several startups, he obtained his MBA from the Wharton School and MA in International Studies from the Lauder Institute. You can follow more of Paolo's work on his LinkedIn or Medium.