The advent of artificial intelligence (AI) has brought about a new frontier in data usage and sharing. As AI models require vast amounts of data for training, organizations and creators are grappling with the delicate balance between maintaining the openness of the internet and protecting their data. In response to this challenge, Creative Commons, the organization behind the licensing movement that enables creators to share their works while retaining copyright, is gearing up for the AI era with the launch of its new project, CC signals.
CC signals aims to create a harmonious balance between the open nature of the internet and the growing demand for data to fuel AI. The project seeks to provide a legal and technical solution that offers a framework for dataset sharing, bridging the gap between those who control the data and those who use it to train AI models.
The data extraction currently underway could potentially erode the openness of the internet, leading to entities restricting access to their sites or implementing paywalls instead of sharing their data. In contrast, CC signals proposes a set of tools that offer a range of legal enforceability, all of which carry an ethical weight similar to the CC licenses that cover billions of openly licensed creative works online.
The demand for such a tool is on the rise as companies face the task of adjusting their policies and terms of service to either limit AI training on their data or clarify the extent to which they will use users' data for AI-related purposes. Examples of this shift include X's initial decision to allow third parties to train their models on its public data, followed by a reversal of that decision. Reddit is using its robots.txt file to restrict bots from scraping its data for AI training, while Cloudflare is exploring solutions that would charge AI bots for scraping and tools to confuse them. Open source developers have also built tools to slow down and waste the resources of AI crawlers that do not respect their "no crawl" directives.
CC signals is designed to sustain the commons in the age of AI, according to Anna Tumadóttir, Creative Commons CEO. "Just as the CC licenses helped build the open web, we believe CC signals will help shape an open AI ecosystem grounded in reciprocity," she said in an announcement.
The project is still in its early stages, with initial designs published on the CC website and GitHub page. Creative Commons is actively seeking public feedback as it prepares for an alpha launch in November 2025. The organization will also host a series of town halls to gather feedback and answer questions. As the CC signals project takes shape, it has the potential to revolutionize data sharing in the AI era, fostering an open and ethical ecosystem that benefits both data providers and AI developers.