Protege Raises $10 Million in Seed Funding, Launches AI Training Data Platform

Protege has raised $10 million in a seed funding round and unveiled its AI training data platform, aimed at addressing one of AI development’s key challenges—access to the right training data. The funding round was led by CRV, with participation from investors including SV Angel, Liquid 2 Ventures, Bloomberg Beta, Flex Capital, Adam D’Angelo, and Travis May.

Bobby Samuels, CEO and Co-Founder of Protege, emphasized the importance of solving the AI training data bottleneck. “The lack of availability of training data is the biggest bottleneck in AI today,” said Samuels. “By creating a platform for data holders to provide controlled access to their assets, we enable faster, safer development of critical models.”

The current process of sharing AI training data often involves long negotiations, dealing with intellectual property, and governance issues that can take months or even years to resolve. Protege’s platform seeks to simplify this by offering data holders the tools to share their data securely while allowing AI developers to access it easily and confidently, knowing the data is responsibly sourced.

Founded in early 2024 by Samuels and Travis May, co-founder of LiveRamp and Datavant, Protege focuses on making proprietary data available for AI model training. Much of the data on the platform has never been accessible outside the organizations that own it, but it plays a critical role in AI innovation across various industries.

May, who also serves as CEO of Shaper Capital, spoke about the market need for this platform. “Five years ago, there was no market for training data. Now, every major LLM and AI application needs it, but the process is full of friction. Protege is solving that by enabling seamless, controlled exchanges.”

Saar Gur, General Partner at CRV, echoed this sentiment, stating that the opportunity for training data is one of the most significant he’s seen in his career. “We’re excited to partner with Protege as they tackle this massive challenge,” Gur added.

Protege’s platform has the potential to transform AI development by breaking down barriers to data access, making it easier for companies to collaborate and innovate. With the new funding, Protege plans to expand its platform and continue growing its network of data holders and AI developers. As AI applications increase across industries, Protege is positioned to become a key player in enabling the next wave of AI-driven innovation.

How does Protege work?

Protege operates by creating a secure, compliant platform that facilitates the exchange of AI training data between data holders and AI developers. Here’s a breakdown of how it works:

  1. Data Holders Share Data Securely: Protege equips organizations that own valuable, proprietary data with the tools to share their data in a controlled and secure manner. Many companies have datasets that are essential for AI model training but are hesitant to share them due to concerns about intellectual property, privacy, or governance. Protege helps these companies manage these concerns by offering robust security and governance features, ensuring that data is shared only with authorized parties and under agreed-upon conditions.
  2. Controlled Access to Data: Protege allows data holders to define strict access rules and permissions, ensuring that their data is used in a way that aligns with their legal, privacy, and business policies. This controlled environment ensures that AI developers and researchers can access the data they need without compromising the integrity or privacy of the data.
  3. Data Discoverability: AI developers and organizations building models can use Protege’s platform to discover relevant, high-quality training data that has historically been difficult to find or access. Protege indexes and organizes datasets, allowing model builders to search for specific data tailored to their needs, improving the efficiency and speed of AI development.
  4. Frictionless Data Exchange: Protege simplifies the typically lengthy and complex process of data sharing by offering a standardized platform for negotiation and exchange. This minimizes the back-and-forth legal and technical discussions that often delay access to training data, speeding up the time it takes for developers to get started on AI projects.
  5. Compliance and Governance: Protege ensures that all data exchanges comply with relevant laws and regulations, such as GDPR, HIPAA, and other privacy frameworks, making sure that both data holders and model builders can trust the platform. By providing built-in compliance features, Protege reduces the risk of data misuse and regulatory breaches.
  6. Fostering AI Innovation: By connecting data holders with AI developers, Protege helps unlock the potential of proprietary datasets that were previously siloed, allowing the creation of better, more robust AI models. This drives faster innovation across industries by solving the training data bottleneck.

Protege acts as an intermediary that makes it easier for data holders to safely share their datasets while providing AI developers with the critical data they need to build effective models, all while ensuring security, compliance, and transparency throughout the process.
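
To make the ideas of controlled access and data discoverability more concrete, here is a minimal Python sketch of how a catalog with holder-defined access rules might be modeled. It is purely illustrative and is not Protege's actual API or implementation: the `AccessPolicy`, `Dataset`, and `Catalog` classes, their fields, and the sample organization and dataset names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical rules a data holder attaches to a dataset."""
    allowed_orgs: frozenset      # organizations permitted to use the data
    allowed_purposes: frozenset  # uses the data holder has agreed to


@dataclass(frozen=True)
class Dataset:
    """Hypothetical catalog entry: metadata only, no raw data."""
    name: str
    owner: str
    tags: frozenset
    policy: AccessPolicy


class Catalog:
    """Toy in-memory index of datasets (an assumption for illustration)."""

    def __init__(self):
        self._datasets = {}

    def register(self, dataset):
        # A data holder lists a dataset together with its access policy.
        self._datasets[dataset.name] = dataset

    def search(self, tag):
        # Discoverability: a model builder finds datasets by tag.
        return sorted(d.name for d in self._datasets.values() if tag in d.tags)

    def can_access(self, dataset_name, org, purpose):
        # Controlled access: both the requester and the purpose must be allowed.
        policy = self._datasets[dataset_name].policy
        return org in policy.allowed_orgs and purpose in policy.allowed_purposes


catalog = Catalog()
catalog.register(Dataset(
    name="clinical-notes",
    owner="example-hospital",
    tags=frozenset({"healthcare", "text"}),
    policy=AccessPolicy(
        allowed_orgs=frozenset({"example-ai-lab"}),
        allowed_purposes=frozenset({"model-training"}),
    ),
))

print(catalog.search("healthcare"))
print(catalog.can_access("clinical-notes", "example-ai-lab", "model-training"))
print(catalog.can_access("clinical-notes", "example-ai-lab", "resale"))
```

In this sketch the catalog holds only metadata and policies, so a developer can discover that a dataset exists without being granted access to it. A real platform would add authentication, contracts, auditing, and regulatory checks on top of anything this simple.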
