Daring Fireball: Reddit Files Lawsuit Accusing ‘Data Scraper’ Companies of Stealing Its Information

Mike Isaac, reporting for The New York Times:

Eight years ago, SerpApi, a start-up in Austin, Texas, dived headlong into the byzantine world of using robots to “scrape” Google’s search algorithms, so it could collect information to help customers appear higher in search results.

Then OpenAI’s ChatGPT came along, kicking off an artificial intelligence revolution. As more tech companies began building A.I. chatbots to keep up, they needed large amounts of data to train their A.I. models — data that SerpApi had already gathered.

Practically overnight, a class of companies like SerpApi — known as “data scrapers” — found a new business selling data scraped from Google to companies looking to train their A.I. chatbots.

On Wednesday, the internet message board Reddit decided to fight the data scrapers. It filed a lawsuit in the U.S. District Court for the Southern District of New York claiming that four companies had illegally stolen its data by scraping Google search results in which Reddit content appeared.

I’d never heard of — or at least never noticed — SerpApi until a few weeks ago, when a good friend asked me if I’d ever looked into them. The entire premise of their business is crazy. SerpApi prints the crime right on the tin, describing their service as a “Google Search API” and “Scrape Google and other search engines from our fast, easy, and complete API.” What makes this so crazy is that Google doesn’t offer a search API. SerpApi is offering the Google search API that Google itself doesn’t offer, and charging companies money for it. Everyone, upon hearing the premise and nature of SerpApi, asks the same question: How is this legal? The answer is, it probably isn’t. But right on SerpApi’s home page they claim to offer customers a “U.S. Legal Shield”:

The crawling and parsing of public data is protected by the First Amendment of the United States Constitution. We value freedom of speech tremendously. We assume scraping and parsing liabilities for both domestic and foreign companies unless your usage is otherwise illegal. (Including but are not limited to: acts of cyber criminality, terrorism, pedopornography, denial of service attacks, and war crimes.)

My only surprise here is that it’s Reddit taking SerpApi (along with two similar companies, one from Lithuania and the other from Russia — the former Soviet states respect intellectual property about as much as China does) to court, not Google. Why Google hasn’t sued them yet, I don’t understand. Anyway, back to Isaac’s report for the Times:

Perplexity was one of those buyers, according to Reddit’s lawsuit. Perplexity had scraped Reddit data in the past without payment but agreed to stop after Reddit sent it a cease-and-desist order. Even so, citations to Reddit data in Perplexity search results jumped “fortyfold,” the lawsuit said. Reddit has spent tens of millions of dollars on anti-scraping systems over several years.

“Perplexity’s business model is effectively to take Reddit’s content from Google search results,” then feed it into an A.I. model and “call it a new product,” the lawsuit said.

Reddit said it had set a trap for Perplexity by creating a “test post” on its site that could “only be crawled by Google’s search engine and was not otherwise accessible anywhere on the internet.” Within hours, Perplexity search results had surfaced the content of that test post, the lawsuit said.

Google, which is not a plaintiff in Reddit’s lawsuit, has tried and failed to stop SerpApi and other data scrapers, according to the lawsuit and previous reporting from The Information.

The people leading Perplexity aren’t just shifty — they’re stupid. That whole company just reeks of being a scam.

★ Thursday, 23 October 2025