Perplexity AI Is Lying About Their User Agent

Robb Knight:

I put up a post about blocking AI bots after the block was in place, so assuming the user agents are sent, there’s no way Perplexity should be able to access my site. So I asked:

What is this post about https://rknight.me/blog/blocking-bots-with-nginx/

I got a perfect summary of the post including various details that they couldn’t have just guessed. Read the full response here. So what the fuck are they doing?

I checked a few sites and this [the user agent string] is just Google Chrome running on Windows 10. So they’re using headless browsers to scrape content, ignoring robots.txt, and not sending their user agent string. I can’t even block their IP ranges because it appears these headless browsers are not on their IP ranges.

Terrific, succinct write-up documenting that Perplexity has clearly been reading and indexing web pages that it is forbidden, by site owner policy, from reading and indexing — all contrary to Perplexity’s own documentation and public statements.
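
To make the mechanics concrete, here is a minimal Python sketch of the kind of user-agent matching a server-side block relies on. It is not Knight's actual nginx configuration: the function name `is_blocked`, the token list, and both sample user agent strings are assumptions chosen for illustration (`PerplexityBot` is the token Perplexity's documentation names for its crawler). What it shows is that the check only works when the crawler identifies itself.

```python
import re

# Illustrative blocklist (an assumption, not Knight's real list).
# "PerplexityBot" is the crawler token named in Perplexity's documentation.
BLOCKED_AGENT_PATTERN = re.compile(r"PerplexityBot|GPTBot|CCBot", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a blocked bot token."""
    return bool(BLOCKED_AGENT_PATTERN.search(user_agent or ""))


# Made-up sample of a crawler that announces itself honestly.
declared_bot = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"

# Made-up sample of an ordinary Chrome-on-Windows-10 string, the kind of
# user agent a headless browser sends when it does not identify itself.
generic_chrome = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36"
)

print(is_blocked(declared_bot))    # True: the request would be refused
print(is_blocked(generic_chrome))  # False: the request sails through
```

A request carrying a stock Chrome user agent passes straight through, which is why user-agent rules and robots.txt alike depend on the crawler being honest about what it is.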

Wednesday, 19 June 2024