The legal implications would be different vs scraping publicly available content...

AnthonyMouse · 2026-05-13T05:45:04

Is there a case that actually says this? Why would whether something is fair use depend on that? For that matter, how would they even show that a given AI model was trained on something from a recursive crawler rather than the same articles added to the training data after being downloaded by hand?

Gigachad · 2026-05-13T07:50:28

There was a similar case where a web scraper was bypassing anti-scraping mechanisms on LinkedIn.

https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn

fragmede · 2026-05-13T16:43:47

That case is why Twitter, and anyone else with lawyers paying attention, went and put content behind a login wall.

AnthonyMouse · 2026-05-14T07:19:15

Twitter griefs everyone with a login wall because they want bulk downloaders to pay for API access instead and the login wall is an attempt to rate limit non-API bulk requests.

That isn't relevant to ordinary media outlets because a) they don't have enough content volume for rate limiting to be effective since it's possible to get everything they publish even at a slow rate limit, and b) getting AI scrapers to subscribe to their bulk download API instead is not the objective in their case.

AnthonyMouse · 2026-05-13T09:20:19

That case seems to imply the opposite?