Anyone else worried about content rights with AI? The legal landscape is getting wild
We’re a B2B publisher. Our content is being used by AI systems, and I’m getting conflicting advice.
Lawyer A says: “This is copyright infringement. Block all AI crawlers. Prepare for litigation.”
Lawyer B says: “This is fair use. You can’t stop it. Focus on maximizing visibility benefits.”
What I’m observing:
My questions:
I need a practical position, not just legal theory.
Let me give you the current state of play:
Active litigation (as of December 2025):
No final precedent yet. Courts haven’t definitively ruled on whether AI training constitutes fair use.
What AI companies argue:
What publishers argue:
The practical reality:
| Publisher Type | Typical Strategy |
|---|---|
| Major (NYT, WSJ) | Litigation + licensing negotiations |
| Large (major outlets) | Licensing negotiations, some blocking |
| Mid-size | Mostly allowing, hoping for visibility |
| Small | Allowing, focusing on traffic benefits |
Why mid-size publishers mostly allow:
On licensing deals specifically:
Who has deals:
Deal sizes (reported):
Why mid-size can’t get deals:
The uncomfortable truth: Unless you’re NYT-scale, licensing isn’t realistic.
What you CAN do:
The cost-benefit:
- Blocking = lose visibility, protect nothing meaningful
- Allowing = gain visibility, uncertain future rights
Most mid-size publishers choose visibility.
Note: Not legal advice, general information only.
Why your lawyers disagree:
Lawyer A (block/litigate):
Lawyer B (embrace/allow):
Both are right, from their perspectives.
The questions to ask:
Can you afford to litigate?
What are you actually protecting?
What’s your business model?
My observation: Most B2B publishers choose visibility because their business model benefits from awareness more than it loses to AI usage.
Here’s what we decided and why:
Our business: B2B industry publication, similar to yours.
Revenue: advertising + events + sponsored content.
Our decision: Allow all AI crawlers. Maximize visibility.
Why:
- Our revenue comes from audience, not content sales. AI visibility = more audience = more revenue.
- Blocking wouldn't help. Our content is already in training sets; blocking only affects future training.
- AI traffic is valuable. About 5% of our traffic comes from AI referrals, and those users convert well (one way to measure this is sketched after this list).
- No realistic licensing option. We approached OpenAI; no interest at our scale.
- Legal costs exceed benefits. Litigation would cost more than any potential recovery.
What we did do:
The result: AI visibility up 200%. Referral traffic growing. Brand awareness improving.
Would we accept a licensing deal? Sure. But we’re not waiting for one.
Important distinction many miss:
Training data use vs. Real-time citation
| Aspect | Training Data | Real-time Citation |
|---|---|---|
| When it happens | Model building | Each query |
| What’s used | Full content | Snippets/facts |
| Can you block? | Future only | Yes (robots.txt) |
| Legal status | Heavily disputed | Less controversial |
| Business impact | Past content included | Affects visibility now |
Different AI systems, different models:
ChatGPT (base):
ChatGPT (Search):
Perplexity:
The nuance:
- Blocking ChatGPT's training crawler (GPTBot) excludes you from future training but doesn't affect the current model.
- Blocking Perplexity loses real-time citation benefits.
Many publishers: Block training crawlers, allow citation crawlers. Balances concerns.
Here’s a nuanced robots.txt approach:
The selective strategy:
```
# Block training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# Allow citation/search crawlers
User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /
```
What this does: blocks GPTBot, Google-Extended, and CCBot from collecting content for future training, while still letting ChatGPT-User and PerplexityBot fetch pages for real-time answers and citations.
Who uses this approach: Some major publishers trying to balance.
The limitation: content already used for training still exists in current models. This only affects future crawling.
For your lawyers: This might satisfy both:
It’s a middle ground that many find acceptable.
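If you deploy something like the selective robots.txt above, it's worth sanity-checking which crawlers actually end up allowed or blocked. A minimal sketch using Python's standard urllib.robotparser; the site URL, article path, and crawler list are placeholders:

```python
# Quick check of which user agents can fetch a given path under your robots.txt.
# Standard library only; swap in your own domain and the crawlers you care about.
from urllib.robotparser import RobotFileParser

SITE = "https://example.com"  # placeholder domain
CRAWLERS = ["GPTBot", "Google-Extended", "CCBot", "ChatGPT-User", "PerplexityBot"]

rp = RobotFileParser()
rp.set_url(f"{SITE}/robots.txt")
rp.read()  # fetches and parses the live robots.txt

for agent in CRAWLERS:
    allowed = rp.can_fetch(agent, f"{SITE}/some-article")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```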
What’s likely to happen (my prediction):
Short term (2026):
Medium term (2027-2028):
Long term (2028+):
What this means for you:
The parallel: like early music and video streaming, this started out controversial and eventually settled into established licensing. AI content may follow a similar path.
But that took years. Don't put your business on hold waiting for a resolution.
This helped me form a position. Our strategy:
Decision: Allow with documentation
What we’re doing:
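For the documentation piece, the core is just keeping a dated record of which AI crawlers hit the site and what they fetched. A minimal sketch of that kind of log extraction, assuming combined-format access logs; the user-agent substrings are illustrative, not a definitive list:

```python
# Sketch: extract AI-crawler hits from an access log into a dated CSV, as a simple
# paper trail of which bots fetched what and when. Bot names below are illustrative.
import csv
import re

AI_BOTS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot", "PerplexityBot", "CCBot", "ClaudeBot")

# Combined log format: ip - - [timestamp] "METHOD path HTTP/x" status bytes "referrer" "user-agent"
LINE_RE = re.compile(r'\[([^\]]+)\] "(\S+) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')

def document_ai_hits(log_path: str, out_path: str) -> None:
    with open(log_path, encoding="utf-8", errors="replace") as fh, \
         open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["timestamp", "method", "path", "bot"])
        for line in fh:
            m = LINE_RE.search(line)
            if not m:
                continue
            timestamp, method, path, user_agent = m.groups()
            for bot in AI_BOTS:
                if bot.lower() in user_agent.lower():
                    writer.writerow([timestamp, method, path, bot])
                    break

if __name__ == "__main__":
    document_ai_hits("access.log", "ai_crawler_hits.csv")
```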
How I’m framing for leadership:
“The legal situation is genuinely uncertain. Neither blocking nor allowing has clear legal protection. Given our business model relies on audience reach, we recommend maintaining AI visibility while:
For my lawyers: This gives Lawyer A the blocking/documentation they want while giving Lawyer B the visibility/pragmatism they recommend.
Key insight: This isn’t a copyright strategy - it’s a business strategy that acknowledges copyright uncertainty. We’re optimizing for what we can control (visibility) while preserving options for what we can’t (legal outcomes).
Thanks everyone for the practical perspectives.