What are the copyright implications of AI using our content? Getting conflicting legal advice

Content_Rights_Confused
Publishing Director · December 22, 2025

We’re a B2B publisher. Our content is being used by AI systems, and I’m getting conflicting advice.

Lawyer A says: “This is copyright infringement. Block all AI crawlers. Prepare for litigation.”

Lawyer B says: “This is fair use. You can’t stop it. Focus on maximizing visibility benefits.”

What I’m observing:

  • Our articles appear in ChatGPT answers
  • Perplexity regularly cites our research
  • We’re not getting compensated
  • But we ARE getting referral traffic

My questions:

  1. What’s the actual legal status right now?
  2. Should we block AI crawlers or embrace them?
  3. Are licensing deals realistic for mid-size publishers?
  4. What are other publishers actually doing?

I need a practical position, not just legal theory.

11 Comments

Publishing_Industry_Watch Expert Media Industry Analyst · December 22, 2025

Let me give you the current state of play:

Active litigation (as of December 2025):

  • NYT vs OpenAI (ongoing, major case)
  • Various author groups vs AI companies
  • Music industry actions
  • Image artist lawsuits

No final precedent yet. Courts haven’t definitively ruled on whether AI training constitutes fair use.

What AI companies argue:

  • Training is transformative use
  • They’re creating new works, not reproducing
  • Similar to how humans learn from content

What publishers argue:

  • Training is reproduction at scale
  • Commercial benefit without compensation
  • Undermines content business models

The practical reality:

| Publisher type | Typical strategy |
|---|---|
| Major (NYT, WSJ) | Litigation + licensing negotiations |
| Large (major outlets) | Licensing negotiations, some blocking |
| Mid-size | Mostly allowing, hoping for visibility |
| Small | Allowing, focusing on traffic benefits |

Why mid-size publishers mostly allow:

  • No leverage for licensing deals
  • Litigation is expensive
  • Visibility provides real business value
  • Blocking costs more than it protects
Licensing_Reality VP Business Development · December 22, 2025
Replying to Publishing_Industry_Watch

On licensing deals specifically:

Who has deals:

  • Major news publishers (NYT approached, others signed)
  • Large content archives
  • Academic publishers
  • Major image/video libraries

Deal sizes (reported):

  • News Corp: $250M+ over 5 years
  • Various others: $5-50M range
  • Small publishers: No deals available

Why mid-size can’t get deals:

  1. AI companies don’t need your specific content
  2. Transaction cost of small deals isn’t worth it
  3. They’d rather fight in court than set precedent
  4. Your content is already in training data

The uncomfortable truth: Unless you’re NYT-scale, licensing isn’t realistic.

What you CAN do:

  1. Maximize visibility value now
  2. Document usage for potential future claims
  3. Join publisher coalition groups
  4. Monitor legal developments

The cost-benefit:

  • Blocking = lose visibility, protect nothing meaningful
  • Allowing = gain visibility, uncertain future rights

Most mid-size publishers choose visibility.

Legal_Practical_View Media Lawyer · December 21, 2025

Note: Not legal advice, general information only.

Why your lawyers disagree:

Lawyer A (block/litigate):

  • Focused on pure legal rights
  • Correct that unauthorized use may be infringement
  • Protecting potential future claims
  • Conservative risk approach

Lawyer B (embrace/allow):

  • Focused on business reality
  • Correct that outcome is uncertain
  • Maximizing current value
  • Pragmatic risk approach

Both are right, from their perspectives.

The questions to ask:

  1. Can you afford to litigate?

    • Individual lawsuits: $500K-2M+
    • Class actions: Join existing groups
  2. What are you actually protecting?

    • Content already in training sets: Can’t be removed
    • Future content: Can be blocked
    • Citation/visibility: Business value
  3. What’s your business model?

    • Subscription/paywall: Maybe protect
    • Ad-supported: Visibility matters more
    • Lead generation: Visibility matters most

My observation: Most B2B publishers choose visibility because their business model benefits from awareness more than it loses to AI usage.

Publisher_Decision CEO at Industry Publication · December 21, 2025

Here’s what we decided and why:

Our business: B2B industry publication, similar to yours.

Revenue: Advertising + events + sponsored content

Our decision: Allow all AI crawlers. Maximize visibility.

Why:

  1. Our revenue comes from audience, not content sales. AI visibility = more audience = more revenue.

  2. Blocking wouldn’t help. Our content is already in training sets; blocking only stops future value.

  3. AI traffic is valuable. We see 5% of traffic from AI referrals, and those users convert well.

  4. No realistic licensing option. We approached OpenAI; there was no interest at our scale.

  5. Legal costs exceed benefits. Litigation would cost more than any potential recovery.
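
A figure like the 5% AI referral share in point 3 can be estimated from the referrer field of an analytics or log export. The sketch below is one rough way to do that, not how this publisher measured it. The hostnames in `AI_REFERRER_HOSTS` and both function names are assumptions chosen for illustration; adjust the host list to whatever actually shows up in your own referrer data.

```python
from urllib.parse import urlparse

# Assumed referrer hostnames for AI assistants (illustrative, not exhaustive).
AI_REFERRER_HOSTS = {"chatgpt.com", "perplexity.ai"}


def is_ai_referral(referrer: str) -> bool:
    """Return True if the referrer URL's host is a listed AI host or a subdomain of one."""
    host = (urlparse(referrer).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in AI_REFERRER_HOSTS)


def ai_referral_share(referrers: list[str]) -> float:
    """Fraction of visits whose referrer is an AI assistant (0.0 for an empty list)."""
    if not referrers:
        return 0.0
    ai_visits = sum(1 for r in referrers if is_ai_referral(r))
    return ai_visits / len(referrers)


if __name__ == "__main__":
    # Hypothetical sample: one AI referral out of four visits.
    sample = [
        "https://chatgpt.com/",
        "https://www.google.com/search?q=industry+report",
        "https://www.linkedin.com/feed/",
        "",  # direct visit, no referrer
    ]
    print(f"AI referral share: {ai_referral_share(sample):.0%}")
```

Visits with no referrer count toward the total here, so the result is a share of all traffic rather than a share of referred traffic. Pick whichever denominator matches how you report elsewhere.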

What we did do:

  • Track AI citations with Am I Cited
  • Document usage patterns
  • Join publisher coalition (in case of class action)
  • Optimize for AI visibility
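
For the "document usage patterns" item, one low-effort starting point is to count AI crawler hits in your own web server access logs. The sketch below assumes logs in the common "combined" format and matches crawlers by substring in the User-Agent header. The token list, the function name, and the sample log lines (with simplified user-agent strings) are all made up for illustration, and a User-Agent header can be spoofed, so treat the counts as indicative rather than as evidence on their own.

```python
import re
from collections import Counter

# Assumed User-Agent substrings to look for; extend to match your own logs.
AI_CRAWLER_TOKENS = ["GPTBot", "ChatGPT-User", "PerplexityBot", "CCBot"]

# Combined log format: ... "METHOD path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def count_ai_crawler_hits(lines):
    """Count hits per (crawler token, requested path); unparseable lines are skipped."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        user_agent = match.group("ua").lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in user_agent:
                hits[(token, match.group("path"))] += 1
                break
    return hits


# Fabricated sample lines, using documentation-reserved IP ranges.
SAMPLE_LOG = [
    '203.0.113.5 - - [20/Dec/2025:10:00:00 +0000] "GET /research/q4-report HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [20/Dec/2025:10:05:00 +0000] "GET /research/q4-report HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [20/Dec/2025:11:00:00 +0000] "GET /research/q4-report HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '192.0.2.44 - - [20/Dec/2025:11:30:00 +0000] "GET /about HTTP/1.1" 200 900 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

if __name__ == "__main__":
    for (crawler, path), count in sorted(count_ai_crawler_hits(SAMPLE_LOG).items()):
        print(f"{crawler:15} {path:25} {count}")
```

Running something like this on a schedule and archiving the output gives you a dated record of which crawlers fetched which pages, which is the kind of documentation the coalition and class-action route would likely ask for.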

The result: AI visibility up 200%. Referral traffic growing. Brand awareness improving.

Would we accept a licensing deal? Sure. But we’re not waiting for one.

Training_vs_Citation AI Researcher · December 21, 2025

Important distinction many miss:

Training data use vs. Real-time citation

| Aspect | Training data | Real-time citation |
|---|---|---|
| When it happens | Model building | Each query |
| What’s used | Full content | Snippets/facts |
| Can you block? | Future only | Yes (robots.txt) |
| Legal status | Heavily disputed | Less controversial |
| Business impact | Past content included | Affects visibility now |

Different AI systems, different models:

ChatGPT (base):

  • Your content in training = used for responses
  • No real-time retrieval
  • Blocking now doesn’t affect training already done

ChatGPT (Search):

  • Real-time retrieval from Bing
  • More like traditional search/linking
  • Blocking affects this

Perplexity:

  • Real-time retrieval and citation
  • Links to sources
  • Most similar to traditional search

The nuance:

  • Blocking ChatGPT’s training crawler (GPTBot) = excludes you from future training, doesn’t affect the current model
  • Blocking Perplexity = loses real-time citation benefits

Many publishers block training crawlers and allow citation crawlers, which balances both concerns.

Selective_Approach Expert · December 20, 2025

Here’s a nuanced robots.txt approach:

The selective strategy:

# Block training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# Allow citation/search crawlers
User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

What this does:

  • Blocks inclusion in future training data
  • Allows real-time search and citation
  • Maintains visibility benefits
  • Partially protects rights
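
Before deploying a file like the one above, it is worth checking that it parses the way you expect. Here is a minimal sketch using Python's standard-library `urllib.robotparser`; the `check_policy` helper, the example URL, and the `SomeOtherBot` agent are hypothetical names used only for this illustration.

```python
from urllib.robotparser import RobotFileParser

# The selective robots.txt from this post, copied verbatim.
ROBOTS_TXT = """\
# Block training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# Allow citation/search crawlers
User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /
"""


def check_policy(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, for each user agent, whether the rules permit fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    url = "https://example.com/articles/sample-report"  # hypothetical URL
    agents = ["GPTBot", "Google-Extended", "CCBot",
              "ChatGPT-User", "PerplexityBot", "SomeOtherBot"]
    for agent, allowed in check_policy(ROBOTS_TXT, url, agents).items():
        print(f"{agent:16} {'allowed' if allowed else 'blocked'}")
```

Two caveats. An agent that is not listed at all falls through to "allowed" because the file has no `User-agent: *` entry. And robots.txt is a voluntary convention: it only has an effect on crawlers that choose to honor it. OpenAI also documents a separate search crawler, OAI-SearchBot, so check each vendor's current crawler list before settling on which agents to name.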

Who uses this approach: Some major publishers trying to balance.

The limitation: Past training data still exists. This only affects future.

For your lawyers: This might satisfy both:

  • “We’re protecting our content from training” (Lawyer A)
  • “We’re maintaining visibility benefits” (Lawyer B)

It’s a middle ground that many find acceptable.

Future_Outlook Industry Analyst · December 20, 2025

What’s likely to happen (my prediction):

Short term (2026):

  • More litigation, no clear resolution
  • More licensing deals for major players
  • Mid-size publishers continue current strategies

Medium term (2027-2028):

  • Court decisions start establishing precedent
  • Possible legislative action (EU already moving)
  • Industry-wide licensing frameworks may emerge

Long term (2028+):

  • Clearer legal frameworks
  • Possibly mandatory licensing or opt-out systems
  • New revenue models for publishers

What this means for you:

  1. Don’t bet everything on future compensation
  2. Current visibility value is real and now
  3. Document usage for potential future claims
  4. Stay flexible as landscape evolves

The parallel: Like early music/video streaming, which started out controversial and eventually settled into established licensing. AI content may follow a similar path.

But that took years. Don’t put the business on hold waiting for resolution.

Content_Rights_Confused OP Publishing Director · December 20, 2025

This helped me form a position. Our strategy:

Decision: Allow with documentation

What we’re doing:

  1. Allow most AI crawlers for visibility benefits
  2. Selectively block training crawlers where practical (GPTBot, CCBot)
  3. Allow citation crawlers (PerplexityBot, ChatGPT-User)
  4. Document everything for potential future claims
  5. Join publisher coalitions for collective action leverage
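
Items 2 and 3 can be kept as a small policy table and rendered into robots.txt, so the block/allow decision lives in one place and is easy to revisit as the legal picture changes. This is a sketch under assumptions: the `POLICY` mapping and the `render_robots_txt` function are hypothetical names, and the crawler names are the ones listed above.

```python
# Crawler name -> "block" or "allow", taken from items 2 and 3 above.
POLICY = {
    "GPTBot": "block",         # training crawler
    "CCBot": "block",          # training crawler
    "PerplexityBot": "allow",  # citation crawler
    "ChatGPT-User": "allow",   # citation crawler
}


def render_robots_txt(policy: dict[str, str]) -> str:
    """Render one User-agent group per crawler, blocking or allowing the whole site."""
    groups = []
    for agent, action in policy.items():
        if action not in ("block", "allow"):
            raise ValueError(f"Unknown action {action!r} for {agent}")
        rule = "Disallow: /" if action == "block" else "Allow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(render_robots_txt(POLICY), end="")
```

Keeping the mapping in version control also gives you a dated history of what you blocked and when, which fits the "document everything" item.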

How I’m framing it for leadership:

“The legal situation is genuinely uncertain. Neither blocking nor allowing offers clear legal protection. Given that our business model relies on audience reach, we recommend maintaining AI visibility while:

  • Documenting AI usage of our content
  • Participating in industry collective action
  • Blocking training-specific crawlers where possible
  • Monitoring legal developments for strategic adjustment”

For my lawyers: This gives Lawyer A the blocking/documentation they want while giving Lawyer B the visibility/pragmatism they recommend.

Key insight: This isn’t a copyright strategy - it’s a business strategy that acknowledges copyright uncertainty. We’re optimizing for what we can control (visibility) while preserving options for what we can’t (legal outcomes).

Thanks everyone for the practical perspectives.


Frequently Asked Questions

Can AI systems legally use my content for training?
This is actively being litigated. AI companies argue fair use; publishers argue infringement. Major lawsuits (NYT vs OpenAI, etc.) are ongoing. Current legal status is uncertain, which is why some publishers are negotiating licensing deals rather than litigating.

Should I block AI crawlers to protect my copyright?
Blocking prevents future crawling but doesn’t remove content already in training sets. It also eliminates AI visibility benefits. Most businesses choose visibility over blocking unless they have specific licensing negotiations or content sales models to protect.

Are licensing deals with AI companies worth it?
For major publishers with leverage, yes - deals range from millions to hundreds of millions. For most businesses, licensing isn’t an option because AI companies aren’t offering deals. Focus on visibility benefits instead of waiting for compensation.

What's the difference between training data use and citation?
Training uses content to build the model (controversial legally). Citation references content in real-time to answer queries (more like traditional linking). Different AI systems do different things: ChatGPT base uses training data; Perplexity cites in real-time.
