Merch Drops · 7 min read · March 24, 2026

'Sam Altman Owes Me Money' and the AI Training Data Debate

The meme that became a movement — here's why the joke about Sam Altman owing you money is actually about something deeply serious: who owns the data that trains AI, and whether the people who created it deserve a cut.

The Joke That Hit Too Close to Home

If you've spent any time on tech Twitter (or Bluesky, or Mastodon — pick your doomer platform), you've seen it. Someone posts a screenshot of ChatGPT answering a question, and the reply is always the same: "Sam Altman owes me money."

It's funny because it's true.

Every line of code you wrote, every Stack Overflow answer you agonizingly crafted at 2am, every blog post explaining why your obscure bug was actually a semicolon issue — all of it trained the models that are now worth billions to OpenAI, Anthropic, Google, and every other AI lab cashing in on the "intelligence explosion."

And you got a free tier with rate limiting. Cool.

This is the core tension behind the joke, and it's one of the most consequential debates in tech right now: who actually owns the data that trains AI, and do the people who created it have any claim to the value extracted from it?

Training Data: The Raw Material of Intelligence

Let's get technical for a second, because the economics here are wild.

Large language models are essentially compression algorithms trained on astronomical amounts of human-generated text. The Common Crawl corpus alone is estimated to be over 100TB of data. GitHub has billions of lines of public code. Reddit has virtually every tech discussion you've ever Googled at 11pm.
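To put "astronomical" in perspective, here's a back-of-the-envelope sketch. The 100TB figure is the estimate above; the ~4 characters per token is a common rule of thumb for English text, not an exact number:

```python
# Back-of-the-envelope: how many tokens is ~100TB of raw text?
# Assumes ~1 byte per character (ASCII-heavy web text) and
# ~4 characters per token -- a rough rule of thumb, not a measurement.

CORPUS_BYTES = 100 * 10**12   # ~100 TB of text
CHARS_PER_TOKEN = 4           # rough average for English

tokens = CORPUS_BYTES // CHARS_PER_TOKEN
print(f"{tokens:,} tokens")   # on the order of 25 trillion
```

Twenty-five trillion tokens, give or take an order of magnitude, and essentially none of it licensed.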

None of this was licensed in any meaningful way. OpenAI's own training data disclosures are famously opaque. Anthropic trains on "a variety of publicly available internet sources." Cool, cool.

You wrote it. They trained on it. They're worth $60B+.
You: *checks bank account* Yep, still $0.

The legal framework hasn't caught up. In the US, AI companies argue that training falls under fair use, and courts are still working out whether training on copyrighted text constitutes infringement. The EU's AI Act introduced transparency requirements for training data, but the actual enforcement mechanisms are still being hammered out.

Meanwhile, the creators — developers, writers, artists, everyone — are getting exactly nothing.

The Token Economy and Your Uncompensated Labor

Here's where it gets genuinely absurd when you think about it from a developer perspective.

Every token you generate with your labor — every console.log, every React component you copy-pasted from a tutorial, every Docker compose file you debugged for three hours — all of it is now training data that makes the next user's completion a little better.

The models get better. The companies get richer. The flywheel spins.

And sure, you got a useful tool in return. Nobody's claiming Claude or GPT isn't genuinely useful. The argument isn't that the tools are worthless — it's that the value exchange is completely one-sided.

This is why our Got Tokens? shirt resonates with developers. It's not just clever wordplay on token-based API pricing. It's a genuine question: did you get tokens back for the tokens you created?

You generated: ~10,000 tokens of Stack Overflow answers, blog posts, and code
OpenAI extracted: $0.003 in training value (rough estimate)
You received: One (1) "you're welcome" from the AI

The Open Source Precedent (And Its Limits)

The tech industry has been here before, actually. Remember when everyone was up in arms about "open source" and whether companies could commercialize open-source code without contributing back?

We built licenses to solve that problem. GPL, MIT, Apache — these created legal frameworks for governing how code could be used and whether derivative works needed to be shared.

AI training data is the same problem, except:

  1. There's no license
  2. There's no attribution requirement
  3. There's no copyleft mechanism forcing companies to share improvements

The open-source movement succeeded because developers cared about the ethics and the community built norms around reciprocity. That hasn't happened yet in AI, because the technology has moved faster than the norms.

Some researchers are trying. The AI_Data_Gov project is tracking which companies used which datasets. Some artists are watermarking their work. A few lawsuits are inching through the courts.

But fundamentally, we're in a Wild West moment where the ethical norms haven't caught up with the technology.

What Actually Has to Change

Look, nobody's expecting OpenAI to cut a check to every developer whose code helped train GPT-4. The logistics of that would be a nightmare. But there are structural changes that could make this more equitable:

Licensing frameworks for training data. Just like Creative Commons gave content creators options beyond "all rights reserved" or "public domain," we need licensing frameworks that let creators specify how their work can be used for AI training. Some creators might opt in for free. Some might not. But the option should exist.

Compensation mechanisms. A micro-royalty system isn't technically impossible. Every time a model is fine-tuned on a specific corpus, the contributors to that corpus get a tiny cut, the same kind of pro-rata pooling Spotify uses to pay artists, automatically.
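The pro-rata math is trivial, which is part of the point. Here's a minimal sketch of what a per-fine-tune royalty split could look like; every name and number below is made up for illustration, and the hard part (provenance tracking, identity, pool sizing) is exactly what's missing today:

```python
# Hypothetical micro-royalty sketch: split a fine-tuning royalty pool
# pro-rata by each contributor's token count. Names and figures are
# invented for illustration; nothing like this exists in production.

def split_royalties(pool_usd: float, contributions: dict[str, int]) -> dict[str, float]:
    """Divide pool_usd among contributors in proportion to tokens contributed."""
    total_tokens = sum(contributions.values())
    if total_tokens == 0:
        return {author: 0.0 for author in contributions}
    return {
        author: pool_usd * tokens / total_tokens
        for author, tokens in contributions.items()
    }

# A toy corpus: who contributed how many tokens to this fine-tune
corpus = {
    "stackoverflow_user_42": 10_000,
    "blogger_jane": 40_000,
    "oss_dev_sam": 50_000,
}
payouts = split_royalties(100.0, corpus)  # a $100 royalty pool for this run
```

The arithmetic was never the obstacle. Knowing whose tokens went into which model is.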

Opt-in by default. Currently, the burden is on creators to opt out of having their data used. It should be the other way around: the default should be that a creator's work isn't used unless they say yes.
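For now, the closest thing to an opt-out that actually exists is a crawler directive: OpenAI, for instance, documents a GPTBot user agent that site owners can block in robots.txt. Note that this only affects future crawls, not anything already scraped, which is precisely the problem:

```
# robots.txt at the site root: asks OpenAI's GPTBot crawler to skip the site
User-agent: GPTBot
Disallow: /
```

And it's voluntary. A robots.txt file is a polite request, not an enforcement mechanism.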

Why the Meme Matters

So yeah. "Sam Altman owes me money."

It's a joke. But underneath the joke is a real and growing sense among developers that something is deeply wrong with how AI value is being extracted from the people who built the internet in the first place.

The meme persists because it names something true: the people building the tools are not the people capturing the value. And until that changes, we'll keep wearing shirts that say things like Sam Altman Owes Me Money — not because we're owed a specific dollar amount, but because the principle of the thing matters.

You wrote the code. You answered the questions. You created the discourse that made these models smart.

Maybe, just maybe, you deserve more than a "you're welcome."

In the meantime, at least you can buy the shirt. And if you've got hot takes burning a hole in your brain, our Trained Deez Nuts tee is for you. Some jokes are too good not to wear on your chest.

The Bottom Line

The AI training data debate isn't going away. It's going to get more contentious as models become more capable and more valuable. The legal system will eventually catch up, but legislation moves slower than technology, and by the time the courts rule definitively, the damage to creator trust may already be done.

What we need are new norms, new frameworks, and new ways of thinking about who the real stakeholders in the AI revolution actually are.

Spoiler: it's not just the VCs.

Until then, keep writing code, keep posting hot takes, and keep wearing shirts that say what everyone's thinking.

Sam Altman probably isn't going to cut you a check.

But at least you can look good while waiting for the revolution.
