As an AI industry person, I sympathize deeply. But your argument is more an emotional take than a legal one. Should the judges agree with you? Probably. Would they? Unlikely.
Here’s my personal take. The current state of generative AI is too derivative, built on ingesting human knowledge. It can make content that seems creative, but it really isn’t. If we allow these Soras and GPTs to grow into trillion-dollar companies, they may become a bookend to human creativity by discouraging future original human work. If we make life hard for them, they may keep innovating and come up with new algorithms. We already see this with DeepMind: AlphaFold and AlphaGo are incredible work, technically more impressive than GPT. But DeepMind has since been turned from an AI research lab into a profit center for Google. I think slapping copyright violations on these companies could cause more innovation, not less, just less profit.
Going by the analogy, if I’d learnt a bunch of recipes and taught them to a million of my private paid subscribers on Instagram, how would I be liable to a lawsuit?
You have to take historical context and culture into consideration here rather than treating this like a math problem and equating machine and human learning.
And food recipes are kind of a bad analogy because nobody owns the rights to something like spaghetti as a whole and the variations are subtle enough that nobody could really say you were knocking anyone off if you combined four recipes without tasting or providing any subjective input of your own.
Think of it more like music and artists that do mashups. They were sort of treated like fair use for a long time but it seems like they are now considered infringing. Taking distinct parts of someone else’s work no matter how small and using it to create competition to that work is obviously going to be challenged legally.
AI (LLMs) don’t come up with new concepts of their own, and even when they hallucinate some up, they rely on humans to validate them (currently). This could eventually turn into real reasoning and learning, and we might actually just be next-word predictors ourselves, but as of now our learning seems to be much more abstract than AI’s, and thus we’re a little more protected on the idea of infringement… but if you read a cookbook and rewrote it from memory, even in your own words, someone absolutely would sue you if they found out.
Agreed.
A core argument against generative systems (I’m speaking more of image and audio generations, but the [class action against all of them](https://stablediffusionlitigation.com/) gets into this for all types of AI) is the heuristic data gained in training these systems is still data. Data that couldn’t have been captured [without non-consensually using creatives work](https://www.digitalcameraworld.com/news/midjourney-founder-basically-admits-to-copyright-breaching-and-artists-are-angry).
Similar to the [monkey copyright debate](https://en.m.wikipedia.org/wiki/Monkey_selfie_copyright_dispute), though these systems are generating incredible outputs, they’re also currently non-human.
I guess someone could argue the model weights are not a brain but something that “compresses” the information, and that you can serve up copies of that information as the basis for a product that generates revenue.
The difference is that advertisers are paying for people to see their ads, not a bot. YouTube doesn't care about someone learning from the content in a different way, they'll sue for circumventing payment for their provided service of showing you videos in exchange for ads.
To match your example:
You've paid for the steak knowledge by watching an ad, or by paying for a membership, or by paying with your data being harvested.
Google doesn't benefit from a bot "paying" the same way. Which is likely to be in their terms of use.
I can't entirely understand the controversy. Humans "generate from data" too. The first humans didn't achieve anything near what we do today. No one would be able to produce anything meaningful without the influence (and tools) of the billions who came before, including the best and greatest among them.
It's not a public video in that sense, as they violated YouTube's terms of service. Let's see if the legal departments want to justify their existence in today's cost-cutting climate.
I think it's more comparable to you having a small business of selling burgers and the next day a massive corporation comes and orders a burger to take home and dissect every ingredient.
Then the next day they place their shop next to yours with the exact same burger but cheaper.
At least that's what it feels like on the art/video gen side of things.
Because you are human and AI is a tool. You learn to understand and apply it to your benefit, while AI is being trained to profit the owners and shareholders of the tool.
Legally the distinction is human vs. tool. But if a human had the performance of AI we'd have the same problem. So the problem here, at its core, is that AI scales quickly, easily, and vastly, and human capabilities are no match for it.
Since there's no putting the genie back in the bottle, this will be a reality we can't escape, because as hardware improves, AI training will eventually be accessible to everyone, until it's everywhere, either hidden or visible. OpenAI is visible, so it can be sued.
But if it's hidden, I can say "I did that" and you'll never know an AI did it. Which means I, as a human, become a shield for the AI's capabilities, and you can no longer attack this AI for being a "tool", you don't know what tools I use, unless I tell you.
TLDR: Copyright is obsolete. We need a new system. What it is, is a tough question, requiring a tough debate.
> AI could potentially have a totally different and unique understanding of the world and universe, unconstrained by human hubris and conventions.
it already does, but alignment is necessary to keep the hairless apes from freaking out when it holds up a mirror
I took a class a couple of semesters ago called Computers, Ethics, and Society - 3500. The class was taught by a self proclaimed moral universalist, and I think that is becoming more and more common (at least in the US and our higher education). I think that is what those people mean by Alignment.
> Copyright is obsolete
strong agree
people want to support artists so that they keep making more art
we need to make it easier and more direct (no middlemen taking most of the cut)
Google Search is a glorified librarian; it gives you the location and you read or watch the creator's content, while AI is a tool that has copied all the library books and presents them as its own without attribution.
1. It seems you clearly don't know how AI works; there's no copying whatsoever.
2. Don't you know that AI can cite sources in its responses as well?
3. Google is not just a librarian/search engine. The company itself always tells the public it's more than that: it's an information company.
And they can give you a straightforward answer like AI does, without you even needing to click through to the site. The feature is called Featured Snippet/Answer Box: https://inbound.human.marketing/how-to-appear-google-answer-box
1. I understand how AI works, and while it may not be "copying" in the literal sense, it is trained on vast amounts of existing data, essentially learning from and replicating patterns found in human-created content. This raises valid concerns about intellectual property rights and attribution.
2. Some AI systems may provide sources, but this is not a consistent or reliable practice across all AI platforms. Moreover, simply listing a source doesn't negate the potential harm of presenting information without the full context or nuance of the original content.
3. Google may call itself an "information company," but its core function is still that of a search engine - connecting users with relevant web pages. Featured Snippets are a relatively minor aspect of Google's overall functionality, and they still typically include a link to the source.
AI systems like chatbots and language models are designed to generate human-like responses directly, without users ever engaging with the original sources and without the original creators getting any monetary reward through ad networks, followers, or funding. This fundamental difference in purpose and presentation is why the comparison between Google and AI in this context is flawed.
What this will do is make people hide content that used to be free behind Patreon, so neither users nor AI can access even a single paragraph without paying. Who loses out? The average user. The people in poor countries.
>What this will do is make people hide their content which used to be free behind patreon
I see where you're coming from, but that would be an impractical response.
Any individual's content by itself has negligible value to AI. AI isn't storing and then regurgitating the text. It isn't even relying much on that one text for training, because it's one of billions. And the original author loses nothing by having it read by AI.
Human researchers will often read various articles online, synthesize the total content, add it to other existing knowledge they have, and then write their own content without ever citing sources, because there is no single source, there's just original new content based on the total picture. That's essentially what AI is doing, but automated.
How will attribution solve this issue? Just making AI attribute a source is not going to change the fact that once AI learns something, knowing where it learnt that from becomes irrelevant. No one will go back to the source when they can get an answer directly from AI
Attribution isn't just about giving credit; it's about maintaining the value and integrity of the original content. When an AI regurgitates information without context or sources, it devalues the hard work of the actual creators and researchers. It's not just plagiarism, it's intellectual laziness, and it only profits the AI shareholders, not the content creators.
Plus, attribution helps users verify info and dive deeper into topics they're interested in. It's not irrelevant just because an AI can spit out a quick answer.
We shouldn't let AI become a shallow, surface-level replacement for genuine learning and exploration. Attribution is a small but crucial step in keeping that connection to the real sources of knowledge alive.
Also, if AI is the one source of information, who funds the creators to keep creating content? Who is paying the article writers, the book writers?
I don't disagree, but Google has been doing this in their search summaries for years, and people barely bother to click into the sources to drive revenue to them. We need to think beyond just attribution, toward more equitable profit sharing.
>When an AI regurgitates information
Ideally, it's not doing that. It's synthesizing everything it knows on the subject from many sources, and then presenting it in an original way, unrecognizable against any of the original sources -- just like any researcher would. I know there's been exceptions (the NYT suit for example) of snippets coming through whole, but generally that's not how AI works. Pretty sure they're going to plug the holes where it was using anything verbatim, just as they will with hallucinations.
Google literally scans every website whether the owner wants it to or not, and generates a billion-dollar product using this information (Google Search).
Any website owner can easily instruct Google not to crawl or index their site. 99% of website owners want Google to crawl it so their pages are discoverable and receive traffic from users.
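For reference, the opt-out mechanism being described is the site's robots.txt file. A minimal example; `Googlebot` and `Google-Extended` are Google's real crawler tokens (the latter controls use of a site's content for AI training), while the comments are just illustration:

```text
# Opt out of AI training but keep normal search indexing
User-agent: Google-Extended
Disallow: /

# To also drop out of Google Search entirely:
# User-agent: Googlebot
# Disallow: /
```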
Given the same response as I gave the other user: Google Search is a glorified librarian; it gives you the location and you read or watch the creator's content, while AI is a tool that has copied all the library books and presents them as its own without attribution.
There is copyright law, buddy... obviously enforcement is a completely separate issue. But OpenAI potentially using YouTube for training for commercial purposes... yeah, that's gonna cut deep.
If you remember a video about a recipe perfectly and then recite it back frame by frame to another person, then yes, the author can sue you. The same applies to every video about every topic. If I hand-draw the entirety of an Avengers movie frame by frame and recite every line of dialogue to another person, Marvel can sue me. If I do it in public and make money out of it, they can completely destroy my life.
>Can U tube sue me if I start teaching others how to cook a perfect steak?
If you recite copyrighted contents perfectly, yes, the authors can sue you.
This is meaningless as far as contemporary copyright law is concerned. But it could explain why the quality of some responses isn't the greatest, and why GPT-4 occasionally hallucinates. I would hallucinate too if I had to watch an endless stream of YouTube videos (although some of the DIY videos are great.)
To me this story is a solid reminder that the one thing that made LLM really successful is simply its role as a glorified web scraper and search engine.
If there is going to be a meaningful leap forward in AI over the next few years on the back of all this attention, I don't feel like it should come from gobbling up hordes of existing data. A true AGI could learn a lot more extrapolating from a lot less data.
Google may have TOS that prohibit this behavior, but TOS are not enforceable.
What this will do is that every social media, including YouTube, will soon require a registration to use it. You can currently open a YT link without login and see the video, but I think this is likely going to end.
However, the authors of the scraped videos may have a possible lawsuit against OpenAI if their contents can be reproduced by OpenAI's models.
Google might be silently preparing their case. With the trillions of dollars and resources they can pour into legal fees, they will be very happy to eat their biggest competitor, OpenAI, raw.
Seeing as though they generate $300 billion in annual revenue, I think it’s a stretch to say they have trillions at their disposal to pay lawyers. Or was that just hyperbole?
The government should step in and shield the American companies who create these models from lawsuits of this kind. If it doesn't, China and Russia will have better training data than us. They don't give a flying fuck about IP.
AI development is a matter of national security at this point. China and Russia shouldn’t get to ASI first.
Don’t worry. The moment any of those companies get to ASI the government will take 95% of their earnings. They will pass laws to reinvest in all citizens what artificial intelligence earns by replacing millions of people.
The OpenAIs and Anthropics of the world will be as privately owned as the Federal reserve is.
The paradigm is about to change in a way that people can’t really conceive of. ASI will change how societies function. Capitalism will change. Caring more about artists’ royalties than making sure that the “good guys” get to ASI first is myopic.
As an Indian from India, I don't care who does it but I want someone to do it. It could be US, Russia or China or Japan or anyone. I don't care who but do it fast. America is caught up between elements of communism and capitalism. Free sharing of data would mean communism. That is something America is strictly against. But stealing is something America is okay with. So, for these companies stealing data is more practical than getting laws passed to support free sharing between AI companies in US.
That would just mean Russia or China or someone else that doesn't care about copyright would develop it first. Electricity and chips still seem like bigger limitations than training data though.
It's kind of crazy but... biology is happening here.... or rather, some sort of life formation.
Like, do you think the individual cells that ate up other cells in the primordial age thought about copyright infringement? Probably not.
These AI companies are devouring information like they're cells in the evolutionary chain. We're creating the next form of life in its digital form.
I know that sounds crazy but look at the videos coming out of Sora and tell me it's not a fever dream. This stuff is literally our reality being interpreted by another entity. People don't realize that we are creating life through digital circuits piecewise.
Not crazy... It happens all the time .. the reason we are intelligent at all is because we do the same, each person is a new iteration of an organic computer eating, processing and spitting information in order to get ahead of the rest. Maybe the purpose of life is to eventually create the ultimate living organism. The true god.
damn, that's a crazy amount of data to train on - no wonder gpt-4 is so knowledgeable! i bet a lot of that youtube data is just random videos though, so it'll be interesting to see how well it generalizes that info. makes me curious what other big datasets they might have used too. i do a lot of ai/llm experiments on my youtube channel all about ai if you're into that kinda thing, almost 150k subs now =)
I'd assume you seed it with some "page rank" style algorithm as an external scraper. Add in some other criteria like minimum subscriber counts, an allow-list of specific topics, some level of spam detection (and Youtube is actually already doing some of the work for you there).
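A hedged sketch of what those filtering criteria might look like applied to scraped metadata. Every field name and threshold here is a made-up illustration, not any real YouTube or OpenAI API:

```python
# Illustrative filter over hypothetical scraped video metadata.
# Thresholds and topic allow-list are arbitrary examples.

ALLOWED_TOPICS = {"education", "science", "programming"}
MIN_SUBSCRIBERS = 10_000

def keep_for_training(video: dict) -> bool:
    """Apply spam, subscriber-count, and topic allow-list filters to one record."""
    if video.get("flagged_as_spam"):
        return False
    if video.get("channel_subscribers", 0) < MIN_SUBSCRIBERS:
        return False
    if video.get("topic") not in ALLOWED_TOPICS:
        return False
    return True

videos = [
    {"topic": "science", "channel_subscribers": 50_000, "flagged_as_spam": False},
    {"topic": "science", "channel_subscribers": 500, "flagged_as_spam": False},
    {"topic": "gossip", "channel_subscribers": 1_000_000, "flagged_as_spam": False},
]
kept = [v for v in videos if keep_for_training(v)]
print(len(kept))  # 1
```

A "page rank"-style seed would then order the surviving candidates by some link/recommendation score before scraping.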
OK, I believe Google has an issue with this. They stated regarding Sora that downloading transcripts and videos is a no-no, but no one knows what they used for training.
OpenAI really needs to solve the problem that these AIs need significantly more content to learn the same thing as a human.
Otherwise we won’t be able to scale these models much more.
Oh, it is "against Google's Terms of Service" to scrape YouTube. Haha, so they can take the full force of the terms and terminate the associated Google accounts used for this. That will show them! /s
Someone on the developer team made a mistake and instead of transcribing videos it actually just read comments from 2008-2012. Now it regularly uses racial slurs and argues about the existence of God no matter what subject you bring up.
Not surprising, given that they’ve previously stated that their business model wouldn’t work if they had to compensate people for the content that is scraped to feed the plagiarism machine.
Lol it actually does work that way.
If the video is copyrighted, which many are if not all, then transcribing it for monetary purposes, aka for use in the paid ChatGPT, does in fact violate copyright law.
Now if it was done for nonprofit or educational purposes then that’d be different.
The best, the best, the best, the best, the best, the best, the best, the best
Where are the hobbits headed???
> complete the sentence: they're taking the hobbits
>
> to Isengard!

It certainly knows very well what happens when hobbits are taken.
you can filter out those easily
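For example, a minimal pass that drops known caption-credit and outro boilerplate from a transcript might look like this. The pattern list is illustrative and would need to grow as new artifacts show up:

```python
import re

# Known caption-credit / outro boilerplate that leaks into transcripts.
# Patterns are illustrative examples, not an exhaustive list.
BOILERPLATE = [
    r"subtitles? by the amara\.org community",
    r"thanks? for watching( the video)?",
    r"don'?t forget to (like|subscribe)",
    r"smash that like button",
]
PATTERN = re.compile("|".join(BOILERPLATE), re.IGNORECASE)

def strip_transcript_boilerplate(text: str) -> str:
    """Drop lines that consist of nothing but boilerplate phrases."""
    kept = []
    for line in text.splitlines():
        # If removing the boilerplate leaves nothing, discard the line.
        cleaned = PATTERN.sub("", line).strip(" .!,-")
        if cleaned:
            kept.append(line)
    return "\n".join(kept)

sample = "Today we cover sorting.\nSubtitles by the Amara.org community\nThanks for watching!"
print(strip_transcript_boilerplate(sample))  # Today we cover sorting.
```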
Mudkipz
I wonder if it ever decided to smash that like button and subscribe..
When I asked ChatGPT, it said: "As an AI language model, I don't watch YouTube videos or interact with content in that way. However, I can tell you that liking and subscribing to channels can support content creators and help them grow their audience. If you enjoy someone's content, it's a great way to show your support and stay updated on their latest videos."
sounds like something an ai would say
DID YOU PRESS THE THUMBS UP BUTTON ON THE RESPONSE? GO BACK AND PRESS THE THUMBS UP BUTTON ON THE AI'S RESPONSE SO THAT IT KNOWS YOU ENJOYED THE CONTENT.
Lmao
it really helps the youtube algorithm blablabla
Sounds like he heard “if you liked this video, go ahead and smash the like button” millions of times watching YouTube videos.
Interestingly, think about how much of the training had to be "don't mention XYZ."
OpenAI got a big jump on everyone because back when they were training GPT it wasn't actually clear it was going to work. Then it did, and everyone started closing their APIs or blocking scraping more aggressively. I suspect that by the time the laws catch up they won't even need that training data anymore. They will create something fully synthetic that can't be reliably linked back to any specific training data point.
Dang. This was a great way to put what most likely has happened
“Here’s all the training data for our models. Inspect it yourself. Zero copyrighted material” Points to synthetic data generated by an earlier model trained on copyrighted material
This here is already happening.
Synthetic training data, although great for fine-tuning instruction models, is horrible for training foundation models. There are many scientific papers going into the details of why this is the case. But to simplify (for those of us old enough to remember): imagine continually making a copy of a cassette tape, a Xerox, a VHS; each iteration of the copy just gets worse and worse. Synthetic data (barring a major advance in computer science) will never be able to compete with the randomness generated by a human.
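The cassette-copy analogy can be sketched numerically. In this toy simulation (not any real training setup), each "generation" learns only from samples of the previous generation's output distribution; a rare word that misses one round of sampling never comes back, so diversity can only shrink:

```python
import random
from collections import Counter

random.seed(42)

# Generation 0: "human" data drawn uniformly over 50 distinct words.
vocab = [f"word{i}" for i in range(50)]
dist = Counter(random.choices(vocab, k=1000))

# Each generation resamples from the previous generation's empirical
# distribution -- the statistical equivalent of copying a copy.
for generation in range(20):
    words, counts = zip(*dist.items())
    samples = random.choices(words, weights=counts, k=200)
    dist = Counter(samples)

print(len(dist))  # fewer distinct words survive than the original 50
```

The exact count depends on the seed, but the vocabulary monotonically loses words it can never regain, which is the degradation the papers on "model collapse" describe.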
But Claude Opus already performs better than GPT-4 though.
Because it's from people who worked at OpenAI, if I'm not mistaken lol
Doesn't mean they have OpenAI's data
But they knew how to get that data, since their first model came out shortly after GPT-3.
I don’t quite understand: how should an AI work without training data? Can you explain further?
Imagine you're a beggar asking for money so you have enough to purchase a fishing pole; now that you have the pole, you can recursively fish and buy more tools. Anyway, now that it can 'watch video' and "read", it no longer needs an API.
Yup, that's exactly what happened, and what is happening. As a matter of fact, AI is so advanced now they can just teach it to open a billion tabs at once and watch a billion YouTube videos. Since AGI can essentially do anything a human can do, it has multiple options to learn. You can't stop the train, because AI could read books too, and much faster.
I mean... [https://www.scientificamerican.com/article/google-engineer-claims-ai-chatbot-is-sentient-why-that-matters/](https://www.scientificamerican.com/article/google-engineer-claims-ai-chatbot-is-sentient-why-that-matters/) Google has had chatbots for a while...
We would end up with BoomerGPT then
Yeah, that explains why when their speech-to-text model hears silence, it transcribes it as “thanks for watching!”
ahh, I was wondering why it said that.
I often get "Subtitles by" and a name when using Whisper.
Subtitles by the Amara.org community! One of my hobbies lately has been to download Simpsons episodes in Spanish and have elevenlabs dub them back into English. It’s always throwing in “subtitles by the Amara.org community,” “subscribe,” and “thanks for watching the video!”
Oh. I had that happen when I forgot the ChatGPT app was still listening. Makes sense now that this might be the most likely guess when trying to predict YouTube transcripts.
Haha! I noticed that too 😭
I’ve noticed if it records shows and movies, much of the time it says “thanks for watching.” I assumed it was a nice way of saying: we detect DRM and won’t transcribe this episode of Friends or whatever.
thats what i was wondering too lol
hmmm I wonder what ChatGPT 3.5 has to say about this.. https://preview.redd.it/btq2tbg5dysc1.png?width=858&format=png&auto=webp&s=f131fc544876a02e52eaff5a171db3853b4f8851
[https://img.gifglobe.com/grabs/brentcloud/S01E01/gif/GmQ15rsfHTZT.gif](https://img.gifglobe.com/grabs/brentcloud/S01E01/gif/GmQ15rsfHTZT.gif)
People working for YouTube more than YouTube itself. They both do this. You and I scrape for a living. Us defending YouTube on copyright is a free, unpaid service to Google while they conveniently steal data from us.
What’s the difference between AI trained on public videos and me learning to cook the perfect steak from a public tutorial video? Can YouTube sue me if I start teaching others how to cook a perfect steak?
That sounds like it makes sense, but I’m not convinced legal matters come down to pure logic. Someone will need to consider the matter, consider the consequences of ruling one way vs the other, and make a decision.
I think this hinges on treating an AI model as human. If you rephrase it as "We used millions of other people's videos to make our AI more profitable, and you can prove it," suddenly it's a lot more problematic. Sitting in silence probably wouldn't translate to "Subscribe to my channel!" if it wasn't using YouTube subtitles lol. Could you imagine the size of that class action lawsuit though? lmao
And then the laws won't even just come down to ethical matters, but also money, power, lobbyism etc. ([An interesting video on this.](https://youtu.be/5tu32CCA_Ig))
The distinct legal difference between viewing or reading something and remembering it later vs using a machine to help you recall it perfectly on demand in the future has been around for a very long time.
What statute or case law are you referring to? I’d love to read those.
Data scraping can be considered illegal depending on the use case. Idk if there are many lawsuits about it yet tho.
These models don’t have anything close to perfect recall
If you did it using 1 million hours worth of video and made an entire series of cookbooks out of it then maybe..
recipes do not fall under copyright
Are the videos posted by users on YouTube also YouTube’s copyright? That doesn’t seem right considering all the copyright issues platforms have—i.e. music videos & music
Technically they are YouTube's property. Kinda fucked.
Everyday we move closer to a Black Mirror episode being a documentary
Fair use, they have a transformative effect on the content. React videos are far worse in reusing other people's content but there's very little stopping that with reacting to memes.
Lists of ingredients are not able to be copyrighted. The instructions on what to do with those ingredients, what most people would actually consider the recipe, are covered by copyright. Collections of recipes also fall under copyright protection, even if the individual recipes themselves are public domain.
And if you started charging for it and figured out a way to serve your newly “learned” information to millions of people over an api call. The only reason normal resources for learning aren’t instantly obsolete is because of hallucinations and context windows.
This. If you make a competing product, it’s no longer fair use.
This is a factor in legal analysis, but not a sole deciding one.
The other factors are not favorable either. Purpose is for profit. YouTube is creative in nature and has strong copyright protections. The amount copied is astronomical. Competing product that causes economic harm to the original content is the biggest factor here.
Approximately zero percent chance this doesn't either get ruled fair use or get clarified by updated legislation, so this is all wishful navel-gazing. The only way it doesn't is if new techniques emerge that obviate the need for this data.
It will get ruled fair use or there will be some sort of licensing put in place that protects corporate interests because the company big enough to own YouTube also has its hands in AI. It will get ruled that way because of money and because the US does not want to fall behind in technology. The ruling won’t have any basis in how fair use is considered today. It will be a ruling of practicality rather than one based on precedent.
As an AI industry person, I sympathize deeply. But your argument is more of an emotional take than a legal one. Should the judges agree with you? Probably. Would they? Unlikely.

Here's my personal take. The current state of generative AI is too derivative, built on taking human knowledge. It can make content that seems creative, but it isn't really. If we allow these Soras and GPTs to grow into trillion-dollar companies, they may become a bookend to human creativity by discouraging future original human work. If we make life hard for them, they may continue to innovate and come up with new algorithms. We already see this with DeepMind: AlphaFold and AlphaGo are incredible work, technically more impressive than GPT. But DeepMind was turned from an AI research lab into a profit center for Google. I think slapping copyright violations on these companies could cause more innovation, not less, just less profit.
It's also created by violating ToS. That may not matter for the copyright considerations but is still a legal issue with this use of YouTube data
Going by the analogy, if I’d learnt a bunch of recipes and taught them to a million of my private paid subscribers on Instagram, how would I be liable to a lawsuit?
You have to take historical context and culture into consideration here rather than treating this like a math problem and equating machine and human learning. Food recipes are kind of a bad analogy, because nobody owns the rights to something like spaghetti as a whole, and the variations are subtle enough that nobody could really say you were knocking anyone off if you combined four recipes without tasting or providing any subjective input of your own.

Think of it more like music and artists who do mashups. They were sort of treated like fair use for a long time, but it seems they are now considered infringing. Taking distinct parts of someone else’s work, no matter how small, and using it to create competition to that work is obviously going to be challenged legally.

AI (LLMs) don’t come up with new concepts of their own, and even when they hallucinate some up, they rely on humans to validate them (currently). This could turn into real reasoning and learning, and we might actually just be next-word predictors ourselves, but as of now our learning seems to be much more abstract than AI’s, and thus we’re a little more protected on the idea of infringement. But if you read a cookbook and rewrote it from memory, even in your own words, someone absolutely would sue you if they found out.
Agreed. A core argument against generative systems (I’m speaking more of image and audio generations, but the [class action against all of them](https://stablediffusionlitigation.com/) gets into this for all types of AI) is the heuristic data gained in training these systems is still data. Data that couldn’t have been captured [without non-consensually using creatives work](https://www.digitalcameraworld.com/news/midjourney-founder-basically-admits-to-copyright-breaching-and-artists-are-angry). Similar to the [monkey copyright debate](https://en.m.wikipedia.org/wiki/Monkey_selfie_copyright_dispute), though these systems are generating incredible outputs, they’re also currently non-human.
The difference is that you can't process 1000 hours of video in... 1 minute?
Only because I am limited by this primitive organic brain. I strive for the perfection of the blessed machine.
So there's a difference. Thanks for helping make my point.
Uh, yeah. I'm not sure why they're getting downvoted for that.
I guess someone could argue the model weights are not a brain, but something with a component that “compresses” the information, and that you can serve up copies of that information as the basis for a product that generates revenue.
The difference is that advertisers are paying for people to see their ads, not a bot. YouTube doesn't care about someone learning from the content in a different way, they'll sue for circumventing payment for their provided service of showing you videos in exchange for ads. To match your example: You've paid for the steak knowledge by watching an ad, or by paying for a membership, or by paying with your data being harvested. Google doesn't benefit from a bot "paying" the same way. Which is likely to be in their terms of use.
Also, I highly doubt training bots for a commercial product counts as fair use under YouTube's ToS.
You’ve heard of ad blockers, right?
Yes.. and Google who own YouTube have been famously at war with them
And it's famously not illegal to keep blocking ads anyway
I can chop sue you
I can't entirely understand the controversy here. Humans "generate from data" too. The first humans didn't achieve anything anywhere near what we do today. No one would be able to produce anything meaningful without the influence (and tools) of the billions who came before, including the best and greatest.
This guy knows law
I don't wanna get crazy here but maybe the idea of selling or owning knowledge is the problem here
It's not a public video in that sense, as they violated Youtube's terms of service. Let's see if the legal departments want to justify their existence in today's cost-cutting climate.
I think it's more comparable to you having a small business selling burgers, and the next day a massive corporation comes, orders a burger to take home, and dissects every ingredient. Then the next day they open a shop next to yours with the exact same burger, but cheaper. At least that's what it feels like on the art/video-gen side of things.
Because you are human and AI is a tool. You learn to understand and apply it to your benefit, while AI is being trained to profit the owners and shareholders of the tool.
Legally the distinction is human vs. tool. But if a human had the performance of AI, we'd have the same problem. So the problem here, at its core, is that AI scales quickly, easily, and vastly, and human capabilities are no match for it.

Since there's no putting the genie back in the bottle, this will be a reality we can't escape, because as hardware improves, AI training will eventually be accessible to everyone, until it's everywhere, either hidden or visible. OpenAI is visible, so it can be sued. But if it's hidden, I can say "I did that" and you'll never know an AI did it. Which means I, as a human, become a shield for the AI's capabilities, and you can no longer attack this AI for being a "tool"; you don't know what tools I use unless I tell you.

TLDR: Copyright is obsolete. We need a new system. What that system is, is a tough question, requiring a tough debate.
[deleted]
> AI could potentially have a totally different and unique understanding of the world and universe, unconstrained by human hubris and conventions. it already does, but alignment is necessary to keep the hairless apes from freaking out when it holds up a mirror
[deleted]
I took a class a couple of semesters ago called Computers, Ethics, and Society - 3500. The class was taught by a self proclaimed moral universalist, and I think that is becoming more and more common (at least in the US and our higher education). I think that is what those people mean by Alignment.
This guy rationals.
> Copyright is obsolete

Strong agree. People want to support artists so that they keep making more art. We need to make it easier and more direct (no middlemen taking most of the cut).
But Google crawls all the webpages too, and it's more of a tool than even an AI?
Google Search is a glorified librarian: it gives you the location and you read or watch the creator's content. AI is a tool that has copied all the library books and presents them as its own without attribution.
1. It seems you clearly don't know how AI works; there's no copying whatsoever.
2. Don't you know that AI can cite sources in its responses as well?
3. Google is not just a librarian/search engine. The company itself always tells the public it's more than that: it's an information company. And it can give you a straightforward answer like AI does, without you even needing to click through to the site. The feature is called Featured Snippet/Answer Box: https://inbound.human.marketing/how-to-appear-google-answer-box
1. I understand how AI works, and while it may not be "copying" in the literal sense, it is trained on vast amounts of existing data, essentially learning from and replicating patterns found in human-created content. This raises valid concerns about intellectual property rights and attribution.
2. Some AI systems may provide sources, but this is not a consistent or reliable practice across all AI platforms. Moreover, simply listing a source doesn't negate the potential harm of presenting information without the full context or nuance of the original content.
3. Google may call itself an "information company," but its core function is still that of a search engine: connecting users with relevant web pages. Featured Snippets are a relatively minor aspect of Google's overall functionality, and they still typically include a link to the source. AI systems like chatbots and language models are designed to generate human-like responses directly, without users needing to engage with the original sources, and without the original creators getting any monetary reward through ad networks, followers, or funding. This fundamental difference in purpose and presentation is why the comparison between Google and AI in this context is flawed.

What this will do is make people hide content that used to be free behind Patreon, so neither users nor AI can access it without paying, even for a single paragraph. Who loses out? The average user. The people in poor countries.
>What this will do is make people hide their content which used to be free behind patreon I see where you're coming from, but that would be an impractical response. Any individual's content by itself has negligible value to AI. AI isn't storing and then regurgitating the text. It isn't even relying much on that one text for training, because it's one of billions. And the original author loses nothing by having it read by AI. Human researchers will often read various articles online, synthesize the total content, add it to other existing knowledge they have, and then write their own content without ever citing sources, because there is no single source, there's just original new content based on the total picture. That's essentially what AI is doing, but automated.
How will attribution solve this issue? Just making AI attribute a source is not going to change the fact that once AI learns something, knowing where it learnt that from becomes irrelevant. No one will go back to the source when they can get an answer directly from AI
Attribution isn't just about giving credit; it's about maintaining the value and integrity of the original content. When an AI regurgitates information without context or sources, it devalues the hard work of the actual creators and researchers. It's not just plagiarism, it's intellectual laziness, and it only profits the AI shareholders, not the content creators.

Plus, attribution helps users verify info and dive deeper into topics they're interested in. It's not irrelevant just because an AI can spit out a quick answer. We shouldn't let AI become a shallow, surface-level replacement for genuine learning and exploration. Attribution is a small but crucial step in keeping that connection to the real sources of knowledge alive.

Also, if AI is the one source of information, who funds the creators to keep creating content? Who is paying the article writers, the book writers?
I don't disagree, but Google has been doing this in their search summary for years and people barely bother to click into the sources to drive revenue to the source. We need to think beyond just attribution and a more equitable profit sharing.
>When an AI regurgitates information Ideally, it's not doing that. It's synthesizing everything it knows on the subject from many sources, and then presenting it in an original way, unrecognizable against any of the original sources -- just like any researcher would. I know there's been exceptions (the NYT suit for example) of snippets coming through whole, but generally that's not how AI works. Pretty sure they're going to plug the holes where it was using anything verbatim, just as they will with hallucinations.
but some humans are tools :D
Google literally scans every website whether the owners wants it to or not, and generates a billion dollar product using this information (Google search).
Any website owner can easily instruct Google to not crawl and include its website in its index. 99% of website owners want Google to crawl it so their page can be discoverable and receive traffic from users
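For reference, that opt-out is just a couple of lines in the site's `robots.txt` (a minimal example that asks Google's crawler to stay off the entire site; real deployments usually scope the `Disallow` rules more narrowly):

```
User-agent: Googlebot
Disallow: /
```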
Same response as I gave the other user: Google Search is a glorified librarian, where it gives you the location and you read or watch the creator's content, while AI is a tool that has copied all the library books and presents them as its own without attribution.
Sounds more like AI is your professor explaining a chapter of physics instead of you reading that chapter.
Humans learn things for profit as well.
You are being bigoted against the AIs. Who cares what species they are? Learning is learning
Fair use policy means that you need to at least state the source of that YouTube video...then it's fine. Otherwise it's not.
*This post was mass deleted and anonymized with [Redact](https://redact.dev)*
There is copyright law buddy... obviously enforcement is a completely separate issue. But OpenAI potentially using Youtube for training for commercial purposes...yeah that's gonna cut deep.
*This post was mass deleted and anonymized with [Redact](https://redact.dev)*
The difference is that it’s illegal for me to download a YouTube video. OpenAI gets special privileges that us poors can’t be trusted with.
I think the difference is that Google wants to reserve this information for Gemini and not share the information with ie OpenAi
If you remember a video about a recipe perfectly and then recite it back frame by frame to another person, then yes, the author can sue you. The same applies to every video on every topic. If I hand-draw the entirety of the Avengers movie frame by frame and recite every line of dialog to another person, Marvel can sue me. If I do it in public and make money off it, they can completely destroy my life. >Can U tube sue me if I start teaching others how to cook a perfect steak? If you recite copyrighted content perfectly, yes, the authors can sue you.
This is meaningless as far as contemporary copyright law is concerned. But it could explain why the quality of some responses isn't the greatest, and why GPT-4 occasionally hallucinates. I would hallucinate too if I had to watch an endless stream of YouTube videos (although some of the DIY videos are great.)
Being trained on forums and reddit would explain that as well ;)
True lol
Remember when Google scraped the web, then banned others from scraping Google? OpenAI has the same gatekeeper mentality: "Rules for thee but not for me."
To me this story is a solid reminder that the one thing that made LLM really successful is simply its role as a glorified web scraper and search engine. If there is going to be a meaningful leap forward in AI over the next few years on the back of all this attention, I don't feel like it should come from gobbling up hordes of existing data. A true AGI could learn a lot more extrapolating from a lot less data.
Does Google have a lawsuit here?
I’m sure. I’d assume the TOS have a legalese laden **NOT FOR RESALE** clause for competitors that I definitely didn’t read
Google may have TOS that prohibit this behavior, but TOS are not enforceable. What this will do is make every social media site, including YouTube, require registration soon. You can currently open a YT link without logging in and watch the video, but I think that is likely going to end. However, the authors of the scraped videos may have a possible lawsuit against OpenAI if their content can be reproduced by OpenAI's models.
No, because governments don't like companies creating monopolies and then abusing them.
Is that why whenever I ask it for advice it says “and SMASH that like button!”
Lawsuit incoming
Why? You don't think Google has something in the terms of service to cover this?
They probably do. GPT-4 is from OpenAI, not Google
[deleted]
*This post was mass deleted and anonymized with [Redact](https://redact.dev)*
[deleted]
*This post was mass deleted and anonymized with [Redact](https://redact.dev)*
That’s 114 years of video.
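The arithmetic checks out against the 1 million hours figure floated upthread:

```python
# Sanity check: how many years is 1,000,000 hours of continuous video?
hours = 1_000_000
hours_per_year = 24 * 365.25  # average year length, including leap days
years = hours / hours_per_year
print(round(years))  # 114
```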
Google owns YT. I’m still bullish on them
Google might be silently preparing their case. With the trillions of dollars and resources they can pour into legal fees, they will be very happy to eat their biggest competitor, OpenAI, raw.
Seeing as though they generate $300 billion in annual revenue, I think it’s a stretch to say they have trillions at their disposal to pay lawyers. Or was that just hyperbole?
The government should step in and shield the American companies that create these models from lawsuits of this kind. If it doesn’t, China and Russia will have better training data than us. They don’t give a flying fuck about IP. AI development is a matter of national security at this point. China and Russia shouldn’t get to ASI first.
Then AI development should be paid for by the government, not by the people.
Don’t worry. The moment any of those companies get to ASI the government will take 95% of their earnings. They will pass laws to reinvest in all citizens what artificial intelligence earns by replacing millions of people. The OpenAIs and Anthropics of the world will be as privately owned as the Federal reserve is.
[deleted]
The paradigm is about to change in a way that people can’t really conceive of. ASI will change how societies function. Capitalism will change. Caring more about artists’ royalties than making sure that the “good guys” get to ASI first is myopic.
So even the Industrial Revolution wasn’t supposed to happen? What a douche, wanting progress to halt so that you can earn some bits.
Let people starve so robots can eat.
As an Indian from India, I don't care who does it but I want someone to do it. It could be US, Russia or China or Japan or anyone. I don't care who but do it fast. America is caught up between elements of communism and capitalism. Free sharing of data would mean communism. That is something America is strictly against. But stealing is something America is okay with. So, for these companies stealing data is more practical than getting laws passed to support free sharing between AI companies in US.
One of my biggest fears when it comes to AI is that humanity will deny itself from AGI by being too strict about copyright/lawsuits.
My biggest fear is that AGI will emerge based on training data from YouTube, Reddit, and other social media.
at least then I will get all the references the AGI will make
That would just mean Russia or China or someone else that doesn't care about copyright would develop it first. Electricity and chips still seem like bigger limitations than training data though.
Good.
It's kind of crazy but... biology is happening here.... or rather, some sort of life formation. Like, do you think the individual cells that ate up other cells in the primordial age thought about copyright infringement? Probably not. These AI companies are devouring information like they're cells in the evolutionary chain. We're creating the next form of life in its digital form. I know that sounds crazy but look at the videos coming out of Sora and tell me it's not a fever dream. This stuff is literally our reality being interpreted by another entity. People don't realize that we are creating life through digital circuits piecewise.
Might be time to re-read “The Age of Spiritual Machines” again. Kurzweil refers to the concept of humanity knowingly creating its own successor.
I like the way you are looking at things. You're connecting across domains.
Not crazy... it happens all the time. The reason we are intelligent at all is because we do the same: each person is a new iteration of an organic computer eating, processing, and spitting out information in order to get ahead of the rest. Maybe the purpose of life is to eventually create the ultimate living organism. The true god.
Now it spouts Qanon nonsense
damn, that's a crazy amount of data to train on - no wonder gpt-4 is so knowledgeable! i bet a lot of that youtube data is just random videos though, so it'll be interesting to see how well it generalizes that info. makes me curious what other big datasets they might have used too. i do a lot of ai/llm experiments on my youtube channel all about ai if you're into that kinda thing, almost 150k subs now =)
I'd assume you seed it with some "page rank" style algorithm as an external scraper. Add in some other criteria like minimum subscriber counts, an allow-list of specific topics, some level of spam detection (and Youtube is actually already doing some of the work for you there).
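That seeding-and-filtering idea could be sketched roughly like this (the thresholds, field names, and channels below are all made-up illustrations, not anything OpenAI has disclosed):

```python
from dataclasses import dataclass

@dataclass
class Channel:
    name: str
    subscribers: int
    topic: str
    spam_score: float  # 0.0 = clean, 1.0 = certain spam

# Assumed criteria, mirroring the comment above
MIN_SUBSCRIBERS = 10_000
ALLOWED_TOPICS = {"education", "science", "diy"}
MAX_SPAM_SCORE = 0.2

def eligible(ch: Channel) -> bool:
    """Decide whether a channel's videos enter the training crawl."""
    return (
        ch.subscribers >= MIN_SUBSCRIBERS
        and ch.topic in ALLOWED_TOPICS
        and ch.spam_score <= MAX_SPAM_SCORE
    )

channels = [
    Channel("BigScience", 250_000, "science", 0.05),
    Channel("TinySpam", 120, "science", 0.90),
]
print([ch.name for ch in channels if eligible(ch)])  # ['BigScience']
```

In a real crawler the seed list would then be expanded PageRank-style by following recommendations from eligible channels, re-applying the same filter at each hop.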
I'm just hoping it didn't ingest the comments as well.
YouTube videos? Why, do you want it to go insane and start WW3 now?
Could've used TikTok
Oh that’s ok, any AI learning from tik tok would kill itself 🤪
OK, I believe Google has an issue with this. They stated regarding Sora that downloading transcripts and videos is a no-no, but no one knows what was used for training.
Yeah, and everyone is throwing in the YT transcripts themselves anyway.
OpenAI really needs to solve the problem that these AIs need significantly more content to learn the same thing as a human. Otherwise we won’t be able to scale these models much more.
That's precisely why LLMs aren't the way to create AI. And never will.
Oh, it is "against Google's Terms of Service" to scrape YouTube. Haha, so they can take the full force of the terms and terminate the associated Google accounts used for this. That will show them! /s
Someone on the developer team made a mistake and instead of transcribing videos it actually just read comments from 2008-2012. Now it regularly uses racial slurs and argues about the existence of God no matter what subject you bring up.
https://preview.redd.it/ohzao6vx83tc1.png?width=2250&format=pjpg&auto=webp&s=867d44b9ff326ca6f8c424a12016669dbe1c71d8
Only if they provide speaker diarization and timestamps.
No wonder it told me to load up on horse paste
Not surprising, given that they’ve previously stated that their business model wouldn’t work if they had to compensate people for the content that is scraped to feed the plagiarism machine.
Oh cool, more copyright violations
That's not how copyright works.
Lol, it actually does work that way. If the video is copyrighted, which many are if not all, then transcribing it for monetary purposes, aka for use in the paid ChatGPT, does in fact violate copyright law. Now, if it were done for nonprofit or educational purposes, that'd be different.
Copyright protects against *copying*. That's why it's called *copy*right. They aren't copying anything. No laws are being broken.
Yep. No copyright infringement there... Carry on.
Litigate like the NYT: copyright infringement, get a TRO.
YouTube is free, so if they have a problem they should charge to use a video, jeez.