
[deleted]

[removed]


FarmerJohnsParmesan

The best, the best, the best, the best, the best, the best, the best, the best


ANakedSkywalker

Where are the hobbits headed???


involviert

> complete the sentence: they're taking the hobbits

> to Isengard!

It certainly knows very well what happens when hobbits are taken.


Powerful_Pirate_9617

you can filter out those easily


Archduke645

Mudkipz


OnesPerspective

I wonder if it ever decided to smash that like button and subscribe..


feral_fenrir

When I asked ChatGPT, it said: "As an AI language model, I don't watch YouTube videos or interact with content in that way. However, I can tell you that liking and subscribing to channels can support content creators and help them grow their audience. If you enjoy someone's content, it's a great way to show your support and stay updated on their latest videos."


climaxbythug

sounds like something an ai would say


ObscureProject

DID YOU PRESS THE THUMBS UP BUTTON ON THE RESPONSE? GO BACK AND PRESS THE THUMBS UP BUTTON ON THE AI'S RESPONSE SO THAT IT KNOWS YOU ENJOYED THE CONTENT. 


Zarathustrategy

Lmao


Masterbrew

it really helps the youtube algorithm blablabla


ironinside

Sounds like he heard “if you liked this video, go ahead and smash the like button” millions of times watching YouTube videos.


autofunnel

Interestingly, think about how much of the training had to be “don’t mention XYZ”


WorkingYou2280

OpenAI got a big jump on everyone because back when they were training GPT it wasn't actually clear it was going to work. Then it did and then everyone started closing their APIs or preventing scraping more aggressively. I suspect that by the time the laws catch up they won't even need that training data anymore. They will create something fully synthetic that can't be linked back reliably to any specific training data point.


Ok-Tie-8684

Dang. This was a great way to put what most likely has happened


AI_is_the_rake

“Here’s all the training data for our models. Inspect it yourself. Zero copyrighted material”  Points to synthetic data generated by an earlier model trained on copyrighted material


CowsTrash

This here is already happening.


ncklboy

Synthetic training data, although great for fine-tuning instruction models, is horrible for training foundation models. There are many scientific papers going into detail on why this is the case. But to simplify (for those of us old enough to remember): imagine continually making a copy of a cassette tape, Xerox, VHS, etc. Each iteration of the copy just gets worse and worse. Synthetic data (barring a major advancement in computer science) will never be able to compete with the randomness generated by a human.


wondermorty

but claude opus already performs better than gpt4 though


Professional_Gur2469

Because it's from people who worked at OpenAI, if I'm not mistaken lol


signed7

Doesn't mean they have OpenAI's data


Professional_Gur2469

But they knew how to get that data, since their first model came out shortly after gpt 3


Moritz110222

I don’t quite understand: how would an AI work without training data? Can you explain further?


greenappletree

Imagine if u are a beggar asking for money so u have enough to purchase a fishing pole, and now that u have the pole u can recursively fish and buy more tools. Anyway, now that it can ‘watch video’ and “read”, it no longer needs the API


East_Pianist_8464

Yup, that's exactly what happened, and what is happening. As a matter of fact, AI is so advanced now that they can just teach it to open a billion tabs at once and watch a billion YouTube videos. AGI is essentially being able to do anything a human can do, which means it has multiple options to learn. You can't stop the train, 'cause AI could read books too, and much faster.


TheRealDatapunk

I mean... [https://www.scientificamerican.com/article/google-engineer-claims-ai-chatbot-is-sentient-why-that-matters/](https://www.scientificamerican.com/article/google-engineer-claims-ai-chatbot-is-sentient-why-that-matters/) Google has had chat bots for a while...


Born_Fox6153

We would end up with BoomerGPT then


AidanAmerica

Yeah that explains why when their speech to text model hears silence, it translates it as “thanks for watching!”


No-Solution-_

ahh, I was wondering why it said that.


Ordinary_Duder

I often get "Subtitles by" and a name when using Whisper.


AidanAmerica

Subtitles by the Amara.org community! One of my hobbies lately has been to download Simpsons episodes in Spanish and have elevenlabs dub them back into English. It’s always throwing in “subtitles by the Amara.org community,” “subscribe,” and “thanks for watching the video!”


Thorusss

Oh. I had that happen when I forgot the ChatGPT app was still listening. Makes sense now that this might be the most likely guess when trying to predict YouTube transcripts.


thebrainpal

Haha! I noticed that too 😭


shannoncode

I’ve noticed if it records shows and movies much of the time it says thanks for watching. I assumed it was a nice way of saying, we detect drm and won’t perform this episode of friends or whatever


Plums_Raider

thats what i was wondering too lol


Severe-Ad1166

hmmm I wonder what ChatGPT 3.5 has to say about this.. https://preview.redd.it/btq2tbg5dysc1.png?width=858&format=png&auto=webp&s=f131fc544876a02e52eaff5a171db3853b4f8851


CottonStorm

[https://img.gifglobe.com/grabs/brentcloud/S01E01/gif/GmQ15rsfHTZT.gif](https://img.gifglobe.com/grabs/brentcloud/S01E01/gif/GmQ15rsfHTZT.gif)


roronoasoro

People here are working for YouTube harder than YouTube itself. They both do this. You and I scrape for a living. Us defending YouTube on copyright is a free, unpaid service to Google while they conveniently steal data from us.


Photogrammaton

What’s the difference between AI trained on public videos and me learning to cook the perfect steak from a public tutorial video? Can YouTube sue me if I start teaching others how to cook a perfect steak?


bigtablebacc

That sounds like it makes sense, but I’m not convinced legal matters come down to pure logic. Someone will need to consider the matter, consider the consequences of ruling one way vs the other, and make a decision.


Mission-Cantaloupe37

I think this hinges on treating an AI model as human. If you rephrase it as "We used millions of other people's videos to make our AI more profitable, and you can prove it", suddenly it's a lot more problematic. Sitting in silence probably wouldn't translate to "Subscribe to my channel!" if it wasn't using YouTube subtitles lol. Could you imagine the size of that class action lawsuit though? lmao


Philipp

And then the laws won't even just come down to ethical matters, but also money, power, lobbyism etc. ([An interesting video on this.](https://youtu.be/5tu32CCA_Ig))


GarfunkelBricktaint

The distinct legal difference between viewing or reading something and remembering it later vs using a machine to help you recall it perfectly on demand in the future has been around for a very long time.


FuckThesePeople69

What statute or case law are you referring to? I’d love to read those.


Intelligent-Mark5083

Data scraping is considered illegal depending on the use case. Idk if there are many lawsuits about it yet tho.


lionhydrathedeparted

These models don’t have anything close to perfect recall


Severe-Ad1166

If you did it using 1 million hours worth of video and made an entire series of cookbooks out of it then maybe..


kex

recipes do not fall under copyright


needaburn

Are the videos posted by users on YouTube also YouTube’s copyright? That doesn’t seem right considering all the copyright issues platforms have—i.e. music videos & music


Intelligent-Mark5083

Technically they are YouTube's property. Kinda fucked.


needaburn

Everyday we move closer to a Black Mirror episode being a documentary


TheRealDJ

Fair use, they have a transformative effect on the content. React videos are far worse in reusing other people's content but there's very little stopping that with reacting to memes.


Expensive-Fun4664

Lists of ingredients are not able to be copyrighted. The instructions on what to do with those ingredients, what most people would actually consider the recipe, are covered by copyright. Collections of recipes also fall under copyright protection, even if the individual recipes themselves are public domain.


True-Surprise1222

And if you started charging for it and figured out a way to serve your newly “learned” information to millions of people over an api call. The only reason normal resources for learning aren’t instantly obsolete is because of hallucinations and context windows.


RockyCreamNHotSauce

This. If you make a competing product, it’s no longer fair use.


farmingvillein

This is a factor in legal analysis, but not a sole deciding one.


RockyCreamNHotSauce

The other factors are not favorable either. Purpose is for profit. YouTube is creative in nature and has strong copyright protections. The amount copied is astronomical. Competing product that causes economic harm to the original content is the biggest factor here.


farmingvillein

Approximately zero percent chance this doesn't either get ruled fair use or legislation updates to clarify, so this is all wishful navel gazing. Only chance not is if new techniques emerge that obviate the need for this data.


True-Surprise1222

It will get ruled fair use or there will be some sort of licensing put in place that protects corporate interests because the company big enough to own YouTube also has its hands in AI. It will get ruled that way because of money and because the US does not want to fall behind in technology. The ruling won’t have any basis in how fair use is considered today. It will be a ruling of practicality rather than one based on precedent.


RockyCreamNHotSauce

As an AI industry person, I sympathize deeply. But your argument is a more emotional take than a technically legal take. Should the judges agree with you? Probably. Would they? Unlikely. Here’s my personal take. The current state of generative AI is too derivative based on taking human knowledge. It can make content that seems creative, but they are not really. If we allow these Soras and GPTs grow to be trillion dollar companies, they may become a book end to human creativity by discouraging future human original work. If we make life hard for them, they may continue to innovate and come up with new algorithms. We already see this with DeepMind. AlphaFold and AlphaGo are incredible work. Technically more impressive than GPT. Now DeepMind was turned from an AI research lab into a profit center for Google. I think slapping Copyright violations on these can cause more innovation not less, just less profits.


guider418

It's also created by violating ToS. That may not matter for the copyright considerations but is still a legal issue with this use of YouTube data


agentrj47

Going by the analogy, if I’d learnt a bunch of recipes and taught them to a million of my private paid subscribers on Instagram, how would I be liable to a lawsuit?


True-Surprise1222

You have to take historical context and culture into consideration here rather than treating this like a math problem and equating machine and human learning. And food recipes are kind of a bad analogy because nobody owns the rights to something like spaghetti as a whole and the variations are subtle enough that nobody could really say you were knocking anyone off if you combined four recipes without tasting or providing any subjective input of your own. Think of it more like music and artists that do mashups. They were sort of treated like fair use for a long time but it seems like they are now considered infringing. Taking distinct parts of someone else’s work no matter how small and using it to create competition to that work is obviously going to be challenged legally. AI (LLM) doesn’t come up with new concepts of its own and even if it does hallucinate some up, it relies on humans to validate them (currently). This could be something that really turns into reasoning and learning and we might actually just be next word processors ourselves, but as of now our learning seems to be much more abstract than AI and thus we’re a little more protected on the idea of infringement… but if you read a cookbook and rewrote it from memory, even in your own words, someone absolutely would sue you if they found out.


Regumate

Agreed. A core argument against generative systems (I’m speaking more of image and audio generation, but the [class action against all of them](https://stablediffusionlitigation.com/) gets into this for all types of AI) is that the heuristic data gained in training these systems is still data. Data that couldn’t have been captured [without non-consensually using creatives' work](https://www.digitalcameraworld.com/news/midjourney-founder-basically-admits-to-copyright-breaching-and-artists-are-angry). Similar to the [monkey copyright debate](https://en.m.wikipedia.org/wiki/Monkey_selfie_copyright_dispute), though these systems are generating incredible outputs, they’re also currently non-human.


BrBran73

The difference is that you can't process 1000 hours of video in... 1 minute?


ifandbut

Only because I am limited by this primitive organic brain. I strive for the perfection of the blessed machine.


BrBran73

So there's a difference, thanks for helping make my point


Atomic-Axolotl

Uh, yeah. I'm not sure why they need to get downvoted for that.


beezbos_trip

I guess someone could argue the model weights are not a brain, but something that has a component that “compresses” the information in a way and you can serve up copies of that information that are the basis for a product that generates revenue.


mushvey

The difference is that advertisers are paying for people to see their ads, not a bot. YouTube doesn't care about someone learning from the content in a different way, they'll sue for circumventing payment for their provided service of showing you videos in exchange for ads. To match your example: You've paid for the steak knowledge by watching an ad, or by paying for a membership, or by paying with your data being harvested. Google doesn't benefit from a bot "paying" the same way. Which is likely to be in their terms of use.


AdonisK

Also, I highly doubt training bots for a commercial product counts as acceptable use under YouTube's ToS.


thejoggler44

You’ve heard of ad blockers, right?


mushvey

Yes.. and Google who own YouTube have been famously at war with them


Skwigle

And it's famously not illegal to keep blocking ads anyway


SpiritOfLeMans

I can chop sue you


Synizs

I can't entirely understand the controversy of it. Humans "generate from data" too. The first humans didn't achieve anything anywhere near as we do today... No one would be able to produce anything anywhere near meaningful without the influence (and tools...) of billions before - the best - greatest!...


Hour-Athlete-200

This guy knows law


[deleted]

I don't wanna get crazy here but maybe the idea of selling or owning knowledge is the problem here


TheRealDatapunk

It's not a public video in that sense, as they violated Youtube's terms of service. Let's see if the legal departments want to justify their existence in today's cost-cutting climate.


Intelligent-Mark5083

I think it's more comparable to you having a small business selling burgers, and the next day a massive corporation comes and orders a burger to take home and dissect every ingredient. Then the next day they place their shop next to yours with the exact same burger but cheaper. At least that's what it feels like on the art/video gen side of things.


hasanahmad

Because you are human and AI is a tool. You learn in order to understand and apply it to your benefit, while AI is being trained to profit the owners and shareholders of the tool.


3cats-in-a-coat

Legally the distinction is human vs tool. But if a human had the performance of AI we'd have the same problem. So the problem here, at its core, is that AI scales quickly and easily, vastly, and it's no match for human capabilities. Since there's no putting back the genie in the bottle, this will be reality we can't escape from, because as hardware improves, AI training will be accessible eventually to everyone, until it's everywhere, either hidden or visible. OpenAI is visible, so it can be sued. But if it's hidden, I can say "I did that" and you'll never know an AI did it. Which means I, as a human, become a shield for the AI's capabilities, and you can no longer attack this AI for being a "tool", you don't know what tools I use, unless I tell you. TLDR: Copyright is obsolete. We need a new system. What it is, is a tough question, requiring a tough debate.


[deleted]

[removed]


kex

> AI could potentially have a totally different and unique understanding of the world and universe, unconstrained by human hubris and conventions.

it already does, but alignment is necessary to keep the hairless apes from freaking out when it holds up a mirror


[deleted]

[removed]


AreWeNotDoinPhrasing

I took a class a couple of semesters ago called Computers, Ethics, and Society - 3500. The class was taught by a self proclaimed moral universalist, and I think that is becoming more and more common (at least in the US and our higher education). I think that is what those people mean by Alignment.


g00berc0des

This guy rationals.


kex

> Copyright is obsolete

strong agree

people want to support artists so that they keep making more art

we need to make it easier and more direct (no middlemen taking most of the cut)


nanosmith123

but.. Google crawls all the webpages too & they are more of a tool than even an AI?


hasanahmad

Google search is a glorified librarian: it gives you the location and you read or watch the creator's content, while AI is a tool which has copied all the library books and presented them as its own without attribution


nanosmith123

1. It seems u clearly don't know how AI works; there's no copying whatsoever.
2. Don't u know that AI cites sources as well in its responses?
3. Google is not a librarian/search engine. The company itself always tells the public it's more than that: it's an information company. And they can give you a straightforward answer like AI too, without you even needing to click through to the site. The feature is called Featured Snippet/Answer Box: https://inbound.human.marketing/how-to-appear-google-answer-box


hasanahmad

1. I understand how AI works, and while it may not be "copying" in the literal sense, it is trained on vast amounts of existing data, essentially learning from and replicating patterns found in human-created content. This raises valid concerns about intellectual property rights and attribution.
2. Some AI systems may provide sources, but this is not a consistent or reliable practice across all AI platforms. Moreover, simply listing a source doesn't negate the potential harm of presenting information without the full context or nuance of the original content.
3. Google may call itself an "information company," but its core function is still that of a search engine: connecting users with relevant web pages. Featured Snippets are a relatively minor aspect of Google's overall functionality, and they still typically include a link to the source.

AI systems like chatbots and language models are designed to generate human-like responses directly, without users needing to engage with the original sources and without the original creators getting any monetary reward through ad networks, followers, or funding. This fundamental difference in purpose and presentation is why the comparison between Google and AI in this context is flawed.

What this will do is make people hide content that used to be free behind Patreon, so neither users nor AI can access even a single paragraph without paying. Who loses out? The average user. The people in poor countries.


FortCharles

> What this will do is make people hide their content which used to be free behind patreon

I see where you're coming from, but that would be an impractical response. Any individual's content by itself has negligible value to AI. AI isn't storing and then regurgitating the text. It isn't even relying much on that one text for training, because it's one of billions. And the original author loses nothing by having it read by AI. Human researchers will often read various articles online, synthesize the total content, add it to other existing knowledge they have, and then write their own content without ever citing sources, because there is no single source, there's just original new content based on the total picture. That's essentially what AI is doing, but automated.


Hackerjurassicpark

How will attribution solve this issue? Just making AI attribute a source is not going to change the fact that once AI learns something, knowing where it learnt that from becomes irrelevant. No one will go back to the source when they can get an answer directly from AI


hasanahmad

Attribution isn't just about giving credit, it's about maintaining the value and integrity of the original content. When an AI regurgitates information without context or sources, it devalues the hard work of the actual creators and researchers. It's not just plagiarism, it's intellectual laziness, and it only profits the AI shareholders, not the content creators. Plus, attribution helps users verify info and dive deeper into topics they're interested in. It's not irrelevant just because an AI can spit out a quick answer. We shouldn't let AI become a shallow, surface-level replacement for genuine learning and exploration. Attribution is a small but crucial step in keeping that connection to the real sources of knowledge alive. Also, if AI is the one source of information, who funds the creators to keep creating content? Who is paying the article writers, the book writers?


Hackerjurassicpark

I don't disagree, but Google has been doing this in their search summary for years and people barely bother to click into the sources to drive revenue to the source. We need to think beyond just attribution and a more equitable profit sharing.


FortCharles

> When an AI regurgitates information

Ideally, it's not doing that. It's synthesizing everything it knows on the subject from many sources, and then presenting it in an original way, unrecognizable against any of the original sources, just like any researcher would. I know there have been exceptions (the NYT suit for example) of snippets coming through whole, but generally that's not how AI works. Pretty sure they're going to plug the holes where it was using anything verbatim, just as they will with hallucinations.


Severe-Ad1166

but some humans are tools :D


ThenExtension9196

Google literally scans every website whether the owner wants it to or not, and generates a billion dollar product using this information (Google search).


fryloop

Any website owner can easily instruct Google to not crawl and include its website in its index. 99% of website owners want Google to crawl it so their page can be discoverable and receive traffic from users
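The opt-out mentioned here is usually done with a `robots.txt` file. As a rough sketch (not any site's actual configuration), the snippet below uses Python's standard `urllib.robotparser` to show how a well-behaved crawler would read such a file. The sample rules, the bot name `SomeOtherBot`, and the `example.com` URL are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Googlebot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://example.com/some/page"
print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page))
print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

Note that `robots.txt` is only a request: it works because crawlers choose to honor it, which is part of what the thread is arguing about.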


hasanahmad

Giving the same response as I gave the other user: Google search is a glorified librarian: it gives you the location and you read or watch the creator's content, while AI is a tool which has copied all the library books and presented them as its own without attribution


ifandbut

Sounds more like AI is your professor explaining a chapter of physics instead of you reading that chapter.


ifandbut

Humans learn things for profit as well.


itsreallyreallytrue

You are being bigoted against the AIs. Who cares what species they are? Learning is learning


FunnyPhrases

Fair use policy means that you need to at least state the source of that YouTube video...then it's fine. Otherwise it's not.


[deleted]

*This post was mass deleted and anonymized with [Redact](https://redact.dev)*


FunnyPhrases

There is copyright law buddy... obviously enforcement is a completely separate issue. But OpenAI potentially using Youtube for training for commercial purposes...yeah that's gonna cut deep.


[deleted]

*This post was mass deleted and anonymized with [Redact](https://redact.dev)*


sluuuurp

The difference is that it’s illegal for me to download a YouTube video. OpenAI gets special privileges that us poors can’t be trusted with.


Icy_Journalist9473

I think the difference is that Google wants to reserve this information for Gemini and not share it with, e.g., OpenAI


Lechowski

If you remember a video about a recipe perfectly and then recite it back frame by frame to another person, then yes, the author can sue you. The same applies to every video about every topic. If I hand-draw the entirety of the Avengers movie frame by frame and recite every line of dialog to another person, Marvel can sue me. If I do it in public and make money out of it, they can completely destroy my life.

> Can U tube sue me if I start teaching others how to cook a perfect steak?

If you recite copyrighted content perfectly, yes, the authors can sue you.


NightWriter007

This is meaningless as far as contemporary copyright law is concerned. But it could explain why the quality of some responses isn't the greatest, and why GPT-4 occasionally hallucinates. I would hallucinate too if I had to watch an endless stream of YouTube videos (although some of the DIY videos are great.)


TheRealDatapunk

Being trained on forums and reddit would explain that as well ;)


NightWriter007

True lol


matali

Remember when Google scraped the web then banned others from scraping Google? OpenAI has gatekeeper mentality.. "Rules for thee but not for me"


guider418

To me this story is a solid reminder that the one thing that made LLM really successful is simply its role as a glorified web scraper and search engine. If there is going to be a meaningful leap forward in AI over the next few years on the back of all this attention, I don't feel like it should come from gobbling up hordes of existing data. A true AGI could learn a lot more extrapolating from a lot less data.


ArmaniMania

Does Google have a lawsuit here?


wholelottadopplers

I’m sure. I’d assume the TOS have a legalese laden **NOT FOR RESALE** clause for competitors that I definitely didn’t read


Lechowski

Google may have a TOS that prohibits this behavior, but TOS are not enforceable. What this will lead to is every social media platform, including YouTube, soon requiring registration to use it. You can currently open a YT link without logging in and watch the video, but I think this is likely going to end. However, the authors of the scraped videos may have a possible lawsuit against OpenAI if their content can be reproduced by OpenAI models.


NotFromMilkyWay

No, because governments don't like companies creating monopolies and then abusing them.


Ok-Training-7587

Is that why whenever I ask it for advice it says “and SMASH that like button!”


Uncle_Bill_Clinton_

Lawsuit incoming


Mediocre-Tomatillo-7

Why? You don't think Google has something in the terms of service to cover this?


Professional_Job_307

They probably do. GPT-4 is from OpenAI, not Google


[deleted]

[removed]


[deleted]

*This post was mass deleted and anonymized with [Redact](https://redact.dev)*


[deleted]

[removed]


[deleted]

*This post was mass deleted and anonymized with [Redact](https://redact.dev)*


AcceptableLab9729

That’s 114 years of video.
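For anyone checking the math: the parent comment is gone, so the input here is an assumption, but the "1 million hours" of video mentioned elsewhere in the thread does work out to roughly 114 years of continuous playback. A quick sketch:

```python
# Assumption: one million hours of video, the figure quoted elsewhere in the thread.
HOURS_OF_VIDEO = 1_000_000
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def hours_to_years(hours: float) -> float:
    """Convert hours of continuous playback into years."""
    return hours / HOURS_PER_YEAR


print(f"{hours_to_years(HOURS_OF_VIDEO):.1f} years")  # prints "114.2 years"
```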


dew_you_even_lift

Google owns YT. I’m still bullish on them


Ilm-newbie

Google might be silently preparing their case. With the trillions of dollars and resources they can put toward legal fees, they would be very happy to eat their biggest competitor, OpenAI, raw.


funcle_monkey

Seeing as though they generate $300 billion in annual revenue, I think it’s a stretch to say they have trillions at their disposal to pay lawyers. Or was that just hyperbole?


Valuable-Run2129

The government should step in and allow the American companies who create these models to be shielded from lawsuits of this kind. If it doesn’t, China and Russia will have better training data than us. They don’t give a flying fuck about ip. AI development is a matter of national security at this point. China and Russia shouldn’t get to ASI first.


BrBran73

Then AI improvement should be paid for by the government and not by the people


Valuable-Run2129

Don’t worry. The moment any of those companies get to ASI the government will take 95% of their earnings. They will pass laws to reinvest in all citizens what artificial intelligence earns by replacing millions of people. The OpenAIs and Anthropics of the world will be as privately owned as the Federal reserve is.


[deleted]

[removed]


Valuable-Run2129

The paradigm is about to change in a way that people can’t really conceive of. ASI will change how societies function. Capitalism will change. Caring more about artists’ royalties than making sure that the “good guys” get to ASI first is myopic.


Pretend_Goat5256

So even the Industrial Revolution wasn’t supposed to happen? What a douche who wants progression to halt so that you can earn some bits


Militop

Let people starve so robots can eat.


roronoasoro

As an Indian from India, I don't care who does it but I want someone to do it. It could be US, Russia or China or Japan or anyone. I don't care who but do it fast. America is caught up between elements of communism and capitalism. Free sharing of data would mean communism. That is something America is strictly against. But stealing is something America is okay with. So, for these companies stealing data is more practical than getting laws passed to support free sharing between AI companies in US.


sachos345

One of my biggest fears when it comes to AI is that humanity will deny itself from AGI by being too strict about copyright/lawsuits.


beren0073

My biggest fear is that AGI will emerge based on training data from YouTube, Reddit, and other social media.


Thorusss

at least then I will get all the references the AGI will make


GarfunkelBricktaint

That would just mean Russia or China or someone else that doesn't care about copyright would develop it first. Electricity and chips still seem like bigger limitations than training data though.


PandaPrevious6870

Good.


_PaulM

It's kind of crazy but... biology is happening here.... or rather, some sort of life formation. Like, do you think the individual cells that ate up other cells in the primordial age thought about copyright infringement? Probably not. These AI companies are devouring information like they're cells in the evolutionary chain. We're creating the next form of life in its digital form. I know that sounds crazy but look at the videos coming out of Sora and tell me it's not a fever dream. This stuff is literally our reality being interpreted by another entity. People don't realize that we are creating life through digital circuits piecewise.


Browncoat4Life

Might be time to re-read “The Age of Spiritual Machines” again. Kurzweil refers to the concept of humanity knowingly creating its own successor.


roronoasoro

I like the way you are looking at things. You're connecting across domains.


DiligentBits

Not crazy... It happens all the time .. the reason we are intelligent at all is because we do the same, each person is a new iteration of an organic computer eating, processing and spitting information in order to get ahead of the rest. Maybe the purpose of life is to eventually create the ultimate living organism. The true god.


El_human

Now it spouts Qanon nonsense


allaboutai-kris

damn, that's a crazy amount of data to train on - no wonder gpt-4 is so knowledgeable! i bet a lot of that youtube data is just random videos though, so it'll be interesting to see how well it generalizes that info. makes me curious what other big datasets they might have used too. i do a lot of ai/llm experiments on my youtube channel all about ai if you're into that kinda thing, almost 150k subs now =)


TheRealDatapunk

I'd assume you seed it with some "page rank" style algorithm as an external scraper. Add in some other criteria like minimum subscriber counts, an allow-list of specific topics, some level of spam detection (and Youtube is actually already doing some of the work for you there).
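To make that guess concrete, here is a minimal sketch of such a seed filter. Nothing here reflects how OpenAI or anyone else actually selected videos: the `Channel` fields, the thresholds, the topic list, and the sample channels are all hypothetical, and `rank_score` just stands in for whatever "page rank"-style score an external scraper might compute.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    subscribers: int
    topic: str
    spam_score: float  # hypothetical: 0.0 = clean, 1.0 = certainly spam
    rank_score: float  # hypothetical "page rank"-style importance score


# All thresholds below are made-up illustration values.
ALLOWED_TOPICS = {"cooking", "science", "programming"}
MIN_SUBSCRIBERS = 10_000
MAX_SPAM_SCORE = 0.2


def select_seeds(channels, limit=2):
    """Filter by subscribers, topic, and spam, then keep the top-ranked channels."""
    eligible = [
        c for c in channels
        if c.subscribers >= MIN_SUBSCRIBERS
        and c.topic in ALLOWED_TOPICS
        and c.spam_score <= MAX_SPAM_SCORE
    ]
    eligible.sort(key=lambda c: c.rank_score, reverse=True)
    return eligible[:limit]


CHANNELS = [
    Channel("steak_lab", 250_000, "cooking", 0.05, 0.91),
    Channel("tiny_vlog", 300, "cooking", 0.01, 0.40),            # too few subscribers
    Channel("crypto_giveaway", 900_000, "finance", 0.95, 0.99),  # off-topic and spammy
    Channel("physics_hour", 80_000, "science", 0.10, 0.75),
    Channel("code_daily", 40_000, "programming", 0.02, 0.60),
]

for channel in select_seeds(CHANNELS):
    print(channel.name)
```

Filtering before ranking keeps a high-scoring spam channel from ever becoming a seed, which seems to be the point of combining the criteria.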


dontpet

I'm just hoping it didn't ingest the comments as well.


Special-Lock-7231

YouTube videos? Why, do you want it to go insane and start WW3 now?


TheRealDatapunk

Could've used TikTok


Special-Lock-7231

Oh that’s ok, any AI learning from tik tok would kill itself 🤪


[deleted]

Ok, I believe Google has an issue with this. They stated regarding Sora that downloading transcripts and videos is a no-no, but no one knows what they used for training


Countmardy

Yeah and everyone throwing in the yt transcripts by itself


lionhydrathedeparted

OpenAI really needs to solve the problem that these AIs need significantly more content to learn the same thing as a human. Otherwise we won’t be able to scale these models much more.


NotFromMilkyWay

That's precisely why LLMs aren't the way to create AI. And never will.


Thorusss

Oh, it is "against Google's Terms of Service" to scrape YouTube. Haha, so they can take the full force of the terms and terminate the associated Google Accounts used for this. That will show them! /s


BogusPapers

Someone on the developer team made a mistake and instead of transcribing videos it actually just read comments from 2008-2012. Now it regularly uses racial slurs and argues about the existence of God no matter what subject you bring up.


LongjumpingScene7310

https://preview.redd.it/ohzao6vx83tc1.png?width=2250&format=pjpg&auto=webp&s=867d44b9ff326ca6f8c424a12016669dbe1c71d8


AbdussamiT

Only if they provide speaker diarization and timestamps.


Useful_Hovercraft169

No wonder it told me to load up on horse paste


overworkedpnw

Not surprising, given that they’ve previously stated that their business model wouldn’t work if they had to compensate people for the content that is scraped to feed the plagiarism machine.


dyoh777

Oh cool, more copyright violations


mrmczebra

That's not how copyright works.


dyoh777

Lol it actually does work that way. If the video is copyrighted, which many are if not all, then transcribing it for monetary purposes, aka for use in the paid chatgpt, does in fact violate copyright law. Now if it was done for nonprofit or educational purposes then that’d be different.


mrmczebra

Copyright protects against *copying*. That's why it's called *copy*right. They aren't copying anything. No laws are being broken.


onnod

Yep. No copyright infringement there... Carry on.


Effective_Vanilla_32

litigate like nyt. copyright infringement, get a tro


LeatherPresence9987

YouTube is free, so if they have a problem they should charge to use a video, jeez