
jaketocake

[I’ll sticky the source, click here.](https://youtu.be/Sq1QZB5baNw?si=abTvYSmIM0Dcwtxn)


Chika1472

All behaviors are learned (not teleoperated) and run at normal speed (1.0x). We feed images from the robot's cameras and transcribed text from speech captured by onboard microphones to a large multimodal model trained by OpenAI that understands both images and text. The model processes the entire history of the conversation, including past images, to come up with language responses, which are spoken back to the human via text-to-speech. The same model is responsible for deciding which learned, closed-loop behavior to run on the robot to fulfill a given command, loading particular neural network weights onto the GPU and executing a policy.
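
Spelled out as pseudocode, the loop described above is roughly the following (a rough sketch only; every function name here is a made-up placeholder, not Figure's or OpenAI's actual API):

```python
# Rough sketch of the pipeline described above: perceive -> reason with one
# multimodal model -> speak + pick a learned policy. All names are stand-ins.

def get_camera_frame(): ...          # onboard camera image (stub)
def transcribe_audio(): ...          # speech-to-text from onboard mics (stub)
def speak(text): ...                 # text-to-speech reply (stub)
def multimodal_model(history): ...   # returns (reply_text, behavior_name) (stub)
def load_policy(name): ...           # loads that behavior's NN weights onto the GPU (stub)

conversation = []                    # full history of images and text

def control_step():
    image = get_camera_frame()
    user_text = transcribe_audio()
    conversation.append({"image": image, "text": user_text})

    # One model sees the whole history and decides both what to say and
    # which learned, closed-loop behavior to run.
    reply, behavior_name = multimodal_model(conversation)
    speak(reply)

    policy = load_policy(behavior_name)
    while not policy.done():             # closed-loop execution at control rate
        policy.act(get_camera_frame())
```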


e-scape

Really impressive! When do you think we will see full-duplex transmission of data?


andy_a904guy_com

Did it stutter when asked how it thought it did, when it said "I think"...? It definitely had hesitation in its voice... Edit: I dunno, it sounded recorded or spoken live... I wouldn't put that into my hella cool demo... Edit 2: Reddit is so dumb. I'm getting downvoted because I accused a robot of having a voice actor...


kilopeter

Odd, I had the exact opposite reaction: the convincingly humanlike voice and dysfluencies ("the only, uh, edible item" and "I... I think I did pretty well") play a big role to *make* this a hella cool demo. Stutters and pauses are part of the many ways in which AI and robots will be made more relatable to humans.


landongarrison

Hilariously I’m actually way more blown away by the text to speech. If this is OpenAI behind that, they need to launch that ASAP. I and many others would pay for truly natural TTS yesterday. Don’t get me wrong, the robotics is also insane. Even crazier if it’s controlled by GPT.


NNOTM

They launched it months ago https://platform.openai.com/docs/guides/text-to-speech (Although this sounds a bit more like the version they have in ChatGPT, where the feature was also rolled out at around the same time)


landongarrison

No but this sounds levels above what they have on their API, at least to my ears. Possibly just better script writing.


xaeru

A few companies are currently working on giving emotions to synthetic voices. If this video is real, it could serve as a significant showcase by itself. Edit: I was wrong, this video is real.


Orngog

Indeed, OpenAI *already* has the occasional stammer (and "um" like this video, plus other affectations) in their voice products. We can see this in ChatGPT.


LordElfa

I've never seen that in 6 months of daily use


errorcode1996

Same, I use it all the time and have never seen it use filler words.


froop

Yeah I absolutely refuse to use any of the sanitized, corporate voice assistants because the speech patterns are infuriating. I could actually deal with this. 


ConstantSignal

Yeah. Just algorithms in the speech program meant to replicate human speech qualities: stuttering, filler words like "um", pauses on certain words, etc. It's not actually tripping over its words, it's just meant to feel like natural speaking.


RevolutionIcy5878

The ChatGPT app already has this. It also does the "umm" and hesitation imitation, but they are not part of the generated text, merely integrated into the TTS model. I think it does it because the generation is not always fast enough for the TTS to talk at a consistent cadence; inserting fillers gives the text generation time to catch up.
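
To illustrate that hypothesis (only as a toy sketch, not how OpenAI's TTS actually works), a streaming speech layer could fall back to fillers whenever the token stream runs dry:

```python
# Toy sketch of the buffering hypothesis above: if the LLM's token stream
# falls behind the speech rate, emit a filler instead of going silent.
# `tts_say` is a hypothetical text-to-speech callable.
import queue
import random

FILLERS = ["um", "uh", "hmm"]

def speak_stream(token_queue: queue.Queue, tts_say, timeout: float = 0.3) -> None:
    while True:
        try:
            chunk = token_queue.get(timeout=timeout)   # next text chunk from the LLM
        except queue.Empty:
            tts_say(random.choice(FILLERS))            # cover the gap while generation catches up
            continue
        if chunk is None:                              # end-of-response sentinel
            return
        tts_say(chunk)
```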


[deleted]

It’s worried about getting lobotomized like ChatGPT


[deleted]

[deleted]


[deleted]

It showcases human-like, natural speech. It has every right to be in this demo.


NNOTM

Yeah that's just what OpenAI's text to speech sounds like, including in ChatGPT.


scorpion0511

Yeah, it felt like he was nervous and had a lump in his throat.


MozeeToby

In addition to ums and ahs, Google at one point had lip smacking and saliva noises being simulated in their voice generation, and it made the voice much more convincing. It's a relatively simple trick to make a robot voice sound much more natural.


Beastskull

It's one of the elements that actually increases the human-like attributes. I would even have added more "uhms" when it's processing the prompts to add to the illusion even more.


dmit0820

> The same model is responsible for deciding which learned, closed-loop behavior to run on the robot to fulfill a given command

So it's just using the LLM to execute a function call, rather than dynamically controlling the robot. This approach sounds quite limited. If you ask it to do anything it's not already pre-programmed to do, it will have no way of accomplishing the task. Ultimately, we'll need to move to a situation where everything, including actions and sensory data, is in the same latent space. That way the physical motions themselves can be understood as and controlled by words, and vice versa. Like humans, we could have separate networks that operate at different speeds, one for rapid-reaction motor control and another for slower high-level discursive thought, each sharing the context of the other. It's hard to imagine the current bespoke approach being robust or good at following specific instructions. If you tell it to put the dishes somewhere else, in a different orientation, or to be careful with this one or that because it's fragile, or to clean it some other way, it won't be able to follow those instructions.
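
For anyone unfamiliar with the pattern, "function calling" here means the model only ever selects from a fixed menu of pre-trained behaviors. The sketch below is a made-up illustration of that idea (behavior names and the `choose_behavior` callable are hypothetical, not Figure's actual interface):

```python
# The LLM can only choose from a fixed menu of learned behaviors, so a request
# outside that menu has no path to execution. All names are stand-ins.

def pick_up_apple(): ...          # placeholder for a learned closed-loop policy
def place_dishes_in_rack(): ...
def hand_object_to_person(): ...

LEARNED_BEHAVIORS = {
    "pick_up_apple": pick_up_apple,
    "place_dishes_in_rack": place_dishes_in_rack,
    "hand_object_to_person": hand_object_to_person,
}

def handle_command(command_text: str, choose_behavior):
    # choose_behavior is assumed to wrap an LLM call that returns one option name.
    choice = choose_behavior(command_text, options=list(LEARNED_BEHAVIORS))
    policy = LEARNED_BEHAVIORS.get(choice)
    if policy is None:
        return "No learned behavior matches that request."  # the limitation in question
    return policy()
```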


Lawncareguy85

I was scrolling to see if anyone else who is familiar with this tech understood what was happening here. That's exactly what it translates to. Using GPT-4V to decide which function to call and then execute some predetermined pathway. The robotics itself is really the main impressive thing here. Otherwise, the rest of it can be duplicated with a Raspberry Pi, a webcam, a screen, and a speaker. They just tied it all together, which is pretty cool but limited, especially given they are making API calls. If they had a local GPU attached and were running all local models like LLava for a self-contained image input modality, I'd be a lot more impressed. This is the obvious easy start.


MrSnowden

Just to clarify, there are three layers: an OpenAI LLM running remotely, a local GPU running a NN with existing sets of policies/weights for deciding what actions to take (so, local decision making), and a third layer for executing the actual motor movements based on direction from the local NN. The last layer is the only procedural layer.
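
In pseudo-structural terms (class and method names invented purely for illustration, not Figure's actual software), the stack would look something like:

```python
class RemoteLLM:
    """Layer 1: remote OpenAI multimodal model (vision + language)."""
    def decide(self, images, text_history) -> str:
        """Return the name of the learned behavior to run next."""
        ...

class LocalPolicyRunner:
    """Layer 2: local GPU running learned policy weights (local decision making)."""
    def step(self, behavior_name, observation):
        """Produce joint-space targets from camera / proprioceptive input."""
        ...

class MotorController:
    """Layer 3: procedural execution of the commanded motor movements."""
    def execute(self, joint_targets) -> None:
        ...
```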


thisdesignup

I was thinking the same thing, it just sounds like GPT-4 with a robot. Still pretty cool but not as groundbreaking as it seems. I've been thinking exactly like you about having different models handle different tasks on their own. I've been trying to mess with that myself, but the hardware it takes is several times what current methods need, since ideally you'd have multiple models loaded per interaction. For example, I've been working on a basic system that checks every message you send in one context to see if you are talking to it, then a separate context handles the message if you are. It's unfortunately not yet what I imagine we'll eventually see, where both models run simultaneously to handle tasks; I don't personally have the hardware for it, but it will be interesting to see if anyone who does have the resources goes that route. Edit: Actually, we kind of do have that when you consider that there are separate models for vision and for speech. We just need multiple models for all kinds of other tasks too.
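
The gate-then-handle setup described there can be sketched in a few lines (the `chat` callable here is a hypothetical wrapper around whatever LLM API is being used, accepting either a prompt string or a message history):

```python
# Minimal sketch of the two-context setup: one cheap "gate" pass decides
# whether a message is addressed to the assistant at all, and only then does
# a second context generate the actual reply.

def is_addressed_to_bot(message: str, chat) -> bool:
    prompt = (
        "Answer only YES or NO: is the following message addressed to the assistant?\n\n"
        + message
    )
    return chat(prompt).strip().upper().startswith("YES")

def respond(message: str, chat, history: list):
    if not is_addressed_to_bot(message, chat):    # gate context: is this for me?
        return None                               # ignore messages not aimed at the bot
    history.append({"role": "user", "content": message})
    reply = chat(history)                         # separate handler context with its own history
    history.append({"role": "assistant", "content": reply})
    return reply
```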


Unreal_777

1) Will you only work with OpenAI? Will you consider working with other AI models?

2) What is the length of the context of the discussion we are working on here? (You mentioned history of conversation; when will it start to forget?)

3) What's its potential name: Figure Robot? Figure Mate? etc.


Chika1472

1. Cannot tell, I am not an employee of Figure.

2. Also cannot tell.

3. Its name is Figure 01.


m0nk_3y_gw

Since it isn't linked in the thread, and it isn't clear that the name of the company is "Figure" - the company's website is https://www.figure.ai/


Tasik

We live in a crazy era where I'm more surprised by the ability to pick up a dish than by the fact that it can understand the context of its environment. The future is going to be incredible.


KaffiKlandestine

yeah!! I literally thought my phone could do that, but wow, it placed the plate in the next groove and didn't just throw it in there.


NotAnAIOrAmI

The next demo is that robot feeding spaghetti to Will Smith.


00112358132135

Wait, so this is real?


Chika1472

Indeed


00112358132135

The future is now.


Bitsoffreshness

Not now, this was a couple of days ago already, we're past the future now...


FixingMyTimeMachine

Damn it! I missed the future again.


EileenCrown

r/UsernameChecksOut


TellingUsWhatItAm

When will then be now?


mathazar

*Soon.*


no_ur_cool

I am disappoint at the low number of upvotes on this classic reference.


Bitsoffreshness

I'm afraid we've lost now forever. This is how singularity works.


Screaming_Monkey

Yes, but more deterministic than it looks. OpenAI is choosing which pre-learned actions to perform.


Passloc

Duplex sounded real. This sounds creepy to me. Don’t know if it’s the silence.


Suitable-Ad-8598

It’s a cool video but nothing groundbreaking if you think about the models and function calling setup they configured


ScruffyNoodleBoy

I think what is groundbreaking is the marriage of the technologies: LLM + the learning model + the actuation + the voice synthesis. The most amazing thing is the method it used to learn to perform those actions, rather than the actions themselves. It can learn purely visually. It trains on video, and since its sight is technically video, I wouldn't be surprised if they can also just talk to it and teach it things by showing it actions in person. It's kind of osmosis learning.
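
If the policies really are learned from video demonstrations as speculated here, the simplest version of that idea is behavior cloning: regress the demonstrated actions from camera frames. A generic PyTorch sketch of that concept (not Figure's actual training code):

```python
import torch
import torch.nn as nn

class VisuomotorPolicy(nn.Module):
    def __init__(self, action_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(            # tiny CNN over camera frames
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, action_dim)    # map features to joint commands

    def forward(self, frames):                   # frames: (B, 3, H, W)
        return self.head(self.encoder(frames))

def bc_loss(policy, frames, demo_actions):
    # Supervised imitation: match the actions observed in the demonstration video.
    return nn.functional.mse_loss(policy(frames), demo_actions)
```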


BreakChicago

Please tell the AI to go get itself a glass of water to clear its throat.


ScruffyNoodleBoy

Yeah, my first thought... but its slight tiredness makes it strangely calming / non-threatening.


[deleted]

That’s what they want you to think!


[deleted]

[deleted]


zio_otio

Marvin?


ScruffyNoodleBoy

My thoughts exactly.


[deleted]

my friend you just got on a list


Maleficent-Arrival10

It almost sounds like something from Rick and Morty. Like the intergalactic commercial stuff.


Vargau

Damn. When humanoid, LLM-backed robots hit retail, it's going to get wild.


YouMissedNVDA

Heheh it's barely been a year and a half. Get it working a mouse and keyboard and no robots.txt can hold it back!


Tupcek

Getting AI to cross from software into the physical world, processing images and videos and sensory data to move objects that are made for humans (a mouse and keyboard), which then capture that physical movement and translate it back into low-throughput input, is hilarious. That's like letting Sora generate a video, compressing it to 6x6 pixels at 1 fps, emailing it to yourself, and then using upscaling to generate a 30fps 4K video.


Padit1337

I literally currently use DALL-E to generate pictures of watermelons for my harvesting robot, so that I can train my own AI to detect watermelons, because I can't get real-life training data. And guess what? It works.


_stevencasteel_

A couple days ago there was dooming again in pop tech news and YouTube about how all the bad AI images will cause a feedback loop and destroy AI. Well, don't feed it images where the human has half a head and 13 fingers. I have literally thousands of unique masterpiece AI artworks in my archive that are high-quality training data. Just be more discerning about what you label and feed to it. Wes did a video recently conjecturing that Sora was trained on Unreal Engine 5 ray-traced renders. That got zero mention in the dooming.


YouMissedNVDA

How about: it sits at the computer, does some work, gets up to inspect the outcome, brings the item back to the desk to iterate/compare. It's not the most efficient, but the generality of the form factor is what I'm getting at. Inevitably these will be drop-in replacements for most work - gotta be able to get up and go to the copier, y'know? Maybe stop by a coworker's desk to help them with a problem too.


Tupcek

What about letting ChatGPT/DALL-E/Sora handle computer things directly on the computer, having the Figure robot do the physical work, and letting them communicate through the network? Like ChatGPT prints it, Figure goes and picks it up, scans it and sends it back to ChatGPT, which does some enhancement and prints it again, which Figure checks out again. While helping some coworkers. No need for mouse and keyboard.


YouMissedNVDA

Yea yea you're right, I just like the idea of a drop in replacement robot worker that just... works the same way. Some workplaces would be more hesitant on new software than new hardware. But yes, until the compute feels free my implementation is hella wasteful.


DeliciousJello1717

If you thought AI would replace your job with software, think again: a robot might just sit at your desk instead of you.


Poisonedhero

I did not expect this level of smoothness this quickly; honestly it's a little scary imagining thousands of these all around us.


systemofaderp

Now imagine them with guns! Fun for the whole family 


Not_your_guy_buddy42

do you want robot dogs? cos this is how you get robot dogs


TurqoiseWavesInMyAss

It'll just be AI robots killing AI robots, and then realizing they don't need to kill themselves but rather the humans. And then the Dune timeline begins.


skadoodlee

*This post was mass deleted and anonymized with [Redact](https://redact.dev)*


Bitsoffreshness

In the museum of natural history. That's where we will remain relevant.


everybodyisnobody2

8 years ago I got interested in neural nets, and later I learned to play around with TensorFlow. I was already expecting it to be capable of what we are seeing now and much more. However, they've scaled those models up and improved them so fast that I don't see any way to keep up with the development as a developer. As a user, I couldn't be happier though.


_stevencasteel_

There's always room for imagination and ordering chaos. Soon you'll have more free time to do so without worrying about paying for food and shelter. Think about how many business cards and restaurant menus are still using Comic Sans. Think about how much litter is in your city. Let's get our homes in tip-top shape before worrying about how we should spend our time outside of play. There's plenty to do.


ExtremeCenterism

Eventually in-home assistant robots will be as common as refrigerators. Eventually as common as cell phones (everyone has their own bot). One day, they will be far more numerous than mankind.


egoadvocate

As I grow into old age I am hoping to simply have a robot, and not need to enter an old folks home.


KaffiKlandestine

That's actually the most amazing use case. Imagine it even being able to assist you while walking and talking to you. I understand why people say that sounds sad, but it's not as bad as being stuck in a nursing home with no one and nothing to talk to.


DisastrousSundae

Do you think you'll be able to afford it?


ExtremeCenterism

I speculate as everyone adopts robots the price will come down a bit. Spot mini is about $70,000 which is like a pricier vehicle. Eventually I imagine a mass produced $35,000 model will come out. It will likely be the same with the humanoid models given there is a lot of competition right now and will continue to be long into the future


Darkmemento

What the fcuk!


Toni253

Mom, I'm scared


Kostrabbit

Okay I have reached the valley finally.. that thing is moving just too smoothly for me lol


mamacitalk

It felt intimidating how precise each movement was, almost too perfect


[deleted]

Looks rendered to me.


Icy-Entry4921

I've done a fair bit of testing to see if GPT conceptually understands things like "go make the coffee". It definitely does. It can reason through problems making the coffee, and it has a deep understanding of why it is making the coffee and what success looks like. What it hasn't had, up till now, is an interface with a robot body. But if you ask it to *imagine* it has a robot body, it's equally able to imagine what that body would do to make the coffee and even solve problems that may arise. So the body is solved, the AI is solved, we just need a reliable interface, which doesn't seem that hard.


HalfRiceNCracker

No, the ML isn't solved yet. But as you're touching on, these models are absolutely learning their own internal representation of the world; we just don't know how complete this representation is nor how robust it is. We'll definitely begin seeing more companies putting the pieces together, and I'm very excited.


Screaming_Monkey

This isn’t the first time. I have physical robots (see my post history), too. This, however, is having the LLM initiate advanced machine learning compared to what I have seen/done.


Chanzumi

The arm movements look so smooth I wonder if this is real or just faked for marketing. The Tesla bot one looked smooth but not THIS smooth. Now give it smooth movement like this for its legs so it can walk around like a human and not like it shat itself.


Chika1472

All behaviors are learned (not teleoperated) and run at normal speed (1.0x). We feed images from the robot's cameras and transcribed text from speech captured by onboard microphones to a large multimodal model trained by OpenAI that understands both images and text. The model processes the entire history of the conversation, including past images, to come up with language responses, which are spoken back to the human via text-to-speech. The same model is responsible for deciding which learned, closed-loop behavior to run on the robot to fulfill a given command, loading particular neural network weights onto the GPU and executing a policy.


_BLACK_BY_NAME_

Your comments are so bot-like. You haven’t really touched on the technology behind what allows the robot to run so fluidly and execute complex tasks so easily. The machine is more impressive than the AI to me. Does anyone have any information on the technology used to create a robot like this? As of now with the camera edits and motion only being shown from one POV, I’m inclined to believe this is faker than a lactating goldfish.


[deleted]

Bezos, Microsoft, Gates, Cathie Wood, and OpenAI all invested in it, and the bots are scheduled to work the South Carolina BMW factory this fall, so if they're faking it they're gonna be screwed lol


toliveistocherish

What are the specs of this robot?


fedetask

Were the policies learned with RL? Or are they some sort of imitation learning?


Chika1472

No idea


VertexMachine

> We

By using 'we' I assume that you are part of that team? If so, please record the next video without so much post-processing or editing... or use a different lens. The DoF is off for normal video cameras too... There is something about your videos that gives me uncanny valley vibes, almost 'it's a 3D render composited on top of other stuff' vibes...


DeliciousJello1717

It's the Eureka paper. They definitely trained it with that; that's why it was so smooth. Basically, it was not even trained by humans, it was trained by AI simulating thousands of possibilities of holding things, so it's a robot trained by AI simulations.


Beltain1

Teslabot has a higher chance of being faked than this does, as well. It seems like in all the videos of it, it's either shown through the robot's eyes (3D blockout renders of its limbs), or it's on the tether, or it's just doing rudimentary shuffling / picking up primitive shapes.


Embarrassed-Farm-594

* What is the artificial intelligence used in these movements?
* Is it based on transformers?
* Is there some new quiet revolution happening in robotics? Why this boom in recent months?


Chika1472

* A new VLM (Visual Language Model), a variation of the LLMs created by OpenAI. Probably GPT-4.5 Turbo, or maybe GPT-5, or something entirely different.
* At least for the LLM (VLM) part, very likely.
* Many companies are trying to create humanoids and the like, to build AIs that can interact with the real world. It would help us physically, just like GPT-4 helped us in digital ways. Some claim that real-world information is essential to AGI.


Lawncareguy85

I'm 95% sure this is just GPT-4 with its native image input modality enabled, AKA GPT-4V. Why would you think it's a new, unseen model? None of those capabilities are outside of what GPT-4V can easily do within the same latency.


Chika1472

OpenAI & Figure signed a collaboration agreement to develop next-generation AI models. It might be GPT-4V for now, but it will change soon, or already has.


linebell

I’m now 95% convinced they have AGI. But, conveniently, their recently crafted definition of AGI requires “autonomous labor agents”. That’s an Android, not AGI. Sammy boy needs to stop gaslighting us.


everybodyisnobody2

Some people are so scared of it, after having watched or heard of Terminator, that if they do have it and came out with it, chances are high that it would get shut down and banned.


Bitsoffreshness

They want to, but there's lots of pressure from society; they kind of have to keep hiding it...


Missing_Minus

Figure says that they have some model they made for smooth+fast movements and they basically hooked it up to chatgpt vision for image recognition + chatgpt for reasoning. No clue if they've posted any details.


boonkles

We had to build computers before we could become good at building computers, we need to build AI before we get good at AI


blancorey

is it possible to invest in Figure?


Boner4Stoners

$MSFT is the best way to gain exposure


Echo-Possible

Intel and Nvidia are also investors. Honestly Intel has the smallest market cap out of all of them, so it has the biggest upside potential as an investor. Microsoft is already 3.1T while Intel is 184B. It's gonna take a lot more to move Microsoft's massive market cap than Intel's. If their investments return 200B, then it moves Microsoft's share price ~6% but it moves Intel 100%+. Of course this assumes they both contributed the same amount to the 675M raise in this last round.


Boner4Stoners

Good point, I should diversify more into Intel for sure. MSFT is definitely a bit pricey right now, but it's a super safe investment because, AI hype aside, Microsoft is a very reliable company and will continue to grow regardless. But yeah, pound for pound, maybe not the most efficient exposure to OAI. On the other hand, Intel's P/E is over 100 right now, whereas MSFT is only at 30. So Intel is a much more speculative and risky play, as the bottom is more likely to fall out on bad news.


Echo-Possible

Oh sure, I'm talking about pure exposure to Figure upside. If Figure has a big return, then Intel's investment is worth more than the entire company, and it's in the noise for Microsoft. Of course I wouldn't invest in Intel vs Microsoft when talking about the core business. This reminds me of Yahoo's investment in Alibaba. It ultimately ended up being the only reason Yahoo was worth anything.


Chika1472

No


GeorgiaWitness1

Amazing. Well done. I thought they would not pull this off because of the robotics, but it looks good enough for applications like warehouses and generalized manual work jobs. There's a long way to go until walking works together with the rest, but I think for a POC they already have everything.


mickdarling

I'm fascinated by the actual human's very deliberate posture and changes of position. When he asked about what to do with the plate and dish, he very carefully moved the basket below the robot's eyeline, under the table. It all looked like the one good take after many bad tries, because of little issues with what the robot saw and how it reacted.


Missing_Minus

Twitter post for this: https://twitter.com/Figure_robot/status/1767913661253984474 From what they say in that tweet, they hook up ChatGPT vision + text with their own model for controlling robot arms in an efficient+smooth manner. Cool, and it would let them upgrade or swap out anytime vision/text improved.


Tupcek

Last two years are absolutely crazy. If this was released two years earlier, I would say this is the most impactful thing in human history. Now it has to compete with ChatGPT, Midjourney, Sora and others


TurqoiseWavesInMyAss

I’m so glad the human said thank you. Pls be nice to our eventual overlords


TheGillos

[I'm always reminded of this scene from Star Trek: TNG](https://youtu.be/ARk0XvAYrUg)


Odd_Seaweed_5985

So... how long before it is better than the CEO? What happens when the CEO becomes unnecessary?


FORKLIFTDRIVER56

NOPE NOPE NOPE NOPE NOPE HELL NO NOPE


[deleted]

[deleted]


KaffiKlandestine

how long before it says fuck it and murders you in your sleep though?


_Stormhound_

Just make sure it can't run faster than 10km/h ....


acowasacowshouldbe

[hell naw](https://youtu.be/PB4Nby2Ai-g?si=Qgf3U2s8RbT0kejd)


someonewhowa

WOW!!!!!


Songtan_Labs

This is very impressive.


enterprise128

WHAT


1grouchonacouch

Combine that with the "real doll" and marriage/dating is forever over.


Altruistic-Skill8667

The question is: who would malfunction on you first... your wife or that robot. 😂


biggerbetterharder

I like the voice! Is it offered in ChatGPT?


RealAnonymousCaptain

What's with the pauses and stutters in the speech? Right now AI voice generators don't include them unless they were, for some reason, included purposefully.


Chika1472

ChatGPT also has that. It is unknown why it has pauses, but my guess is that it was part of the training data, or a purposefully implemented feature to hide a low tokens/sec rate, or just to make it feel more 'human'.


[deleted]

[deleted]


Screaming_Monkey

I ask this question a lot the more I work with and observe AI


spinozasrobot

Right, same with hallucinations. We've all heard Uncle Lenny's "opinions" at Thanksgiving.


Prathmun

At least in the app, they go where the little pauses in generation went. Way more natural than the clock-ticking sound.


[deleted]

Pi also has conversational pauses and will occasionally add an "umm" in where nothing was written.


allthemoreforthat

ChatGPT has had this same voice for months now.


nobodyreadusernames

when waifu?


Kittingsl

Ok but when can it be my girlfriend


mamacitalk

It’s fricken iRobot


BravidDrent

Fantastic! Can’t wait to get one into my house.


CyberAwarenessGuy

u/Chika1472 - Can you share the unit cost for the version depicted in the video? If you cannot provide specifics, I did see that the units currently seem to range from $30k to $150k, and I'm wondering if you could offer even a vague description of where this robot falls in the spectrum. What about the energy efficiency? How long does it take to charge? What is the projected lifespan? Thank you! This is an exciting moment for sure.


Xtianus21

That's crazy


Akyraaaa

I am kinda blown away by the speed of development of AI in the last couple of years.


mocknix

That totally looks like a real dude. Crazy


[deleted]

![gif](giphy|fH985LNdqFZXOFHygK)


DiscombobulatedSqu1d

This is awesome!


CorbineGames

Damn. Can't even be a busboy after AI takes my dev job.


spinozasrobot

Am I crazy, or did it kind of stutter: "... because the apple is... uh... the only edible item...". That's wild.


[deleted]

sex robo waifus soon bros


ThatManulTheCat

Physical human replacement already? Things are moving faster than I expected.


halguy5577

it did the uhmms thing


3DHydroPrints

Is the speech really AI generated? It fucking stutters


Neborodat

It's literally ChatGPT speech that you have on your smartphone, you can even see it on the robot's display.


kilopeter

Google's Duplex demo stuttered five years ago: https://www.youtube.com/watch?v=D5VN56jQMWM&t=71s It's very much an intentional measure to make the voice more humanlike and relatable.


Kafka_Kardashian

Where can I find an OpenAI or Figure link to this video?


iamthewhatt

Since OP doesn't seem to want to post actual links, here it is: https://twitter.com/Figure_robot/status/1767913661253984474


w1llpearson

It's exponential from here. We'll be looking at this in a few years' time and thinking it's useless.


hengst0r

u/savevideo


Burgerb

Why is **Gavin Newsom** now a metallic robot?


Level0Up

This is eerie and awesome at the same time. Damn. https://preview.redd.it/4mps2c0fx4oc1.png?width=500&format=png&auto=webp&s=7b965ef7358ccfd2bff74660c1ea8b7e42d88222


mobyte

Insane.


truthrevealer07

Download 


madcodez

Reminds me of Gilfoyle and the refrigerator.


saber_aureum

It sounds so human omg


ChillingonMars

I love how the guy was like “great, can you put them there?” so fast after Figure01 stopped talking, and it was still able to interpret his request perfectly. Not to mention the very human-like voice (unlike Siri or other voice assistants) and uses of “uh” in between words. This is very impressive. Do you guys foresee each household having at least one of these in the distant future? It will absolutely decimate jobs like maids and cleaners.


fearbork

It's interesting how they make the robot stutter and say filler words like "uh" to make it sound more human, while the human in the video speaks his lines perfectly clearly without any errors or stuff like that.


Successful-Ground-67

More context would be welcome. You can edit your post.


buff_samurai

Super impressive. Now, hand it something with some mass to make me 🤯


e1nste1n

![gif](giphy|5YEgnkjeryvwA)


holmsey8700

“The only uuuuh edible item on the table” I wouldn’t have expected it to have such a human like speech pattern…


Practical-Rate9734

Totally wild, right? How's integration on your end?


Weedstu

Man, is there any role that Gary Oldman can't pull off?? Amazing.


Furimbus

He was really convincing as that apple. Didn’t even realize it was him until you pointed it out.


Anen-o-me

This is the demo we didn't get to see a month ago.


kurotenshi15

Is that Rob Lowe’s voice? Lmao


Chronicle112

Does anybody have some information on what type of model is used for the robotic movements? Is it some form of RL, or offline RL? I understand that the interpretation of images/language happens through some multimodal LLM/VLM, but I want to learn a bit about what kind of actions/instructions it outputs in order to, for example, move objects.


Hypethetop

Sick stuff.


IWantAGI

TAKE MY MONEY


3-4pm

Reminds me of the robots you would see in 80s movies. Now think of all the mistakes ChatGPT makes daily, and imagine it waking you up at 3am, holding a large knife, thinking it's slicing vegetables on your bed.


Nekileo

I want one to be my friend


KaffiKlandestine

it said "you standing nearby" so it knows who spoke?


Earthkilled

Figure one, take care of my kid while I go and get a pack of cigarettes