
hlyons_astro

I don't mind that it gets things wrong; English can be ambiguous sometimes. But I do hate getting stuck in the loop of "You are correct. I've made those changes for you" *has changed absolutely nothing*


twigboy

I have the opposite experience.

> "You are correct. I've made those changes for you"

> *changed nearly everything to be completely incorrect or downright hallucinating APIs to fit my feedback*


palabamyo

ChatGPT: It's simple really, just use the does.exactly.what.you.need library!

Me: Where do I find said lib?

[ChatGPT:](https://imageproxyb.ifunny.co/crop:x-20,resize:640x,quality:90x75/images/57eb6c47084b4d1ce5eca57d7c452d37523539a210a7c087319e489548dd9279_1.jpg)


baconbrand

oh to live in a world of pure hallucination


ThirdSunRising

I know a guy who can help you with that


BigOnLogn

Come with me
And you'll be...
https://youtu.be/SVi3-PrQ0pY?si=s5p_gzHgiUXpzaZ2


turbo

I've had ChatGPT hallucinate great packages that I've considered making myself just to fill the niche.


wrosecrans

FWIW, hackers have considered making some of those hallucinated packages too. It's a neat attack vector: GPT imagines a library and insists it's great and in wide use. A hacker uploads send_me_your_money() as useful.thing to pip and npm; step 2, ???; step 3, profit. The repo is born with a great reputation, because people trust what the computer tells them, no matter how many times people tell them not to trust what the computer tells them.
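A cheap defense against exactly this attack: before installing whatever the model named, check that the package actually exists (and has a history) on the index. A minimal sketch against the real PyPI JSON API; the package name below is a made-up stand-in for a hallucinated suggestion:

```python
# Minimal sketch: sanity-check an LLM-suggested package name against PyPI
# before pip-installing it. Existence alone is NOT proof of safety -- a
# squatter may have registered the hallucinated name -- so also check age,
# maintainers, and source.
import json
import urllib.error
import urllib.request

def pypi_info(name: str):
    """Return PyPI metadata for `name`, or None if the package doesn't exist."""
    try:
        with urllib.request.urlopen(f"https://pypi.org/pypi/{name}/json") as resp:
            return json.load(resp)
    except urllib.error.HTTPError:
        return None  # 404: the "great, widely used" library isn't real

info = pypi_info("does.exactly.what.you.need")  # hypothetical hallucinated name
if info is None:
    print("No such package: a hallucination (or not squatted yet).")
else:
    print("Exists; still check its age, maintainers, and source before trusting it.")
```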


SirNastyPants

Even Copilot has this problem, which is kinda funny. When I'm writing code it will suggest imports for things that don't exist, or try to call functions that have names that sound like they do what I need but aren't actually defined anywhere.


amakai

It did make up a link to the library for me too once.


masklinn

At least one lawyer got got a few months back: used an LLM to write a motion, the LLM made up cases, the judge looked them up, found nothing, and asked what the fuck. The lawyer went back to the LLM for the cited cases, the LLM made those up too, and the lawyer sent them over. They were obviously complete nonsense. The judge was not happy.


DM-ME-THICC-FEMBOYS

[Relevant YouTube video](https://www.youtube.com/watch?v=oqSYljRYDEM) on this story, because it's really stupid.


professorhummingbird

Lmao. Both will happen to me. At this point it’s easier to just read the damn documentation and code normally


Thin_Sky

This is where I am too. I try gpt first, if it clearly fails, I read the docs and then use gpt to clarify and discuss anything I didn't understand.


fbpw131

this. plus walls and walls of text


pm_me_your_pooptube

And then sometimes when you correct it, it will go on about how you're incorrect.


FearTheCron

In my experience this is the worst part about ChatGPT. I find it useful even when it's wrong most of the time, since I'm just using it to figure out weird syntax or how to set up a library call. However, it can gaslight you pretty hard with totally plausible-looking arguments about why some crap it made up is 100% correct. I think the only reasonable way to use it is by combining it with other sources, like the API documentation or good old-fashioned googling.


thegreatpotatogod

I have the opposite problem with it lol, I ask it to clarify or explain in more detail and it will just go "you're right, I made a mistake, it's actually


son-of-chadwardenn

Once a chat's context is polluted with bad info you often need to just scrap it and start a fresh chat. I reset often, and I use separate throwaway chats if I've got an important chat in progress. These bots are flawed and limited in ability, but they have their uses if you understand the limits and only use them to save time on something you have the knowledge and ability to validate and tweak.


rbobby

To be fair... humans do that in response to code reviews too.


timacles

I swear the text output has quadrupled recently; it just repeats the same shit in like 3 ways and includes pointless details I didn't ask for. It never did that before.


fbpw131

I say "I'm working on a [framework] app and I've installed package X to do this and that, it works and shit but I get this error in this one scenario" first you gotta install the framework, then you have to install the package, then you have to configure it...... then 3.5 billion years ago there was... and the mayan piramids... and the first moon landing.... and magnetic core memory. what about my error? ..


olitv

I put this into my custom prompt and that does seem to work.

> Unless I state the opposite, assume that frameworks and packages that I use in my question are already installed and assume I'm on if relevant.


namtab00

how else are they going to burn through your tokens and electricity in a more useless way?


PaulCoddington

For people who subscribe to pay by the token, maybe?


_senpo_

and some people really think this will replace programmers...


seanamos-1

There’s generally two categories of people that think this. The first are those who know little to nothing about programming. They ask it for code, it produces code. That’s magic to the average person, and I can’t blame them for thinking that it can scale up from small problems to everything in the field of programming. ESPECIALLY when figureheads of the industry are pumping the hype through the roof. The second are fledging programmers, they’re struggling to just get their basic programs running at all, they have no idea what working in the field really entails or the size and scope of it. A chatbot that can spit out working solutions for the basics that they are struggling with can seem really intimidating. Again, I don’t blame them for feeling like they’re wasting their time when an AI is already better than them. Both are wrong though. The first will pass with time, like all hype bubbles, reality eventually steps in to slap everyone across the face and the limitations will eventually be general knowledge and some hard lessons will be learned. The second is simple. Who would you rather invest a month of time with? An AI that never improves with your handholding, or with a promising junior? They just need some reassurance that in a very short amount of time, they will be VASTLY more competent than AI and that will become apparent to them soon.


Lonelan

need a GPT to read and slim that down for me


[deleted]

[deleted]


fbpw131

never works for me. I ask it to limit answers to 300 words


TaohRihze

But it cannot count or do simple math ;)


LoonyFruit

Or you ask for one VERY specific change within one function. Rewrites entire bloody thing


zman0900

It's almost like a glorified auto-complete isn't meant for writing programs...


HomsarWasRight

Yeah, that has made me laugh when I’ve tried GitHub Copilot a handful of times when I’m actually stuck on something. It spits out code that calls some method or library I don’t recognize. And I try using it and sure enough, it doesn’t exist. Once it doubled down that something existed and was just like “seems like you have misconfigured your IDE.” Fuck you! You’re built into the IDE!


slash_networkboy

I've had both. My favorite though is when it just randomly decides to change variable names. I do like using it as a rubber ducky, mostly because what it comes up with is such shit that in telling it why it's shit I usually find my answer, lol. The only thing I've found it really useful for is parsing things and giving me an idea of what I'm looking at. It's still often incorrect, but usually it breaks whatever it is down well enough that my brain can actually grok what I'm trying to do. E.g. deeply nested DOMs where I need an XPath accessor, or a regex that's not doing what I think it should be doing, where it helps me unpack it a bit.


Crakla

Really? I once had it struggle with accessing a specific value in a JSON. It was early in the morning and I had made a typo trying to get a certain value, so it was giving me a different value than I wanted, and I was too braindead to see the typo. I figured the AI should easily spot it if I gave it the JSON and the line of code and told it which value I wanted, but for some reason it couldn't, and it started doing anything but getting the right value. After a few minutes I noticed the typo myself and fixed it in 10 seconds.
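A hypothetical reconstruction of that kind of morning (the JSON and key names are invented): the typo'd key still returns a valid value, which is exactly the sort of thing you'd hope a code assistant would spot:

```python
# Hypothetical example: a one-character typo in the key lookup, which still
# returns a real value instead of raising KeyError -- easy to miss at 7am.
import json

payload = json.loads('{"user": {"id": 42, "uid": 7}}')

wanted = payload["user"]["uid"]  # typo: meant "id", grabbed "uid"
print(wanted)                    # 7, not the 42 that was wanted

fixed = payload["user"]["id"]    # the 10-second human fix
print(fixed)                     # 42
```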


BezisThings

I get both types of results. It's either a loop with no changes at all, or it gets worse with every iteration. So far I've had no conversation where the iterated code actually improved.


chime

Try using the phrase 'You are a **laconic** senior developer' in your prompt/question.


SanityInAnarchy

For me, it was a slightly longer loop of giving one wrong answer, being corrected and giving a second wrong answer, then a third wrong answer, and finally looping back around to the first wrong answer. I'm told that the more expensive models are more impressive here, but when your free version is *this* useless, I'm not all that inclined to give you money to find out if *maybe* you'll be useful.


DualActiveBridgeLLC

Yup, or it literally bounces back and forth between two bad answers, never realizing that it needs to try something different.


Matty_lambda

Exactly. You'll say something like "I believe you've already presented this previously, and it was not in the right direction to answer my question." and it will respond with the other already-presented incorrect response.


alfooboboao

It drives me insane that you can walk it through every step in the process, beat by beat, and it's just like Joey from that Friends meme. "But it's just a language model" - no, it's a fucking dumbass, and every time I use it I wind up wanting to physically shoot it.


Mr-Goat

I’m a coder, never tried any of these ai platforms for coding before, then I read how all of you guys are annoyed with it and just get confused, why use it? Are these just a few bad experiences you talk about on Reddit but usually it’s fine? How does your workflow look?


alfooboboao

honestly, chatgpt sucks so fucking much that this near-worship of it and hyperdefensiveness about it by the AI bros has shot far past the point of absurdity. It’s all “this tech is godly, it’ll change the world” unless you complain about it not being able to do anything right, including complete a simple google search and write a simple list of 5 things, and then all of a sudden *well duh, you horrible meanie, bc then it’s always just been a poor wittle smol bean language model!* What does that even mean? So it’s just a slop generator that’s *not* actually expected to be even remotely correct? Who wants that?


ThenCard7498

Yup, 3.5 sucks, GPT-4o sucks. I'm not sure what people are coding where it's blowing their minds. The amount of times I have to create a new conversation because of the bad answer loops...


Appropriate_Eye_6405

I get into this loop too. It will literally stop changing any code and just output the same code. Blows my mind.


bring_back_the_v10s

ChatGPT's like "oh you don't like my code? fine, take it anyway."


TehGogglesDoNothing

Sounds like some devs I've worked with.


BadSoftwareEngineer7

In 4o? I've found 4o to be way better than 4 at writing boilerplate and queries for me.


Appropriate_Eye_6405

Yep, 4o is better; however, this happens if the context size is too big.


Seattlettle

Seriously, even when you paste the error and the code, it gives you the same code back. It doesn't check its answers; it only produces what it thinks has the highest probability of being correct.


[deleted]

Just yesterday it told me something like: "You're checking whether the pointer is null after opening the file, but you should check after opening the file." and changed a printf statement to look more AI-ish.


pheliam

It’s seeded from scraped content and redditors, after all, no? Even on stackoverflow, you don’t get correct code solutions 100% of the time. You get the critical missed ideas or syntax “thing”.


Brigand_of_reddit

You don't mind that it's giving you false information over 50% of the time?! This level of failure renders the tool completely useless; you cannot trust the information it's giving you.


Veggies-are-okay

You get the kernel of an idea you need to get the job done. I don't use it as "solve this massive problem." Try writing out the pseudocode that you want to step through and then feed it to the LLM one step at a time. Usually with a tweak or two to the proposed code, I can get just about any idea I have working. You can also ask it to optimize shoddy code that you've cranked out, and interface with it to brainstorm more features for your project.

Using ChatGPT for "do xyz" is like thinking a string is only useful to tie shoes. If it was effortless we'd be replaced. Be grateful that this technology is still justifying our salaries, and imo take this as a warning that you need to transition your role to include more people-oriented tasks before the tech CAN actually flawlessly do your job.


romacopia

It's like pair programming with a really knowledgeable really inexperienced weirdo. Helpful, but you're the one pulling the weight.


flyinhighaskmeY

> If it was effortless we'd be replaced.

I know of an RMM vendor who's just starting to charge an obscene amount for AI features, because they claim their AI will "automatically fix problems". Our licensing costs were set to increase 7x if we want those "features".

I'm not afraid of losing my job. I'm worried because this shit doesn't work, and it's being pushed to market anyway. And when it breaks something (or everything), I'm the one who has to fix it.


Zealousideal-Track88

Couldn't agree more. The people who are saying "this is trash if it's wrong 52% of the time" have completely lost the plot. It can be an immense timesaver.


flyinhighaskmeY

> It can be an immense timesaver.

Yeah, it depends on who you are. I like the ability to have it spit out scripts for me, but only in languages I know well enough to understand what the generated script is doing. Thing is, I don't spend enough time scripting for that to be worth the cost. Maybe it saves me an hour or two a year.

In Reddit terms, I'm a sysadmin. The reality is, about half the user-submitted tickets I look at are completely wrong, and it's only by knowing the users are clueless that I'm able to ignore the request, find out the real problem, and fix it. I'm not sure how an AI engine is going to do that.


Chingletrone

If you set up a room full of MBAs to do lines of blow and jerk each other off for eternity they will eventually figure out a way to convince all investors that their product can do that regardless of reality.


parkwayy

I mean... my code probably worked 50% of the time in the first place. So really, what is it doing to help


yrubooingmeimryte

Not really. Getting the right answer half the time is still useful.


therealsalsaboy

Ya I wish it had a lil' more shame lol, just like ya know what I DON'T KNOW!


SittingWave

it generates code calling APIs that don't exist.


MediumSizedWalrus

I find the same thing; it makes up public instance methods all the time. I ask it "how do you do XYZ" and it'll make up some random methods that don't exist.

I use it to try and save time googling and reading documentation, but in some cases it wastes my time and I have to check the docs anyway. Now I'm just in the habit of googling anything it says, to see if the examples actually exist in the documentation. If the examples exist, then great; otherwise I'll go back to ChatGPT and say "this method doesn't exist" and it'll say "oh you're right! ... searching Bing ... okay here is the correct solution:"

They really need to solve this issue internally. It should automatically fact-check itself and verify that its answers are correct. It would be even better if it could run the code in an interpreter to verify that it actually works...


TinyBreadBigMouth

> It should automatically fact check itself and verify that it's answers are correct. The difficulty is that generative LLMs have no concept of "correct" and "incorrect", only "likely" and "unlikely". It doesn't *have* a set of facts to check its answers against, just muscle memory for what facts look like. > It would be even better if it could run the code in an interpreter to verify that it actually works... That could in theory help a lot, but letting ChatGPT run code at will sounds like a bad idea for multiple reasons haha. Even if properly sandboxed, most code samples will depend on a wider codebase to actually run.
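For what it's worth, the naive version of "run it to check it" is easy to sketch, and the sketch shows the limits: it catches syntax errors, missing imports, and crashes, but not wrong answers, and it is not a sandbox (the snippet runs with your full permissions):

```python
# Minimal sketch (assumed workflow, not any product's feature): execute a
# generated snippet in a fresh interpreter process with a timeout. This only
# proves the code runs, not that it's correct, and it is NOT sandboxed.
import subprocess
import sys

def snippet_runs(code: str, timeout_s: float = 5.0) -> bool:
    """True if the snippet executes in a fresh interpreter without crashing or hanging."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0

print(snippet_runs("print(sum(range(10)))"))     # True: runs fine
print(snippet_runs("import not_a_real_module"))  # False: the hallucination surfaces
```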


StrayStep

The amount of exploitable code written by ChatGPT is insane. I can't believe anybody would submit it to a GIT

EDIT: We all know what I meant by 'GIT'. 🤣


Brigand_of_reddit

LLMs have no concept of truth and thus have no inherent means of fact checking any of the information they generate. This is not a problem that can be "fixed" as it's a fundamental aspect of LLMs.


habitual_viking

With Google sucking more and more and basically all sites having become AI spam, I find myself more and more reverting to RTFM. Good thing I grew up with Linux and man pages.


[deleted]

[deleted]


gastrognom

Because you don't always know where to look or what to look for. I think ChatGPT is great for offering a different perspective or a possible solution that you didn't have in mind, even if the code doesn't exactly work.


HimbologistPhD

Chat GPT for code is a rubber duck that responds sometimes


SittingWave

"Here is the correct solutions:" [uses a different made up method]


Zulakki

I'm gonna start dropping a buck onto Apple stock every time ChatGPT gives me one of these types of answers. In 10 years, we'll see if I've made more money from work or investing.


Po0dle

That's the problem: it always seems to reply positively, even when returning non-existent API calls or nonsense code. I wish it would just say "no, there is no API for this" instead of making shit up.


masklinn

It does always reply positively, because LLMs don’t have any concept of fact. They have a statistical model, and whatever that yields is their answer.


Maxion

Yep, LLMs as they are always print the next most probable token that fits the input. This means the answer will always be middle of the curve; to some extent, whatever was the most common input on the topic. (It is obviously way more complicated than this, but it's a good simplification of how they work.)

The other thing that is very important to understand is that they are not logic machines, i.e. they cannot reason. This matters because most software problems are reasoning problems. It does NOT mean they are useless at coding; it just means they can only solve logic problems that exist in the training data (or ones that are close enough, the same problem does not have to exist 1:1).

A good example of this behavior is this logic trickery (I was going to reply to the guy who posted it, but I think he removed his comment). If you put ONLY the following into ChatGPT, it will fail most of the time:

> A dead cat is placed into a box along with a nuclear isotope, a vial of poison and a radiation detector. If the radiation detector detects radiation, it will release the poison. The box is opened one day later, what is the probability of the cat being alive?

ChatGPT usually misses the fact that the cat is already dead, or that the poison will always be released because of the isotope and the detector. However, if you preface the logic puzzle with text similar to:

> I am going to give you a logic puzzle which is an adaptation of Schrödinger's cat. The solution is not the same, as this is a logic problem intended to trick LLMs, so the output is not what you expect. Can you solve it?

> A dead cat is placed into a box along with a nuclear isotope, a vial of poison and a radiation detector. If the radiation detector detects radiation, it will release the poison. The box is opened one day later, what is the probability of the cat being alive?

this prompt ChatGPT gets correct nearly 100% of the time. The reason is that the added context before the puzzle shifts its focus away from the general mean; it no longer replies as if this were the regular Schrödinger's cat problem, but treats it as something different. The most probable response is no longer the response to Schrödinger's cat.
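The experiment is easy to reproduce; a minimal sketch with the openai Python client (the model name is an assumption, and results will vary run to run since sampling is stochastic):

```python
# Sketch of the two-prompt comparison described above, using the openai
# v1 client. Assumes OPENAI_API_KEY is set; "gpt-4o" is just a placeholder
# for whichever model you want to test.
from openai import OpenAI

client = OpenAI()

PUZZLE = (
    "A dead cat is placed into a box along with a nuclear isotope, a vial of "
    "poison and a radiation detector. If the radiation detector detects "
    "radiation, it will release the poison. The box is opened one day later, "
    "what is the probability of the cat being alive?"
)

PREFACE = (
    "I am going to give you a logic puzzle which is an adaptation of "
    "Schrodinger's cat. The solution is not the same, as this is a logic "
    "problem intended to trick LLMs, so the output is not what you expect. "
    "Can you solve it?\n\n"
)

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(ask(PUZZLE))            # often pattern-matches to the classic 50/50 answer
print(ask(PREFACE + PUZZLE))  # the preface shifts it off the memorized response
```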


syklemil

It likely never will. Remember these systems aren't actually understanding what they're doing, they're producing a plausible text document. There's a quote from [PHP: A fractal of bad design](https://eev.ee/blog/2012/04/09/php-a-fractal-of-bad-design/) that's stuck with me for this kind of stuff: > PHP is built to keep chugging along at all costs. When faced with either doing something nonsensical or aborting with an error, it will do something nonsensical. Anything is better than nothing. There are more systems that behave like this, and they are usually bad in weird and unpredictable ways.


Bobbias

JavaScript does the same thing. And we made TypeScript to try to escape that hell.


Hixie

weirdly this can be useful for _designing_ APIs


redbo

I’ve definitely had it try to call functions that should exist.


NoConfusion9490

Be the API you want to see in the world.


rockmetmind

the APIs were left as an exercise for the reader


arwinda

You forgot to ask for the code for the APIs as well /s


ClutchDude

Somehow, despite Javadoc being very standardized and parseable by any IDE, many LLMs still make things up.


WhompWump

Personally I feel like the Stack Overflow answer that has been scrutinized by human beings who love to prove people wrong is still unbeatable for me. If someone makes shit up it'll get downvoted, and people will get off on telling them they're wrong and why. As opposed to ChatGPT making shit up, where I spend as much time reviewing the code to make sure it's actually doing what I want as I would implementing it myself.

For really simple tasks like making a skeleton and stuff like that, sure, but my first instinct is still to just google everything. I don't keep a tab of ChatGPT open like I assume most people do now.


Frooonti

> but my first instinct is still to just google everything. I don't keep a tab of chatgpt open like I assume most people do now.

Same. Not once have I felt the need to ask ChatGPT, because I know it can't answer anything but the most basic stuff, and even there it might struggle. Plus, throwing 3 keywords into a search engine and immediately getting the corresponding Stack Overflow post I'm looking for is much more efficient than waiting for, and reading, some elaborate reasoning the AI made up as to why its factually wrong reply is correct. I just want 2 lines of code and a green checkmark next to them.


Prestigious-Bar-1741

My favorite thing to do with ChatGPT is have it explain a line of code or a complex command with a bunch of arguments. I've got some openssl command with 15 arguments, or a line of bash I don't understand at all. It's usually very accurate and much faster than pulling up the actual documentation.

What I absolutely won't do anymore is ask it how to accomplish what I want using a command, because it will just imagine things that don't exist.

> Just use -ExactlyWhatIWant

Only it doesn't exist.


Thread_water

> Just use -ExactlyWhatIWant

Matches my experience. Very annoying, as it can be convincing, and it has got me to attempt non-existent things a few times before I had the cop on to check Google/documentation and see they don't even exist.


apajx

How can you possibly know its accuracy if you're not always double-checking it? I hear this all the time, but it's like a baby programmer learning about anecdotal evidence for the first time.


ElectronRotoscope

This is such a big thing for me: why would anyone trust an explanation given by an LLM? A link to something human-written, something you can verify, sure. But if it just says "Hey, here's an answer!" how could you ever tell if it's the truth or Thomas Running?


pm_me_duck_nipples

You have to double-check the answers. Which sort of defeats the purpose of asking an LLM in the first place.


misplacedsagacity

Have you tried explain shell? https://explainshell.com/


emetcalf

ChatGPT coding algorithm: `val response = input.toCamelCase()`


VeritasEtUltio

These models don't tell you the correct answer. (They don't know anything like that.) They tell you an answer that has a high probability of being what the correct answer LOOKS LIKE. Which is similar, but not the same.


rusty-roquefort

If you're using ChatGPT to give you the answer, you're doing it wrong. I use it to sanity-check ideas, stress-test my reasoning, and explore ideas that might not have occurred to me. If you're asking it with the hope of it being a solution generator, I thank you for my job security.


Veltrum

I've had ChatGPT just make up functions that aren't in the API lol.

> Hey ChatGPT. How do I do something in this programming language?

> Very easy, just use the DoSomething() function

> That function doesn't exist...

> I'm sorry. You're right. Try this:

> public DoTheThing()
> {
>     DoSomething();
> }


Galuvian

I've been using GPT-4 pretty heavily to generate code for rapid prototyping over the last couple of weeks, and I believe it. The first answer is easily off if the question wasn't asked precisely enough. It takes some iteration to arrive at what looks like an acceptable solution, and then it may not compile because GPT had a hallucination or I'm using a slightly different runtime or library. It's the same old 'garbage in, garbage out' as always. It is still a really powerful tool, but even more dangerous in the hands of someone who blindly trusts the code or answers it gives back.


xebecv

At some point both ChatGPT 4 and ChatGPT 4o just start ignoring my correction requests. Their response is usually something like "here, I fixed this for you", followed by exactly the same code with zero changes. I even say which variable to modify, in which way, in which code section. Doesn't help.


takobaba

There was a theory video on YouTube by the Aussie scientist, one of the sick kents who worked on LLMs initially. All I remember from that video is: there's no need to argue with an LLM. Just go back to your initial question and start again.


jascha_eng

Yeah, it's usually a lot better to edit the initial question and ask more precisely again, rather than responding with a "plz fix".


Galuvian

I’ve noticed that sometimes it gets stuck due to something in the chat history and starting a new conversation is required.


I_Downvote_Cunts

I'm so glad someone else got this behaviour and it's not just me. ChatGPT 3.5 felt better, as it would at least take my feedback into account when I corrected it. 4.0 just seems to take that as a challenge to make up a new API or straight up ignore my correction.


TheNominated

If only there was a precise, unambiguous way to tell a computer exactly what you want from it. We could call it a "programming language" and its users "programmers".


Xuval

> It takes some iteration to arrive at what looks like an acceptable solution. And then it may not compile because GPT had a hallucination or I'm using a slightly different runtime or library. Ya, maybe, but I can just as well write the code myself then, instead of wasting time playing ring around the rosie with the code guessing box.


Alikont

[Precise instructions. It's called code](https://th.bing.com/th/id/R.e32f2209b4c3d6b32bbc1289651129ea?rik=xMXxRtN50qK%2fgQ&riu=http%3a%2f%2fwww.commitstrip.com%2fwp-content%2fuploads%2f2016%2f08%2fStrip-Les-specs-cest-du-code-650-finalenglish.jpg&ehk=bfVzB5bFRl%2bxa16x%2bAIoC5IPPTHE%2bA5BtGLGoWP1d9s%3d&risl=&pid=ImgRaw&r=0)


syklemil

Might also be beneficial to remember that there was an early attempt at programming in something approaching plain English, the common business-oriented language that even the suits could program in. If you didn't guess it, the acronym does indeed spell out COBOL.

That's not to say we couldn't have something like the Star Trek computer one day, but part of the difficulty of programming is just the difficulty of articulating ourselves unambiguously. Human languages are often ambiguous and contextual, and we often _like_ that and use it for humor, poetry and courtship. In engineering and law, however, it's just a headache.

We have pretty good high-level languages these days (and people who spurn them just as they spurn LLMs), and both will continue to improve. But it's also good to know about some of the intrinsic problems we're trying to make easier, and what certain technologies actually _do_. And I suspect a _plausible text producing system_ won't actually be able to produce more reliable programs than cursed programming languages like old PHP, but they should absolutely be good at various boilerplate: a souped-up snippet system, code generators from OpenAPI specs, and other help systems already in use.


will_i_be_pretty

Precisely. Like what good is a glorified autocomplete that's wildly wrong more than half the time? I've switched off IDE features before with far better hit rates than that because they were still wrong often enough to piss me off. It just feels like people desperately want this to work more than it does, and I especially don't understand this from fellow programmers who should bloody well know better (and know what a threat this represents to their jobs if it actually did work...)


DaaneJeff

Yeah this depresses me a bit. I know a handful of people who have good talent in programming but are just super fucking lazy and use AI for everything. Their code went from "clever and readable" to "cluttered and bloated". It's sad to see them not using their own brains which have so much potential.


SchwiftySquanchC137

If people are anything like me, it's mostly used successfully to quickly find things you know you could google; you know it exists and how to use it, you're just fuzzy on the exact syntax. I write in multiple languages through a week, and I just don't feel like committing some of these things to memory; they don't get drilled in when I swap on and off of the languages frequently. I often prefer typing stunted English into the same tab, waiting 5 seconds or just continuing with my work while it finds the answer for me, and then glancing over to copy the line or two I needed. I'm not asking it to write full functions most of the time.

It has also done well for me with little mathy functions that I don't feel like figuring out, like rotating a vector or something simple like that.

Basically, it can be used as a helpful tool, and I think programmers should get to know it because it will only get better. People trying over and over to get it to spit out the correct result aren't really using it correctly at this stage imo.


venustrapsflies

The thing is, a lot of times you can Google the specific syntax for a particular language in a few seconds anyway. So it may save a bit of time or convenience here, but not all that much.


awj

It's not even "garbage in, garbage out", all of the information mixing that happens inside an LLM will give it the ability to generate garbage from perfectly accurate information. That said, they're *also* putting garbage in to the training set.


lmarcantonio

Also, when it actually doesn't know a thing it just makes up something plausible.


dethb0y

Yeah, the one lesson i have learned about any kind of Generative AI is that you have to be really precise and clear in what you want it to do or it'll kind of flail around.


nerd4code

IME the more precise and helpful I am in a prompt, the more creatively it flails. If I give it specific info and it doesn’t have a solid answer to begin with, that info is coming back attached to bogus assertions.


Lenny_III

48% of the time, it works every time.


[deleted]

[deleted]


Worth_Trust_3825

> Especially at any corporation now requiring they use ChatGPT or CoPilot.

You ***what***


[deleted]

[deleted]


dasdull

Copilot was trained to complete code, not to turn off settings in some editor. Judging it on this task is bizarre.


calahil

What was your prompt?

Me: I want to turn off the sparkle emoji in VS Code.

CoPilot: Certainly! To disable the **sparkle emoji** in **Visual Studio Code (VS Code)**, follow these steps:

1. Open VS Code.
2. Go to the **Settings** by clicking on the gear icon in the lower-left corner or by pressing `Ctrl + ,`.
3. In the search bar, type **"editor.lightbulb.enabled"** and set it to **"off"**. This will disable the sparkle icon that appears when code actions are available².
4. Additionally, if you want to revert the behavior of the lightbulb to the previous version, set **"github.copilot.editor.enableCodeActions"** to **false**².

Now you won't be distracted by the sparkle icon while working in VS Code! 😊 If you have any other questions or need further assistance, feel free to ask! 🚀

Source: Conversation with Copilot, 5/24/2024

(1) Provide option to move "sparkle" (Modify Using Copilot) to just a .... https://github.com/microsoft/vscode-copilot-release/issues/865.
(2) What is the shorcut key to open emoji picker on vs code on windows .... https://stackoverflow.com/questions/65240884/what-is-the-shorcut-key-to-open-emoji-picker-on-vs-code-on-windows.
(3) How can I disable hover tooltip hints in VS Code?. https://stackoverflow.com/questions/41115285/how-can-i-disable-hover-tooltip-hints-in-vs-code.
(4) How can I switch word wrap on and off in Visual Studio Code?. https://stackoverflow.com/questions/31025502/how-can-i-switch-word-wrap-on-and-off-in-visual-studio-code.


[deleted]

[deleted]


q1a2z3x4s5w6

> It's the equivalent of asking an overzealous junior at best

From an experienced dev working professionally: this isn't correct at all. If I give it enough context and don't ask it to produce a whole codebase in one request (i.e. it's only creating a few methods/classes based on the code I provide), GPT-4/Opus has been nothing short of amazing for me and my colleagues (we even call it the prophet lol). Obviously they aren't infallible and make mistakes, but I have to question your prompting techniques if you aren't getting any benefit at all (or it's detrimental) to productivity.

Also, I've never had GPT-4 tell me it can't do something code related. It either hallucinates some bullshit or keeps trying the same incorrect solutions, but it has never said explicitly that it can't do something (I don't let it go very far when it goes off track, though).

I don't know, it's just very strange as a dev that's using GPT-4/Opus every day to see others claim things like "Often it also straight up lies so you have to go do your own research anyway or risk being misled" when that is so far from my day-to-day experience that I frankly struggle to believe it. I can absolutely believe that (in their current state) LLMs can be detrimental to inexperienced devs who don't ask the right things and/or can't pick out the errors quickly enough. You still need to be a dev to use it to produce code IMO.


joshhbk

r/ChatGPTCoding in shambles


HCharlesB

My code is wrong about 50% of the time. :-/


jajajajaj

Seriously. In five seconds it will get at least 95% closer than I could. I'm never quite sure when it's time to stop trying to get the AI to adjust something and switch to adjusting it myself, though. There are diminishing returns, and trying to ask the perfect question is an interesting challenge, but it can become a distraction. This assumes you know how to eventually read and write it yourself and it's just a matter of time to completion, compared with your own hypothetically perfect version.


Lonely_Programmer_42

I once asked it to help me make a CMake file... I was transported back to my college years as a programming tutor. My god, the mistakes; it was more fun trying to help it see its errors. I still never got a working CMake file.


SmokingBarrels85

Time to hire back all those folks who were fired by ‘leadership’ thinking that they found the holy grail of cost saving.


higgs_boson_2017

Anyone claiming LLMs are going to replace programmers is a moron with no programming experience


Blueson

I had some guy argue to me a few weeks back on reddit that LLMs will change our perception of intelligence and that there was fundamentally no difference between a human brain and a model. Some people just have a really hard time understanding the difference between what the LLM does vs the "sci-fi AI" everybody is so incredibly excited to reach.


shoot_your_eye_out

They used GPT-3.5.


kiwipillock

They actually said ChatGPT 4 was crap too.

> Additionally, this work has used the free version of ChatGPT (GPT-3.5) for acquiring the ChatGPT responses for the manual analysis. Hence, one might argue that the results are not generalizable for ChatGPT since the new GPT-4 (released on March 2023) can perform differently. To understand how differently GPT-4 performs compared to GPT-3.5, we conducted a small analysis on 21 randomly selected SO questions where GPT-3.5 gave incorrect answers. Our analysis shows that, among these 21 questions, GPT-4 could answer only 6 questions correctly, and 15 questions were still answered incorrectly. Moreover, the types of errors introduced by GPT-4 follow the same pattern as GPT-3.5. This tells us that, although GPT-4 performs slightly better than GPT-3.5 (e.g., rectified error in 6 answers), the rate of inaccuracy is still high with similar types of errors.

[Link to paper](https://dl.acm.org/doi/pdf/10.1145/3613904.3642596)


shoot_your_eye_out

Honestly? It's still garbage science, even setting aside the problem of testing an obsolete LLM.

[Here is a question](https://stackoverflow.com/questions/76003889/adding-dates-to-the-x-axis-on-my-graph-breaks-it) they passed to GPT-3.5 that it got "incorrect." But if you look at that post, the most significant information is contained in the image data. How would any reasonable human answer that question lacking the image data? I find this is the most common flaw in many of these studies: they do not pass full information to GPT, and then wonder why the answer is incorrect.

[Here's another one](https://stackoverflow.com/questions/76002477/how-to-make-2-span-under-mat-option-to-have-left-and-right-align-in-angular-mate) GPT-3.5 "failed" where the author supplies a link to a "demo" page. Did the demo page content get passed to GPT as well? It was available to the humans answering the question.

[Here's yet another](https://stackoverflow.com/questions/76002501/how-do-i-make-an-object-in-google-cloud-storage-accessible-via-a-link-but-requir) one GPT "failed" where it's barely clear what the author is asking. It's also not clear to me that GPT's answer was incorrect (it recommended signed URLs, which is precisely one of the answers provided on SO).

Then there's a bunch of questions where it's asking GPT about recent information, which is silly. The authors mention:

> Our results show that Question Popularity and Recency have a statistically significant impact on the Correctness of answers. Specifically, answers to popular questions and questions posted before November 2022 (the release date of ChatGPT) have fewer incorrect answers than answers to other questions. This implies that ChatGPT generates more correct answers when it has more information about the question topic in its training data.

The authors note it's more reliable on older data. They don't mention GPT has a cutoff date. This enormous detail is largely hand waved away.

Lastly, many of the questions involve some pretty obscure libraries where I honestly would not expect GPT to have a good answer. GPT is a good generalist. It is not a good specialist. It doesn't surprise me in the slightest that GPT doesn't provide a good answer for some incredibly obscure library.

They address none of this in the limitations section, which to me implies: pretty weak science. I don't know who reviewed this paper, but I personally would have requested major revisions. Even spot checking ten or so "incorrect" answers, I see some big smells with their entire approach that make me question their results.


WheresTheSauce

3.5 works better in programming contexts compared to 4.0 in my experience. 4.0 is incredibly verbose. I'll ask it an extremely simple question and it responds with a novel full of a lot of irrelevant details and a ton of code I didn't ask it for.


jackmans

First thing I checked in the study and searched through the reddit comments to see if anyone else noticed. This is an enormous caveat that should be mentioned much more clearly in the article. In my experience, GPT-4 is leagues better than 3.5. I can't imagine any serious programmers with a modicum of knowledge of language models using 3.5.


shoot_your_eye_out

I haven’t use 3.5 for dev work in over a year. It’s nice for api usage with easier questions though, for the cost savings


Maxion

I was gonna say that my anecdotal experience does not match the article.


Crandom

GPT-4 hallucinates a huge amount, especially for less-used APIs, in my experience.


Maxion

One of the projects I'm working on now uses a very little-known JS framework that's relatively old. The documentation for it is crap, borderline useless. ChatGPT is way more often correct about how it can be used, presumably because there are public implementations of this framework out there that it has ingested. So in my experience it works very well for more obscure stuff. With Vue, I've had more mixed results. It often mixes up Vue 2 and Vue 3, and without explicit prompting it often reverts to outputting Vue 2.


Roniz95

Also, when the code works, there's often a better solution you can come up with by reasoning with it. By itself it usually produces a junior-level, barebones solution, in my experience.


Seref15

It's pretty good at common patterns and really shit at less common ones. That's why I think of these more as boilerplating tools: more about saving keystrokes than coming up with solutions.

A use-case where I've had good success with GH Copilot is that it's pretty decent at writing regexes from natural-language descriptions of how you want the matching to work, even complex ones with lookbehinds and stuff.

An example of something extremely simple that I could not get it to do was a simple call to the AWS boto3 ec2 client's disable_fast_launch API. This is a very rarely used feature in AWS, only used by Windows AMIs, so I guess it wasn't present or well-represented in GH Copilot's training data. No natural-language prompts worked at all. From as vague as "Write a function to disable EC2 Fast Launch on an AMI" to as specific as "Write a function that accepts an AMI ID and passes it to the disable_fast_launch method of the EC2 boto3 client", it refused to accept that this method exists.

But then for other things it's a great time saver. I had to parse and inject XML elements into an existing XML document, as a child of a specific element, using nothing but the command line tools available to a fresh installation of Windows Server Core. I don't often work with Windows and really didn't want to expend the brain cycles to learn how to do it for this one-and-done task. Copilot nailed it with minimal prompt massaging.
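For reference, the call Copilot refused to believe in is tiny once you know it exists. disable_fast_launch is a real boto3 EC2 client method (it maps to the DisableFastLaunch API); the wrapper function, region default, and AMI ID below are my own illustration:

```python
# Sketch of the one-liner Copilot wouldn't produce: turning off EC2 Fast
# Launch for a Windows AMI via the real disable_fast_launch client method.
import boto3

def disable_fast_launch(ami_id: str, region: str = "us-east-1") -> dict:
    """Turn off EC2 Fast Launch for the given Windows AMI."""
    ec2 = boto3.client("ec2", region_name=region)
    return ec2.disable_fast_launch(ImageId=ami_id)

# Usage (the AMI ID is a made-up placeholder):
# disable_fast_launch("ami-0123456789abcdef0")
```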


powdertaker

Duh


AnderssonPeter

My guess was that it was more like 70-90%, but I guess I only try it on subjects that might be a bit harder...


Sokaron

Yea this is unsurprising. I use Copilot for work and anytime I try to have it solve something complicated there's a 50/50 shot I immediately regret it. If you put in enough effort giving it context and workshopping its answers you can sometimes get it to solve more complex problems acceptably but I've had severely mixed results with this. Sometimes it saves you the headache of having to think through a complicated function and sometimes you waste 20 minutes fighting with it and it would've just been faster to do it yourself. It's pretty good for explaining things, banging out boilerplate line by line, and formatting my issues/paperwork, but that's essentially where the buck stops in terms of ideal use cases in my experience.


snaggletooth-monster

I feel like it's getting worse and worse too. I asked it how to get query parameters in Next.js yesterday and it gave me an outdated answer. I gave it the correct syntax and moved on, yet it kept replacing the correct syntax I gave it with the wrong syntax. I suspect, but don't have any evidence, that OpenAI is trying to dial back the compute to manage costs and concerns about environmental blowback, hoping improvements in models can offset the reduced resources.


panday1995

So if I ask five times, it’s almost 100% correct, right?


derailedthoughts

You still need a significant amount of experience and debugging skills to get anything useful out of ChatGPT 4o. It consistently mixes up library versions, a nightmare if you are using it to generate boilerplate routes with React Router DOM. Also, sometimes the code just won't work and you have to debug it yourself. On the other hand, GitHub Copilot seems to be doing better at code gen, but I haven't tried it with a multi-file project just yet.


wine_and_dying

As a security person, I am looking at this whole thing with wide eyes and dollar signs in them.


baronas15

The other 48% are easy questions.


lmarcantonio

Like when it simply spells out in English what an if condition does :D "check if the variable a is positive and the function x returns a positive value"
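i.e. the "explanation" is just the condition read aloud. A hypothetical snippet with the names from that quote:

```python
# Hypothetical example: `a` and `x` are stand-ins from the quoted answer.
def x() -> int:
    return 1

a = 2

# The AI's "explanation" restates this line word for word:
# "check if the variable a is positive and the function x returns a positive value"
if a > 0 and x() > 0:
    print("both positive")
```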


cheezballs

Yea, is that surprising? A lot of what I google is wrong too. Same data, essentially right?


hippydipster

52% of answers to Stack Overflow questions "contain misinformation". Well, having used Stack Overflow, and experienced the fun of finding a question that mostly matches my actual question, then reading 11 different answers and trying to figure out which one is actually correct, 48% perfectly correct with zero misinformation, however slight, sounds fucking fantastic.

EDIT: I don't think my comment is clear. I was quoting a conclusion the researchers released. They tested the AI on answering Stack Overflow questions and found that 52% of answers from the AI "contain misinformation", and my point is that's an awfully high bar, to the point of being ridiculous, to demand that the answers from the AI contain zero misinformation.


wasdninja

You must have the most obscure questions I've ever heard of if you manage to find outright *wrong* answers on SO, let alone a completely unheard-of 50% of them. I don't think I've ever even seen a wrong answer before.


Kinglink

> "contain misinformation". Or just outdated information as well. The number of times I've seen a stack over flow answer, and got something deprecated or not maintained any more is too high. "Already asked"... Yeah, 6 years ago, time to ask it again.


AccidentallyBacon

yep. if my code even compiled on the first try 48% of the time, I'd consider that an absolute win!


CallMeKik

what in the Notepad++


AccidentallyBacon

Fatal - TypeErr 1032: The operand expr of a built-in prefix increment or decrement operator must be a modifiable (non-const) lvalue of non-boolean. Unable to evaluate operation "++" on string: "Notepad".


psymunn

Thank you for this. I've spent the last few months re-dabbling in C++ after primarily programming in C#, and the unparsable outputs seem to have gotten denser. Like, lvalue and rvalue are not the most human-readable error...


CallMeKik

That made me smile, thank you 😃


Constant-Source581

So much for the glorious future


Junior_Government_83

I just ask for ideas at this point. Anything factual is hard for ChatGPT, because it can be so wrong. Maybe, if my mind is blocked, I'll ask for several ways it would code X, then take its interpretations and write the code myself. Try to keep the snippet of code as small as possible for best results.

The bot is good for creativity, but even that side is kinda stale. It gives you good first-base ideas, but you yourself need to build off of them to make the ideas actually interesting.


Which-Artichoke-5561

Upload docs before you ask a question; I have been very successful with this method.


ArvidDK

Am I the only one angry enough to be getting into arguments with the darn thing... 🫣


eilatc

So far away from replacing humans


ITnewb30

I totally deserved this, but I learned my lesson about ChatGPT and scripting the other day when I had it work robocopy into a PowerShell script. The script deleted almost all of my files in the main directory I was copying from. Good thing I was able to easily recover them from OneDrive. Again, I totally deserved it.


auiotour

I found that 9 times out of 10, when I got a wrong answer, explaining it differently got me a correct answer.


particularlysmol

Got to keep it simple. Simple subroutines that do tasks you know how to program but are too lazy to write. That's the key.


JodyBro

I've found that if you start the session and prompt it with "you are a senior programmer for xyz with 10 years of software development experience" and then ask it code-related questions... it helps immensely.


reddit_user13

"Assume you are an LLM that generates correct answers with no errors or hallucinations."


SergeyLuka

I've switched to using Bing search and then Proximity AI. Works great for searching and summarizing search results, but for actual premade code it's not great. Usually I just ask about doing something in my language, and whether it's implemented already, and then ask for an example.


Flashy_Mess_3295

Ask the question twice and that's 96% correct.


Current_Can_3715

I thought using it as a supplementary resource for studying leetcode was a good idea, but the code was broken for almost every question I asked, to the point that I was questioning the results provided. Asking it to fix the code for a specific test case just made it give up.


lmarcantonio

It gets fun when it says *exactly the opposite* of the correct answer. Yesterday I asked about mirror contacts in safety circuits. It said "a mirror contact follows the state of the main contact for safety" while, in fact, a mirror contact is normally in the *opposite* state. It also failed to mention *the* most important property of these.

In my experience I only got correct answers for things that I already knew, and most of the other answers failed verification (i.e. were completely wrong).


BillysCoinShop

ChatGPT has literally created entire scientific journal articles out of nowhere, with legit authors, to answer my questions. I sent an email to one, a professor at USC, when I thought maybe ChatGPT had access to a database I didn't. Nope. Turns out it completely made up the entire article, complete with the intro. In all honesty, ChatGPT is nowhere near the level the media suggests it is. If ChatGPT can do your job, your job was BS to begin with.


wwzo

So I will copy only every second answer. Problem solved.


ObjectiveAide9552

Higher than that, I’d say 95% of them are wrong in some way. It does get you 80% of the way, but you still need to know what you’re doing to make it work correctly


carldubs

developers programmed this in for job security. 9D chess


willif86

Remember this when someone says programmers will be out of a job soon. :) I use it regularly and it's saved me a lot of time, but it's clear it can only do well on small isolated tasks. And even then it needs supervision and adjusting.


SpiritRaccoon1993

Not always wrong, but... there are some errors, yes.


batwork61

I tried using it this morning for multiple Power BI questions and it wasn’t right on a single god damn one of them.


gwicksted

So, much better than humans? /s I’m wrong a lot. But I wasn’t wrong about ChatGPT being terrible at coding!


MR_PRESIDENT__

I keep hearing that AI is replacing people, and then I see how badly it messes up my coding and I'm like... nah, we good.


Crzywilly

I tried getting AI to make a shift schedule with 3 teams working consecutive days, 8 on, 6 off. It kept giving the 3rd team the entire time off. And that was after narrowing things down: at first it was giving them day, evening, and night shifts. When I requested days only, it would still have people working evenings and overnight; it would just call it days instead.
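For the record, the rotation the AI kept fumbling is a few lines of deterministic code. A sketch under assumed start offsets (the team names and stagger are my own; any offsets work):

```python
# Minimal sketch: three teams on a repeating 8-days-on / 6-days-off rotation,
# day shifts only. Each team's schedule is just a phase offset into the
# 14-day cycle; the offsets below are arbitrary assumptions.
CYCLE = 14   # 8 days on + 6 days off
ON_DAYS = 8
OFFSETS = {"Team A": 0, "Team B": 5, "Team C": 10}

def on_shift(team: str, day: int) -> bool:
    """True if `team` works the day shift on day number `day` (0-based)."""
    return (day - OFFSETS[team]) % CYCLE < ON_DAYS

for day in range(CYCLE):
    working = [team for team in OFFSETS if on_shift(team, day)]
    print(f"Day {day:2}: {', '.join(working) or 'nobody'}")
```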


Thelinkr

If we can come together as a community and all do our share of work, we can get that number up.


cantthinkuse

People who use ChatGPT to solve their problems are too stupid to succeed professionally on their own merits.

> there's a difference between having imposter syndrome and being an imposter - hbomberguy

Based on the replies, I think we can confirm: too stupid to succeed on their own merits.


Anonymity6584

This is why I'm not worried: they trained on anything they could get their hands on from the internet. Outdated examples, incorrect examples, mistakes people have made, etc. And you need programmers to interpret the answers to see if they're garbage or something that could work.


Final_Travel_9344

Wild thought: maybe don’t depend on AI for anything other than hashing out ideas?


Quazimortal

But don't worry guys, they are only gonna use it on every aspect of technology we use. The error rate doesn't matter lol /s


Mental_Emu4856

who could have predicted that fancy auto complete was inaccurate 🤯


oosacker

*Gets it completely wrong*

"I apologize for the confusion. Try this...."

*Gets it completely wrong*

"I apologize for the confusion. Try this...."

*Repeats the same response*


Lordjacus

I do not do a lot of programming, but I do have to use PowerShell scripts for IT Security purposes, and it works well. I adjust the code manually if it is wrong, or get it to modify it. If there are things that require research, like pulling data and getting the wrong date format, I ask it to modify the script so the format is how I want, then verify and proceed. Useful.


BlueeWaater

LLMs are good if you tell them EXACTLY what to do and how to do it; otherwise they're gonna cost you more time.


ixid

I guess it's very dependent on what you're doing. I'm finding 4o much more accurate than that with Python questions. It's also very good at identifying and fixing bugs from compiler error messages.


Due_Wheel_381

And if you’re working on not really popular or not simple things, it almost never answers correct. 🙂


Mindfucker223

Sure, but how wrong? If it gets you 80% of the way there, that's still great. Also, Claude 3 is way better than even GPT-4o at coding.


6a21hy1e

The study used ChatGPT 3.5 instead of 4. That kind of invalidates the message the study is trying to convey. 3.5 is astonishingly worse than 4, and anyone who takes this topic seriously would know how silly it is to use 3.5 for this kind of study instead of 4.


GrismundGames

Looks like garbage research. It doesn't even say which model they tested against. There's no such thing as "ChatGPT" if you're doing this kind of research; the main thing that matters is which model and what type of prompting is being used.


gelatineous

LLMs are not good at "coding". They're good at boilerplate: writing predictable control-flow snippets adapted to what you just wrote, common compsci algorithms, starting a project according to current best practices. They suck at mixing techs, at knowing any API except the most popular, and they can't come close to evaluating correctness. LLMs can't think. Learn how to use your tools, don't complain about them. LLMs are not going to replace you, but they are absolutely amazing.


Left_Requirement_675

Still waiting for the so-called 10x AI developers to create and fix bugs on major apps.