maccard 19 hours ago [-]
> try it, you might actually be amazed.
I keep being told this and the tools keep failing at the first hurdle. This morning I asked Claude to use a library to load a toml file in .net and print a value. It immediately explained that it was an easy file format to parse and didn’t need a library. I undid that, went back to plan mode, and it picked a library, added it, and claimed it was done. Except the code didn’t compile.
After three iterations of trying to get Claude to make it compile (it kept changing random lines around the clearly problematic one), I fixed it myself by following the example in the readme, and told Claude.
I then asked Claude to parse the rest of the toml file, whereupon it blew away the compile fix I had made.
This isn’t an isolated experience - I hit these fundamental blocking issues with pretty much every attempt to use these tools that isn’t “implement a web page”, and even when it does that it’s not long before it gets tangled up in something or other…
krastanov 19 hours ago [-]
This is fascinating to me. I completely believe you, and I will not bother you with all the common "but did you try to tell it this or that" responses, but this is such a different experience from mine. I did the exact same task with Claude in the Julia language last week, and everything worked perfectly. I am now in the habit of adding "keep it simple, use only public interfaces, do not use internals, be elegant and extremely minimal in your changes" to all my requests or SKILL.md or AGENTS.md files (because of the occasional failure like the one you described). But generally speaking, such complete failures have been so rare for me that it is amazing to see that others have had such a completely different experience.
aqua_coder 18 hours ago [-]
My experience working with AI agents is that they can be verbose and overcomplicate things by default, unless directed explicitly. That may be the reason for this discrepancy.
jayd16 18 hours ago [-]
You say it doesn't fail, but you also mention all these workarounds you know and try... sounds like it fails a lot, but your tolerance is different.
jmalicki 17 hours ago [-]
Most people I've seen complain say things like "I asked it for code and it didn't compile."
The real magic of LLMs comes when they iterate until completion until the code compiles and the test passes, and you don't even bother looking at it until then.
Each step is pretty stupid, but the ability to very quickly and doggedly keep at it until it succeeds quite often produces great work.
If you don't have linters that are checking for valid syntax and approved coding style, if you don't have tests to ensure the LLM doesn't screw up the code, you don't have good CI, you're going to have a bad time.
LLMs are just like extremely bright but sloppy junior devs - if you think about putting the same guardrails in place for your project you would for that case, things tend to work very well - you're giving the LLM a chance to check its work and self correct.
It's the agentic loop that makes it work, not the single-shot output of an LLM.
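The loop described above is simple enough to sketch. This is a minimal illustration under assumptions, not any vendor's API: `propose_patch` stands in for an LLM call and `run_checks` for your compiler/linter/test suite; both are hypothetical placeholders.

```python
# Minimal sketch of the agentic loop: generate, verify, feed the
# errors back, repeat until the checks pass. `propose_patch` is a
# stand-in for an LLM call and `run_checks` for a compiler/linter/
# test suite; both are hypothetical placeholders, not a real API.

def run_checks(code):
    """Return a list of error strings; an empty list means success."""
    return ["unresolved TODO"] if "TODO" in code else []

def propose_patch(code, errors):
    """Placeholder for an LLM call that revises code given feedback."""
    return code.replace("TODO", "done")

def agentic_loop(code, max_iterations=5):
    for _ in range(max_iterations):
        errors = run_checks(code)
        if not errors:
            return code          # all checks pass: actually "done"
        code = propose_patch(code, errors)  # feed the errors back in
    raise RuntimeError("giving up: checks still failing")
```

The quality comes from the check-and-retry structure, not from any single model output.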
maccard 3 hours ago [-]
> The real magic of LLMs comes when they iterate until completion until the code compiles and the test passes, and you don't even bother looking at it until then.
If you read my post, you’d see that Claude Code didn’t do that; I had to intervene in the agent loop, and when I did, it undid my fixes.
maleldil 52 minutes ago [-]
This is not compatible with many people's experiences. I use Python with a type checker. I tell Claude that the task is only complete once the type checker passes cleanly. It doesn't stop until there are no type errors. This should be even easier in a compiled language, especially if you also tell it to run the tests.
In fact, I find that with a strict feedback loop set up (i.e. a lot of lint rules, a strict type checker and fast unit tests), it will almost always generate what I want.
As someone else said, each step might be pretty stupid, but if you have a fast iteration loop, it can run until everything passes cleanly. My recommendation is to specify what really counts as "done" in your AGENTS.md/CLAUDE.md.
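A sketch of what such a "definition of done" gate might look like; the ruff/mypy/pytest commands below are just example choices for the lint/type-check/test trio described above, not a prescribed toolchain:

```python
# A "definition of done" gate: the task only counts as complete when
# every check in the feedback loop exits cleanly. The commands below
# (ruff / mypy / pytest) are example choices for a Python project;
# swap in your own linter, type checker, and test runner.
import subprocess

CHECKS = [
    ["ruff", "check", "."],     # lint rules
    ["mypy", "--strict", "."],  # strict type checking
    ["pytest", "-q"],           # fast unit tests
]

def is_done(checks=CHECKS, runner=subprocess.run):
    """Done only if every check command exits with status 0."""
    return all(runner(cmd).returncode == 0 for cmd in checks)
```

An agent wired to a predicate like this keeps iterating until `is_done()` returns True, which is effectively what the AGENTS.md/CLAUDE.md instruction asks for.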
Stuff like this works for things that can be verified programmatically (though I find LLMs still do occasionally ignore instructions like this), but ensuring correct functionality and sensible code organization are bigger challenges.
There are techniques that can help deal with this but none of them work perfectly, and most of the time some direct oversight from me is required. And this really clips the potential productivity gains, because in order to effectively provide oversight you need to page in all the context of what's going on and how it ought to work, which is most of what the LLMs are in-theory helping you with.
LLMs are still very useful for certain tasks (bootstrapping in new unfamiliar domains, tedious plumbing or test fixture code), but the massive productivity gains people are claiming or alluding to still feel out of reach.
jmalicki 16 hours ago [-]
It depends - there are some very very difficult things that can still be easily verifiable!
For instance, if you are working on a compiler and have a huge test database of code to compile that all has tests itself, "all sample code must compile and pass tests, ensuring your new optimizer code gets adequate branch coverage in the process" - the underlying task can be very difficult, but you have large amounts of test coverage that have a very good chance at catching errors there.
At the very least, "LLM code compiles, and is formatted and documented according to lint rules" is pretty basic. If people are saying LLM code doesn't compile, then yes, you are using it very incorrectly: you're not even beginning to engage the agentic loop, since compiling is the simplest step.
Sure, a lot of more complex cases require oversight or don't work.
But "the code didn't compile" is definitely in "you're holding it wrong" territority, and it's not even subtle.
alecbz 16 hours ago [-]
Yeah performance optimization is potentially another good area for LLMs to shine, if you already have a sufficiently comprehensive test suite, because no functionality is changing. But if functionality is changing, you need to be in the loop to, at the very least, review the tests that the LLM outputs. Sometimes that's easier than reviewing the code itself, but other times I think it requires similar levels of context.
But honestly I think sane code organization is the bigger hurdle, which is a lot harder to get right without manual oversight. Which of course leads to the temptation to give up on reviewing the code and just trusting whatever the LLM outputs. But I'm skeptical this is a viable approach. LLMs, like human devs, seem to need reasonably well-organized code to be able to work in a codebase, but I think the code they output often falls short of this standard.
(But yes agree that getting the LLM to iterate until CI passes is table-stakes.)
jmalicki 16 hours ago [-]
Strongly agreed!
I think getting good code organization out of an LLM is one of the subtler things - I've learned quite a bit about what sort of things need to be specified, realizing that the LLM isn't actively learning my preferences particularly well, so there are some things about code organization I just have to be explicit about.
Which is more work, but less work than just writing the code myself to begin with.
zorak8me 18 hours ago [-]
Providing instruction and context doesn’t seem like a “workaround”.
theshrike79 5 hours ago [-]
Agents are LLMs that use tools in a loop
I didn't give my agent any tools, and it hallucinated code.
...well, yes. It's like evaluating a programmer's skill by having them one-shot a program on paper with zero syntax errors.
plagiarist 18 hours ago [-]
IME it does fail pretty hard at first. One has to build up a little library of markdown files and an internal library of prompt techniques. Then it starts working okay. I agree there is still a hurdle; trying it on one task doesn't really get one over it.
shafyy 18 hours ago [-]
It's almost like.... LLMs are non-deterministic and hallucinating... Oh wait?!
krastanov 17 hours ago [-]
non-deterministic does not mean it is not biased towards a particular type of results (helpful results)
linsomniac 16 hours ago [-]
My friend, with all due respect, I don't think this is a problem with the AI.
I don't know anything about DotNet, but I just fired up Claude Code in an empty directory and asked it to create an example dotnet program using the Tomlyn library. It chugged away, and ~5 minutes later I did a "grep Deserialize *" in the project, and it came up with exactly the line you wanted it to produce (per your comment here): var model = TomlSerializer.Deserialize<TomlTable>(tomlContent)!;
Please create a dotnet sample program that uses the library at https://github.com/xoofx/Tomlyn to parse the TOML file given on the command line. Please only use the Tomlyn library for parsing the TOML file. I don't have any dotnet tooling installed on my system, please let me know what is needed to compile this example when we get there. Please use an agent team consisting of a dotnet expert, a qa expert, a TOML expert a devils advocate and a dotnet on Linux expert.
I can't really comment on the code it produced (as I said, I don't use dotnet; I had to install it on my system to try this), so I can't comment on the approach. 346 lines in Program.cs seems like a lot for an example TOML program, but I know Claude Code tends to do full error checking, etc., and it seems to have a lot of "pretty printing" code.
maccard 55 minutes ago [-]
I'm in claude code right now, with opus 4.6. Here [0] is my prompt, plan and result.
When I prompted, it went spelunking in the parent folder for a bunch of stuff, which I interrupted and told it to focus on its current folder. It came up with a sensible plan, which I let it execute. I then switched to a new tab and ran `dotnet run publishing/publish.cs`, like Claude said. And the script fails at the first hurdle. The agentic loop came up with a plan, claimed it was done, but _didn't execute the verification it told me it was going to execute_.
> This morning I asked Claude to use a library to load a toml file in .net and print a value.
Legit this morning Claude was essentially unusable for me
I could explicitly state things it should adjust and it wouldn't do it.
Not even after specifying again, reverting everything eventually, and reprompting from the beginning, etc. Even for super trivial frontend things like "extract [code] into a separate component".
After 30 minutes of that I relented and went on to read a book. After lunch I tried again and its intelligence was back to normal.
It's so uncanny to experience how much its performance changes. I strongly suspect Anthropic is doing something whenever its intelligence drops so much, esp. because it's always temporary, but repeatable across sessions when it occurs... until it's normal again.
But ultimately just speculation, I'm just a user after all
krackers 13 hours ago [-]
Here's an evil business idea: Use the LLMs to identify the users most likely to be "vocal influencers" and then prioritize resources for them, ensuring they get the best experience. You can engineer a bubble this way.
And then the next step is to dynamically vary resources based on a prediction of user stickiness. User is frustrated and thinking of trying a competitor -> allocate full resources. User is profiled as prone to gambling and will tolerate intermittent rewards -> can safely forward requests to gimped models. User is a resolute AI skeptic and unlikely to ever preach the gospel of vibecoding -> no need to waste resources on him.
vibe42 9 hours ago [-]
This is part of why running open models on hardware you control is valuable. They may trail SOTA by 6-12 months (really less for many use cases) but there's more reliability, control etc.
oro44 12 hours ago [-]
"Here's an evil business idea: Use the LLMs to identify the users most likely to be "vocal influencers" and then prioritize resources for them, ensuring they get the best experience. You can engineer a bubble this way."
It's quite likely this is already happening, buddy...
The 'random' degradation across all LLM-based services is obvious at this point.
maccard 16 hours ago [-]
> Legit this morning Claude was essentially unusable for me
> I could explicitly state things it should adjust and it wouldn't do it.
Honestly, this is my experience. Every now and again it just completely self-implodes and gives up, and I’m left to pick up the pieces. Look at the other replies who are making sure I’m using the agentic loop/correct model/specific-enough prompt - I don’t know what they’re doing, but I would love to try the tools they’re using.
pton_xd 13 hours ago [-]
Last year I had a friend chatting with me about how Claude had rather quickly transformed their small coding shop, except that they noticed after 3pm it consistently became incredibly dumb. I kind of laughed at the time but you know what, who knows. There's very likely some load-balancing shenanigans going on behind the scenes.
oro44 12 hours ago [-]
I thought it was well known at this point that the best usage happens outside of peak hours?
Ancalagon 12 hours ago [-]
I had a similar experience between this weekend and last weekend!
Maybe Anthropic is trying to cut costs a little and we are all just gaslighting ourselves into thinking its our problem.
fxtentacle 18 hours ago [-]
Same here. While LLMs sometimes work surprisingly well, I also encounter edge cases where they fail surprisingly badly, multiple times per day. My guess is that other people maybe just don't bother to check what the AI says, which would cause them to not notice omission errors.
Like when I was trying to find a physical store again with ChatGPT Pro 5.4 and asked it to prepare a list of candidates, but the shop just wasn't in the list, despite GPT claiming it to be exhaustive. When I then found it manually and asked GPT for advice on how I could improve my prompting in the future, it went full "aggressively agreeable" on me with "Excellent question! Now I can see exactly why my searches missed XY - this is a perfect learning opportunity. Here's what went wrong and what was missing: ..." and then 4 sections with 4 subsections each.
It's great to see the AI reflect on how it failed. But it's also kind of painful if you know that it'll forget all of this the moment the text is sent to me and that it will never ever learn from this mistake and do better in the future.
jmalicki 17 hours ago [-]
"I also encounter edge cases where they fail surprisingly badly multiple times per day. "
If 80% of the time they 10x my output, and the other 20% I can say "well they failed, I guess this one I have to do manually" - that's still an absolutely massive productivity boost.
maccard 16 hours ago [-]
But they don’t 10x my output - they write some code for a problem I/you have already thought about. The hard part isn’t writing the code, it never has been. It’s always been solving and breaking down the problem.
jmalicki 15 hours ago [-]
It's not the hard part, just the tedious part that takes the most time.
anonymous_user9 8 hours ago [-]
>It's great to see the AI reflect on how it failed. But it's also kind of painful...
Keep in mind that the AI is not reflecting, and it has no idea it made a mistake. It's just generating statistically-likely text for "apology" * "ai did not find results".
senordevnyc 16 hours ago [-]
> Like when I was trying to find a physical store again with ChatGPT Pro 5.4 and asked it to prepare a list of candidates
I wonder if it was getting blocked on searches or something, and just didn't tell you.
pdntspa 17 hours ago [-]
[dead]
DougN7 18 hours ago [-]
I have similar experiences. It has worked about half the time, but the code has to be pretty simple. I’ve had many experiences where we work on something complicated for an hour, and it is good compiling code, but then an edge case comes to mind that I ask about, and Claude tells me the whole approach is doomed and will never work for that case. It has even apologized a few times for misleading me :) I feel like it’s this weird mix of brilliant moron. But yeah, ask for a simple HTML page with a few fields and it rocks.
foobarchu 17 hours ago [-]
The best experiences I have are those where I can describe what I want done in detail. Rather than asking it to add toml parsing, I would tell it exactly which library to use ahead of time and reduce the number of decisions available for the model to make. Some of the most effective use cases are when you have a reference to give it, e.g. "add x feature the same way as in this other project that is also in the workspace", or "make the changes I made to the contents of directory X in git commit <sha here>, but applied to directory Y instead". In both cases it's a lot of copy/paste, then tweaking an obvious value (like replacing "dev" with "QA" everywhere).
I try to give the model as little freedom as possible. That usually means it's not being used for novel work.
alecbz 16 hours ago [-]
> The best experiences I have are those where I can describe what I want done with details.
But that's the hard part! You can only eke out moderate productivity gains by automating the tedium of actually writing out the code, because it's a small fraction of software engineering.
senordevnyc 16 hours ago [-]
I basically never have agents do work without using plan mode first (I use Cursor). So I'll start with a high-level like: "I want to plan out the architecture for feature X, I think there's a linear project already that we should update once our plan is done, but if not we'll need to create one. Here's a picture of my whiteboard, I think I also wrote some notes in Obsidian a few weeks ago that you should look up. Two things off the top of my head to think about are: should we do X or does it make more sense to do Y? And also, I'm worried about how this might interfere with the blahblah system, so can you do some research around that? <ramble ramble>"
Then it crawls around for awhile, does some web searches, fetches docs from here and there, whatever. Sometimes it'll ask me some questions. And then it'll finally spit out a plan. I'll read through it and just give it a massive dump of issues, big and small, more questions I have, whatever. (I'll also often be spinning off new planning sessions for pre-work or ancillary tasks that I thought of while reviewing that plan). No structure or anything, just brain dump. Maybe two rounds of that, but usually just one. And then I'll either have it start building, or I'll have it stash in the linear agent so I can kick it off later.
JakeStone 18 hours ago [-]
If Claude ends up grabbing my C# TOML library, in my defense, I wrote it when the TOML format first came out over a dozen years ago, and never did anything more with it. Sorry.
var model = TomlSerializer.Deserialize<TomlTable>(toml)!;
Which is in the readme of the repo. It could also have generated a class and deserialised into that. Instead it did something else (afraid I don’t have it handy, sorry).
phromo 18 hours ago [-]
I don't know why, but I find performance on c#/.net to be several generations behind. Sometimes right ofc, but my general experience is that if you pull the generation slot machine in just about any other language, it will work better. I regularly do python, typescript, ruby and rust with a better experience. It's even hard to find benchmarks where csharp is included.
ball_of_lint 16 hours ago [-]
Did you give claude access to run the compile step?
I remember having to write code on paper for my CS exams, and they expected it to compile! It was hard, but I mostly got there. Definitely made a few small mistakes though.
maccard 16 hours ago [-]
Yes.
roncesvalles 18 hours ago [-]
Did you use the best model available to you (Opus 4.6)? There is a world of difference between using the highest model vs the fast one. The fast ones are basically useless and it's a shame that all these tools default to it.
fweimer 17 hours ago [-]
To make this more confusing, there is a fast mode of Opus 4.6, which (as far as I understand it) is supposed to deliver the same results as the standard mode. It's much more expensive, but advertised as more interactive.
moffkalast 17 hours ago [-]
Opus is not cost effective.
senordevnyc 16 hours ago [-]
Compared to what? I spent about $700 last month on Cursor, mostly on Opus 4.6. The time it saved me is at least 10-20x more valuable than that. I keep trying to use GPT-5.3 Codex, because it's like 40% cheaper, but then the quality doesn't always seem quite as good, and it's honestly not worth it to me to save a couple hundred bucks a month for something that's 10% worse. The multiplicative value of what I'm building means it's worth it to pay extra for the SOTA model in almost all cases.
senordevnyc 16 hours ago [-]
This is so perplexing to me. I've definitely hit these kinds of issues (which usually result in me cursing at the agent in all caps while telling it to get its shit together!), but it's almost always a long ways into a session where I know context rot is an issue, and the assumption it's making is a dumb one but it's also in the middle of a complex task...I just haven't had anything remotely like the situation you're describing, where Opus 4.6 can't make a simple change and verify that it compiles, can't look up docs, can't follow your instructions, etc. Bizarre.
jmalicki 1 hours ago [-]
It's been awhile, but I once spent four hours trying to get Opus 4.1 to change the color palette of a bar graph, a change that would've taken me 5 minutes.
The only reason I wasted so much time was that given all of my other positive experiences, clearly I must have been prompting it incorrectly, and I should learn how to fix it.
After blowing four hours I just changed the colors by hand. I took away that it's almost completely unpredictable what will be easy and what will be hard for an LLM.
wrs 18 hours ago [-]
I’m honestly baffled by this. I don’t want to tell you “you’re holding it wrong” but if this is your normal experience there’s something weird happening.
Friday afternoon I made a new directory and told Claude Code I wanted to make a Go proxy so I could have a request/callback HTTP API for a 3rd party service whose official API is only persistent websocket connections. I had it read the service’s API docs, engage in some back and forth to establish the architecture and library choices, and save out a phased implementation plan in plan mode. It implemented it in four phases with passing tests for each, then did live tests against the service in which it debugged its protocol mistakes using curl. Finally I had it do two rounds of code review with fresh context, and it fixed a race condition and made a few things cleaner. Total time, two hours.
I have noticed some people I work with have more trouble, and my vague intuition is it happens when they give Claude too much autonomy. It works better when you tell it what to do, rather than letting it decide. That can be at a pretty high level, though. Basically reduce the problem to a set of well-established subproblems that it’s familiar with. Same as you’d do with a junior developer, really.
thwarted 18 hours ago [-]
> it happens when they give Claude too much autonomy. It works better when you tell it what to do, rather than letting it decide. That can be at a pretty high level, though. Basically reduce the problem to a set of well-established subproblems that it’s familiar with. Same as you’d do with a junior developer, really.
Equating "junior developers" and "coding LLMs" is pretty lame. You handhold a junior developers so, eventually, you don't have to handhold anymore. The junior developer is expected to learn enough, and be trusted enough, to operate more autonomously. "Junior developers" don't exist solely to do your bidding. It may be valuable to recognize similarities between a first junior developer interaction and a first LLM interaction, but when every LLM interaction requires it to be handheld, the value of the iterative nature of having a junior developer work along side you is not at all equivalent.
wrs 17 hours ago [-]
I didn’t say they are equivalent, nor do I in any way consider them equivalent. One is a tool, the other is a person.
I simply said the description of the problem should be broken down similar to the way you’d do it for a junior developer. As opposed to the way you’d express the problem to a more senior developer who can be trusted to figure out the right way to do it at a higher level.
maccard 17 hours ago [-]
> I have noticed some people I work with have more trouble, and my vague intuition is it happens when they give Claude too much autonomy
What’s giving too much autonomy about
“Please load settings.toml using a library and print out the name key from the application table”? Even if it’s under specified, surely it should at least leave it _compiling_?
I’ve been posting comments like this monthly here, my experience has been consistently this with Claude, opencode, antigravity, cursor, and using gpt/opus/sonnet/gemini models (latest at time of testing). This morning was opus 4.6
linsomniac 16 hours ago [-]
> Even if it’s under specified, surely it should at least leave it _compiling_?
Are you using Claude Code? Do you have it configured so that you are not allowing it to run the build? Because I've observed that Claude Code is extremely good at making sure the code compiles, because it'll run a compile and address any compile errors as part of the work.
I just asked it to build a TOML example program in DotNet using Tomlyn, and when it was done I was able to run "./bin/Debug/net8.0/dotnettoml example.toml", it had already built it for me (I watched it run the build step as part of its work, as I mentioned it would do above).
maccard 16 hours ago [-]
I am using Claude Code. I didn’t explicitly tell it what the build command was (it’s dotnet build), and it didn’t ask. That’s not my fault.
> I’ve observed Claude code is extremely good at making sure the code compiles
My observation is that it’s fine until it’s absolutely not, and the agentic loop fails.
linsomniac 15 hours ago [-]
>Thats not my fault.
I don't know that it's useful to assign blame here.
It probably is to your benefit, if you are a coding professional, to understand why your results are so drastically different from what others are seeing. You started this thread saying "I keep getting told I'll be amazed at what it can do, but the tools keep failing at the first hurdle."
I'm telling you that something is wrong, that is why you are getting poor results. I don't know what is wrong, but I've given you an example prompt and an example output showing that Claude Code is able to produce the exact output you were looking for. This is why a lot of people are saying "you'll be amazed at what it can do", and it points to you having some issue.
I don't know if you are running an ancient version of Claude Code, or if you are not using Opus 4.6 or "high" effort (those are what I'm using to get the results I posted elsewhere in reply to your comment), but something is definitely wrong. Some of what may be wrong is that you don't have enough experience with the tooling, which I'd understand if you are getting poor results; you have little (immediate) incentive to get more proficient.
As I said, I was able to tell Claude Code to do something like the example you gave, and it did it and it built, without me asking, and produced a working program on the first try.
maccard 12 hours ago [-]
> I don’t know that it’s useful to assign blame here
Oh - I’m blaming Claude not anyone else. I’ve tried again this evening and the same prompt (in the same directory on the same project) worked.
> i don’t know if you’re using an ancient version of Claude code,
I’m on a version from some time last week, and using opus 4.6
> This is why a lot of people are saying "you'll be amazed at what it can do", and it points to you having some issue.
If you look at my comments in these threads, I’ve had these issues and been posting about this for months. I’m still being told “ you’re using the wrong model or the wrong tool or you’re holding it wrong” but yet, here I am.
I’m using plan mode, clearly breaking down tasks, and this happens to me basically every time I use the damn tool. Speaking to my team at work and friends in other workplaces, I hear the same thing. But yet we’re just using it wrong or doing something wrong.
Honestly, I genuinely think the people who are not having these experiences just… don’t notice that they are.
Kiro 3 hours ago [-]
I understand that you think you are arguing that the models are bad, but the only thing people wonder is what you're doing to fail so spectacularly and whether you're actually being truthful.
maccard 36 minutes ago [-]
And I’m wondering the same thing. The people replying to me are saying “your experience must be wrong/you must be doing something wrong” and then 1-2 threads later they say “well yeah it doesn’t do X but if I do Y and Z it works”, which… kind of proves the point?
wrs 8 hours ago [-]
I assure you I would have noticed if the result of my Friday effort was something that didn't compile, rather than a service that seems to work just fine. I've reviewed about half the code so far and it seems quite reasonable.
For some reason I'm getting downvoted for trying to help, but regardless, if you can (and want to) post a transcript somewhere of some of these sessions that aren't working out, maybe some of us who are having more success can take a look and see what's up.
Fresh prompt in a codebase with a claude.md, I asked it to do a simple task. I've shared the prompt, plan, and output in the linked gist.
Kiro 16 hours ago [-]
Not even the worst possible prompt would explain your unusual experience, so I don't think that's it either.
wrs 16 hours ago [-]
There’s nothing wrong with it that I can see. Like I said, I’m a bit baffled at your experience. I will say, it’s not unusual for the initial output not to compile, but usually one short iteration later that’s fixed. Claude Code will usually even do that iteration by itself.
maccard 16 hours ago [-]
> I will say, it’s not unusual for the initial output not to compile,
We’ve gone from “I’m baffled at your experience” to “well, yeah, it often fails” in two sentences here…
wrs 13 hours ago [-]
Hmm…if you’re giving only one prompt to Claude Code, and allowing it only one output, then I’m no longer baffled at why you’re not getting good results. That’s not how it works. (That’s not how it works when I write code myself, either!)
maccard 12 hours ago [-]
I mean, I don’t know how much less scope I can give it. The next step is writing the 5 lines of code I want it to write.
I also clearly said I didn’t allow it one output, I gave it the compile error message, it changed a different line, I told it it was at the affected line and to check the docs. Claude code then tried to query the DLL for the function, abandoned that and then did something else incorrect.
I’m literally asking it to install a package and copy the example from the readme
wrs 12 hours ago [-]
I understand (and don’t know what to say beyond “that’s weird”). I was just reacting to your equating “the first try sometimes doesn’t compile” with “it often fails”. That’s only a failure if you stop at that point.
maccard 3 hours ago [-]
In the first comment you’ll see that it didn’t compile, I prompted to fix it and eventually fixed it myself. When I asked it to proceed to the next task it blew away my fix and left it in a broken state.
saulpw 13 hours ago [-]
It's not unusual for my initial output (as a programmer) not to compile either. I wouldn't say I "failed" if I can then get it to compile. Which as people are saying, is what happens with Claude Code and Opus, either automatically or at most when I say "get it to compile".
maccard 12 hours ago [-]
But when it doesn’t compile for me, I don’t claim it’s finished.
shireboy 18 hours ago [-]
Similar. I regularly use Github copilot (with claude models sometimes) and it works amazingly. But I see some who struggle with them. I have sort of learned to talk to it, understand what it is generating, and routinely use it to generate fixes, whole features, etc. much much faster than I could before.
maccard 3 hours ago [-]
I’ve found copilot to generate worse code, and the editor integrations to be unusable (let’s insert this block of code at the wrong place in the file, for example), but overall it does the right thing enough that I can patch it. I’ve found Claude code just goes mad and gets stuck in a loop, and I need to pull it out and figure out where in the hundreds of lines of code it’s generated in the last few minutes it’s gone so wrong.
HoyaSaxa 19 hours ago [-]
I think most public companies will take the short term profits and startups will be given a huge opportunity to take market share as a result.
At my company, we are maintaining our hiring plan (I'm the decision maker). We have never been more excited at our permission to win against the incumbents in our market. At the same time, I've never been more concerned about other startups giving us a real run. I think we will see a bit of an arms race for the best talent as a result.
Productivity without clear vision, strategy and user feedback loops is meaningless. But those startups that are able to harness the productivity gains to deliver more complete and polished solutions that solve real problems for their users will be unstoppable.
We've always seen big gains by taking a team of say 8 and splitting it into 2 teams of 4. I think the major difference is that now we will probably split teams of 4 into 2 teams of 2 with clearer remits. I don't want them to necessarily deliver more features. But I do want them to deliver features with far fewer caveats at a higher quality and then iterate more on those.
Humans that consume the software will become the bottlenecks of change!
18 hours ago [-]
oro44 12 hours ago [-]
"But those startups that are able to harness the productivity gains to deliver more complete and polished solutions that solve real problems for their users will be unstoppable."
They'd be unstoppable irrespective of LLMs. Why do you think Zuckerberg acquired Instagram? He literally tried copying it and failed. Instagram at the time was absolutely tiny in terms of pure labour, relative to Facebook.
Most people on hacker news are missing the point. Productivity gains for the sake of perceived productivity gains are not what creates economic value. It's not the equivalent of a factory all of a sudden becoming more productive in producing more of the same stuff. Not comparable at all.
rsynnott 22 hours ago [-]
> boilerplate
Ruby on Rails and its imitators blew away tons of boilerplate. Despite some hype at the time about a productivity revolution, it didn’t _really_ change that much.
> , libraries, build-tools,
Unsure what you mean by this; what bearing do our friends the magic robots have on these?
> and refactoring
Again, IntelliJ did not really cause a productivity revolution by making refactoring trivial about 20 years ago. Also, refactoring is kind of a solved problem, due to IntelliJ et al; what’s an LLM getting you there that decent deterministic tooling doesn’t?
rileymichael 19 hours ago [-]
couldn't have said it better. all of the people clamoring on about eliminating the boilerplate they've been writing + enabling refactoring have had their heads in the sand for the past two decades. so yeah, i'm sure it does seem revolutionary to them!
maccard 19 hours ago [-]
There have been a handful of leaps - copilot was able to look at open files and stub out a new service in my custom framework, including adding tests. It’s not a multiplier but it certainly helps
rileymichael 19 hours ago [-]
most frameworks have CLIs / IDE plugins that do the same (plus models, database integration, etc.) deterministically. i've built many in house versions for internal frameworks over the years. if you were writing a ton of boilerplate prior to LLMs, that was on you
maccard 19 hours ago [-]
Have they? I’ve used tools that mostly do it, but they require manually writing templates for the frameworks. In internal apps my experience has been that these get left behind as the service implementations change, and it ends up with “copy your favourite service that you know works”.
rileymichael 18 hours ago [-]
> they require manually writing templates for the frameworks
the ones i've used come with defaults that you can then customize. here are some of the better ones:
> my experience has been these get left behind as the service implementations change
yeah i've definitely seen this, ultimately it comes down to your culture / ensuring time is invested in devex. an approach that helps avoid drift is generating directly from an _actual_ project instead of using something like yeoman, but that's quite involved
maccard 17 hours ago [-]
Sorry - I’m aware that rails/dotnet have these built into visual studio and co, but my point was about our custom internal things that are definitely not IDE integrated.
> it comes down to ensuring time is invested in devex
That’s actually my point - the orgs haven’t invested in devex, but that didn’t matter because copilot could figure out what to do!
epicureanideal 19 hours ago [-]
And for a lot of AI transformation tasks, for a long time I've been using even clever regex search/replace, and with a few minutes of small adjustment afterward I have a 100% deterministic (or 95% deterministic and 5% manually human reviewed and edited) process for transforming code. Although of course I haven't tried that cross-language, etc.
And of course, we didn't see a massive layoff after the introduction of say, StackOverflow, or DreamWeaver, or jQuery vs raw JS, Twitter Bootstrap, etc.
elvis10ten 19 hours ago [-]
I just had a relevant experience. I asked Claude to add “trace(‘$FunctionName’) {}” to all Composable functions in my app. Claude spent some time doing something. In between, I was like shoot, I could just do a deterministic regex match and replace.
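That deterministic replace can be sketched in Python — illustrative only, since the regex below assumes single-line `fun` signatures and inserts a `trace(...)` call as the first statement rather than wrapping the body (the wrapping form from the comment needs real brace matching, i.e. a structural tool rather than a regex):

```python
import re

kotlin_src = '''@Composable
fun Greeting(name: String) {
    Text("Hello, " + name)
}

fun notComposable() {
}
'''

# Match an @Composable annotation followed by a single-line fun signature,
# capturing the whole header (group 1) and the function name (group 2).
pattern = re.compile(r'(@Composable\s*\nfun\s+(\w+)\s*\([^)]*\)[^{]*\{)')

# Insert a trace call as the first statement of each matched body;
# un-annotated functions are left untouched.
traced = pattern.sub(r'\1\n    trace("\2")', kotlin_src)
print(traced)
```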
rileymichael 19 hours ago [-]
structural search and replace in intellij is a superpower (within a single repo).
for polyrepo setups, openrewrite is great. add in an orchestrator (simple enough to build one like sourcegraph's batch changes) and you can manage hundreds of repositories in a deterministic, testable way.
jmull 17 hours ago [-]
> or the boilerplate, libraries, build-tools, and refactoring
If your dev group is spending 90% of their time on these... well, you'd probably be right to fire someone. Not most of the developers but whoever put in place a system where so much time is spent on overhead/retrograde activities.
Something that's getting lost in the new, low cost of generating code is that code is a burden, not an asset. There's an ongoing maintenance and complexity cost. LLMs lower maintenance cost, but if you're generating 10x code you aren't getting ahead. Meanwhile, the cost of unmanaged complexity goes up exponentially. LLMs or no, you hit a wall if you don't manage it well.
linsomniac 16 hours ago [-]
>There's an ongoing maintenance and complexity cost.
My company has 20 years of accumulated tech debt, and the LLMs have been pretty amazing at helping us dig out from under a lot of that.
You make valid points, but I'm just going to chime in with adding code is not the only thing that these tools are good at.
MathMonkeyMan 8 hours ago [-]
> Not most of the developers but whoever put in place a system where so much time is spent on overhead/retrograde activities.
Dude that's everybody in charge. You're young, you build a system, you mold it to shifting customer needs, you strike gold, you assume greater responsibility.
You hire lots of people. A few years go by. Why can't these people get shit done? When I was in their shoes, I was flying, I did what was needed.
Maybe we hired second rate talent. Maybe they're slacking. Maybe we should lay them off, or squeeze them harder, or pray AI will magically improve the outcome.
Come on, look in the mirror.
bluefirebrand 12 hours ago [-]
> LLMs lower maintenance cost
Even the most enthusiastic AI boosters I've read don't seem to agree with this. From what I can tell, LLMs are still mostly useful at building in greenfield projects, and they are weak at maintaining brownfield projects
helge9210 4 hours ago [-]
From my experience, greenfield/brownfield is not the best dichotomy here. I have watched the same tooling generate meaningless slop on a greenfield project and a 10kLoC change (leading to an outage) on an existing project in one pair of hands, and build a fairly complex new project and fix a long-standing (years) bug with a two-line patch in another.
And I have more examples where I, personally, was on both sides of the fence: the deciding factor was my level of understanding of the problem, not the tooling.
animal531 21 hours ago [-]
I use it near daily and there is definitely a positive there, BUT its nothing like what the OP statement would make it up to be.
If it is writing both the code and the tests then you're going to find that its tests are remarkable, they just work. At least until you deploy to a live state and start testing for yourself; then you'll notice that it's mostly only testing the exact code that it wrote, it's not confrontational or trying to find errors, and it already assumes that it's going to work. It won't ever come up with the majority of breaking cases that a developer will by itself, you will need to guide it. Also, while fixing those the odds of introducing other breaking changes are decent, and after enough prompts you are going to lose coherency no matter what you do.
It definitely makes a lot of boilerplate code easier, but what you don't notice is that it's just moving the difficult-to-find problems into hidden new areas. That fancy code it wrote maybe doesn't take the lower-level building blocks, such as database optimization, into account. Even for a simple application, a half-decent developer can create something that will run quite a bit faster. If you start bringing these problems to it then it might be able to optimize them, but the amount of time that's going to take is non-negligible.
It takes developers time to sit on code, to learn it along with the problem space and how to tie them together effectively. If you take that away there is no learning; you're just the monkey copy-pasting the produced output from the black box and hoping that you get a result that works. Even worse is that every step you take doesn't bring you any closer to the solution - it's pretty much random.
So what is it good for? It can read, "understand", translate, write and explain things to a sufficient degree much faster than us humans. But if you are (at the moment) trusting it with anything past the method level for code, then you're just shooting yourself in the foot; you're just not feeling the pain until later. In a day you can have it generate, for example, a whole website, backend, db etc. for your new business idea, but that's not a "product" - it might as well be a promotional video that you throw away once you've used it to impress the investors. For now that might still work, but people are already catching on and beginning to wise up.
intoXbox 14 hours ago [-]
The pain already starts when a new feature needs to be introduced, your colleague is assigned to the task and the architecture is completely unfit for modular development.
Any sensible experienced programmer will take into account possible future features spoken about around the office when deciding how to implement a feature. To me that’s the key reason I can’t delegate to AI the complex features that will grow. The same reason I inspect and clean the toilet when I’m done - out of respect for my colleagues.
capitalsigma 13 hours ago [-]
This is more or less my belief too. But I think there are probably domains/companies where the "promo video" is actually good enough to ship (or at least, where that's the quality that devs were shipping before)
13 hours ago [-]
kakacik 18 hours ago [-]
Don't break the gravy train! These discussions here are beautiful echo chambers.
I feel like most folks commenting uncritically here about the second coming of Jesus must work in code sweatshops, churning out eshops or whatever is in vogue today, quickly moving on, never looking back, never maintaining and working on their codebases for a decade+. Where I existed my whole career, speed of delivery was never the primary concern; quality of delivery (which everybody complains about one way or another with LLMs) was much more critical, and that's where the money went. Maybe I just picked the right businesses, but then again I worked for an energy company, insurance, telco, government, army, municipality, 2 banks and so on across 3 European states. Also, full working code delivery to production is a rather small part of any serious project; even a 5x speed increase would move the needle just a tiny bit overall.
If I were more junior, I would feel massive FOMO reading all this (since I use it so far just as a quicker google/stackoverflow and for some simpler refactoring, but oh boy, is it sometimes hilariously and dangerously wrong). I am not, thus I couldn't care less. Node.js craze, 10x stronger, not seeing the forest for the trees.
anon7725 19 hours ago [-]
This is the most insightful comment in the thread.
sdn90 2 hours ago [-]
I think we are in a price discovery phase overall for human labor.
Just like financial markets, this phase is very volatile. But today’s price can look much different in 1, 2, 5, and 10 years.
Right now there’s a lot of opportunity to disrupt companies who are moving much slower since adoption can vary greatly.
But I think in a few years this meta will be overcrowded. The overall skill/productivity gap across companies will be reduced and the bar for productivity will be raised.
There’s room for taking profits now from the productivity gains but I don’t think it will last long.
If AI is really increasing productivity enough to reduce headcount right now, it won’t be in the future. If all your competitors are using AI as effectively as you are, can you still do this?
The world’s demand for productivity is limitless. As of right now you still need someone to at least install Claude Code and run the binary.
Areena_28 5 hours ago [-]
We went through this exact decision point 18 months ago. But we chose to keep the team. Redirected them. Our threat detection pipeline, which would've taken 8 months to build - surprisingly and luckily shipped in 11 weeks. That's not a productivity stat, that's a product moat.
The companies gutting engineering right now are going to realise too late that AI amplifies what you already have. If you fire the people with institutional knowledge and domain expertise, you're left with an AI that generates confident, well-formatted mediocrity. Agree or not?
theshrike79 5 hours ago [-]
I've been trying to figure out a formula how to describe the gains AI use gives.
It's not a direct multiplier, because even non-programmers can get stuff done with it just from an idea.
And it's not exactly an exponent either, even though experienced programmers can get absolutely massive gains when they learn to work with it.
"Replace tools, not jobs" is the best AI slogan I've found yet. Use the VC funded practically free massive language models to quickly write every single small utility, tool, connector and dashboard you've ever wanted.
You know, the ones you've always wanted to do, but could never justify the time spent on compared to the utility. Specifically in corporate environments. Stuff that used to require a project manager, programmer and a QA person dedicated to some tiny app for a week or two can now be done in two days by a single person with zero coding experience and an idea, to a degree where you can evaluate whether it's worth continuing or not.
aurareturn 1 days ago [-]
In certain industries, increasing productivity by 90% does not mean 90% increase in profit. This is because growth depends on market TAM and growth rate.
Another way of increasing profit is to simply reduce your headcount by 90% while keeping the same profit.*
Hence, I think some companies will keep downsizing. Some companies will hire. It depends a lot.
*Assuming 90% productivity increase.
muzani 23 hours ago [-]
In companies like the oil industry, doubling productivity would mean reducing the expected life span. Costs tend to catch up with profits, especially due to taxes. Layoffs happen inevitably.
Is it the same with tech? Facebook has 3 billion monthly active users. No amount of tech will bring that up to 6 billion. If you were to double the amount of time someone spends on Facebook, or double the ads they see or double the click through rate, what does that really mean?
hirako2000 19 hours ago [-]
That's assuming a company is constrained, to not expand its portfolio.
Taking the example of Facebook. They are in social media, messaging, AI, VR/AR hardware and software, a few other things, meta universe whatever that was, now left with the name. Facebook isn't delivering or successful on all its ventures, it knows that, it keeps investing in other segments.
More productivity would mean at least diversifying, they have some of the best engineers, it would make no sense to not simply attempt to hit the jackpot by playing more machines.
What fewer people talk about is that the entire tech industry is tertiary services. Ads, entertainment, communication, etc. If/when hard industries take a hit, tertiary takes a hit. If it isn't clear to you that the overall economy has already started to take some irreversible dents, and that those will accelerate, know that the capital is well aware.
Or we can continue the wishful thinking and take comfort that monetary tightening is just temporary, that investments will flow more into tangential ventures and growth is around the corner, that the U.S. still is and will remain the world's strongest economy.
muzani 1 hours ago [-]
There's diminishing gains to some features too. Facebook added Dating but according to a friend who worked on it, the increased privacy restrictions developed in parallel elsewhere killed the algorithm. It died on launch, and wasn't even their fault.
Even if it had succeeded, there are more buttons everywhere - marketplace, reels, stories, blogs, articles, groups, etc. A lot of this makes the whole experience more crowded.
Productivity here has an odd effect, sort of like larger roads causing more traffic jams.
aurareturn 19 hours ago [-]
Yeah, even if Meta engineers gain 90% increase in productivity, I doubt Meta can increase revenue by 90% more than previous environment. There’s just a cap to how much time people want to spend on social media.
I think most companies are making the right call by downsizing instead of staying same size. Let people go to where there is more potential for growth.
matt_s 23 hours ago [-]
Developers are going to be more productive, just not how you think. If history is going to rhyme, then the software industry will enter into a self-serving productivity craze building all sorts of software tooling, frameworks, ralph wiggum loop variants, MCPs, etc. much like the surge in JS frameworks and variants in the past. Most of those things will not have any business value. Software devs, myself included, love to do things "because I can" and not necessarily because they should.
Smart organizations will not just deliver better products but likely start products that they were hesitant to start before because the cost of starting is a lot closer to zero. Smart engineering leadership will encourage developers into delivering value and not self-serving, endless iterations of tooling enhancements, etc.
If I was a CTO and my competitor Y fired 90% of their devs, I'd try to secure funding to hire their top talent and retain them. The vitriol alone could fuel some interesting creations and when competitor Y realizes things later, their top talent will have moved on.
MichaelRo 22 hours ago [-]
>> then the software industry will enter into a self-serving productivity craze building all sorts of software tooling, frameworks
>> Smart organizations will not just deliver better products but likely start products [...]
This is not the 90s anymore when low hanging fruit was everywhere ready to be picked. We have everything under the sun now and more.
The problem with bullshit apps is not that they took you 5 months to build. What you build now in 5 minutes is still bullshit. Most of the remaining work is bullshit jobs: spinning up useless "features" and frameworks that nobody needs and shoving them down the throats of customers who never asked for them. Now it's possible to dig holes and fill them back in (do pointless work) at a much improved pace thanks to AI.
ngburke 17 hours ago [-]
Solo here, no devs to fire. What's changed is the bar for "worth building" dropped massively. Stuff I'd have shelved as too small to justify the time, I just do now. The risk is that you end up with a lot of half-finished things if you're not disciplined about finishing.
theshrike79 5 hours ago [-]
People tend to forget that big businesses have been bootstrapped with worse stuff than some MVP vibe code website.
I have a vague memory of a hotel reservation "app" that was just putting lines in a Google Sheet where the founders grabbed the info, called the hotels themselves and reserved the room.
When they got too many requests to keep up, they slowly automated the bits they could one by one.
lateforwork 19 hours ago [-]
You still need humans to manage the whole lifecycle, including monitoring the live site, being on-call, handling incidents, triaging bugs, deploying fixes, supporting users and so on.
For greenfield development you don't need as many software engineers. Some developers (the top 10%) are still needed to guide AI and make architectural decisions, but the remaining 90% will work on the lifecycle management task mentioned above.
The productivity gains can be used to produce more software, and if you are able to sell the software you produce should result in a revenue boost. But if you produce more than you can sell then some people will be laid off.
ramon156 15 hours ago [-]
If your dev team can be replaced by AI bots, go ahead and do it. Be confident about it.
Seriously, all managers I know (2, somehow) that have done this are now in sprint hell because they didn't realize an LLM cannot do the work their devs did. Bit of schadenfreude.
Not saying this can't be done, some projects are easy enough that you can throw an LLM at it
al_borland 15 hours ago [-]
It's better than it was, but I'm still not amazed.
I was working with it just yesterday and found it mixed up class attributes and dict keys when looking if something existed. As a result, it always thought the key was missing. Glancing through the code, it would have been easy to overlook, as it was checking for something... just doing it the wrong way. The code also appeared to still work in the larger context, just not as well or as efficiently as it was supposed to work. It was only my obsessive testing and attention to detail that caught it.
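(The exact code isn't shown, so here is a hypothetical Python reconstruction of that class of bug — an attribute lookup used where a key lookup was meant, which still "works" but always reports the key as missing:)

```python
settings = {"timeout": 30}

# Buggy pattern: attribute lookup on a dict. Dicts store keys, not
# attributes, so the fallback default is returned every time and the
# surrounding code silently behaves as if the key were absent.
wrong = getattr(settings, "timeout", None)   # None

# Correct pattern: key lookup.
right = settings.get("timeout")              # 30
```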
I find it much faster and easier to write things myself, using AI to help me along the way, rather than trying to delegate everything to the AI and hoping for the best. Writing code is much more interesting than reviewing code, and it puts me close enough to it that those kind of issues don't happen in the first place.
alexsmirnov 9 hours ago [-]
> you start off checking every diff like a hawk, expecting it to break things, but honestly, soon you see it's not necessary most of the time.
I see it's necessary ALL the time. The AI-generated code can be used as scaffolding, but it never gets close to real production quality. This is the experience from a small startup with a team of 5 developers. I review and approve all PRs, and none has ever been able to pass AI code review on the first iteration.
ppqqrr 17 hours ago [-]
false dichotomy. neither company will make it. the winner will be some solo dev with a singular vision and ruthless dedication to quality. “teams” do not have visions, individuals do. the more people are involved in a product, the blurrier the vision becomes, the more insidious the vested interest of the organization to profit from the user, rather than align themselves with the user. solo devs do not have such problems, because they’re no different from users, aside from the fact that they were users before the software existed. they make the software because they want to use it, and there is no amount of payroll employees that can replicate the quality and innovation that results from this simple, genuine self-interest.
delbronski 14 hours ago [-]
I hope you are right. I’ve been in this industry 20 years… the passionate developer or small team with the awesome software wins a few times, but 99% of the time it’s the shitty software with a great marketing and sales team that wins.
ritzaco 17 hours ago [-]
How much software do you need and how many computers are there to run it on?
After the combine harvester, we produced the same food with fewer people.
At the moment, it seems like hardware is the constraint. Companies don't have access to enough machines or tokens to keep all their devs occupied, so they let some go. Maybe that changes, maybe we already have too much software?
Personally I think we already had too much software before LLMs and even without them many devs would have found themselves jobless as startups selling to startups selling to startups failed and we realized (again) that food, shelter, security, education etc are 'real' industries, but software isn't one if it's not actively helping one of those.
ekkeke 17 hours ago [-]
There definitely isn't enough _specialised_ software. If you look at engineering tools the open source/free stuff is just not good, and the professional software packages run in the tens of thousands per user. I'm eagerly awaiting the day we get competitive open source alternatives to simulink, HFSS, autocad, solidworks, LT Spice, etc.
Unfortunately this kind of software needs specialised domain knowledge to produce that AI doesn't have yet, but when (if) it arrives I hope we see strides forwards in hardware engineering productivity.
bryant 17 hours ago [-]
Public companies are incentivized to fire for short term gains while figuring out long term strategy on the basis that they'll have a cheaper pool to hire from once they figure out how they're going to more effectively monetize their ability to scale with AI.
Companies without the same constraints are well equipped to keep who they've got, pivot them into managing/overseeing agents to scale, and build better products from the outset.
So this'll be a good opportunity for smaller companies (or not-for-profits like co-ops and credit unions) to eat the lunches of bigger companies that'll be slow to adapt.
rglover 17 hours ago [-]
I'm a soloist. My plants are getting watered again and I get to finish work earlier and enjoy my home life now instead of sprouting ulcers or staying up til 4am. In the event I did/do have employees, I wouldn't hire anyone who could be replaced by an LLM (nor would I want to because I enjoy keeping my clients happy and having a team that actually feels like they have some sense of security in life—turns out those people are incredibly loyal and hard working).
conartist6 23 hours ago [-]
Gotta fire everyone, or else "too many cooks" will mean that even those temporary productivity gains go up in smoke.
Remember sometimes the most productive thing to have is not money or people but time with your ideas.
conartist6 23 hours ago [-]
What's so sad to see is people excited about making something selling away the time they would have with their ideas. They're paying money to not use their brain to contemplate the thing they should be the foremost expert in the world on, and they're excited to be paying more and more for each modicum of ignorance and mediocrity dispensed.
23 hours ago [-]
hexer303 17 hours ago [-]
I find that the arguments of whether or not AI boosts productivity is not very productive.
The more grounded reality is that AI coding can be a productivity multiplier in the right hands, and a significant hindrance in the wrong hands.
Somewhere there exists a happy medium between vibe coding without ever looking at the code, and hand-writing every single line.
dare944 22 hours ago [-]
False dichotomy. If your company is at the point of firing lots of people "to save a buck" its way past the point of caring about delivering a better product.
18 hours ago [-]
jaen 23 hours ago [-]
That's assuming every developer can get the same AI efficiency boost and contribute meaningfully to any feature, which is unfortunately not really the case.
Seniors can adjust, but eg. junior frontend-only devs might be doomed in both situations, as they might not be able to contribute enough to business-critical features to justify their costs and most frontend-related tasks will be taken over by the "10x" seniors.
d--b 2 hours ago [-]
Definitely better products.
The main effect is that you build things a lot faster.
The secondary effect is that, since you can code a lot faster, you are no longer scared of large refactors, ports to other languages, and all that kind of tedious stuff.
That means that if you design something, ship it, and realize it's bad, you can completely change it.
The "oh we should have done it differently, but now it would take years to change, so we have to deal with legacy garbage" argument does really make sense anymore.
Like if HN had LLM agents, we wouldn't have had to wait for years for them to solve the "click here for more comments" problem.
And iterating with users is a lot faster.
giantg2 23 hours ago [-]
I feel like the ideas have always been the tough part. Finding novel ideas with a good return is extremely tough.
hirako2000 19 hours ago [-]
Assuming you are primarily selling software.
Situation a/ LLMs increase developer productivity: you hire more developers as you bank the profit. If you don't, your competitor will.
b/ LLMs don't increase productivity: you keep cruising. You rejoice seeing some competitors lay off.
Reality is dissonant with these supposedly only-possible scenarios. Absurd decision making, a mistake? No mistake. Many tech companies are facing difficulties; they need to lose weight to remain profitable and appease the shareholders' demand for bigger margins.
How to do this without a backlash? AI is replacing developers; Anthropic's CEO said engineers don't write code anymore, the role obsolete in 6 months. It naturally makes sense that we have to let some of them go. And if the prophecy doesn't come true - well, nobody ever got fired for buying IBM.
gamblor956 19 hours ago [-]
AI tooling does not provide productivity gains unless you consider it productive to skip the boilerplate portion of software development, which you can already do by using a framework, or you never plan to get past the MVP stage of a product, as refactoring the AI spaghetti would take several magnitudes more work than doing it with humans from the beginning.
Amazon has demonstrated that it takes just as long, or longer, to have senior devs review LLM output than it would to just have the senior devs do the programming in the first place. But now your senior devs are wasted on reviewing instead of developing or engineering. Amazon, Microsoft, Google, Salesforce, and Palantir have all suffered multiple losses in the tens of millions (or more) due to AI output issues. Now that Microsoft has finally realized how bad LLMs really are at generating useful output, they've begun removing AI functionality from Windows.
Product quality matters more than time to market. Especially in tech, the first-to-market is almost never the company that dominates, so it's truly bizarre that VCs are always so focused on their investments trying to be first to market instead of best to market.
If Competitor Y just fired 90% of their developers, I would have a toast with my entire human team. And a few months later, we'd own the market with our superior product.
bendmorris 19 hours ago [-]
It's disappointing that this is clearly being downvoted due to disagreement - it's a valid perspective. We have very little evidence of the overall impact of aggressively generating code "in the wild" and plenty of bad examples. No one knows what this ends up looking like as it continues to meet reality but plenty are taking a large productivity improvement as a given.
marcyb5st 22 hours ago [-]
If I were running the company, I'd mostly focus on better products, with the exception of firing those that don't adopt the technology.
If it is a big company the answer is and will always be: whatever makes the stock price rise the most.
elvis10ten 19 hours ago [-]
> with the exception of firing those that don't adopt the technology.
This is a crazy take. Even if said people are matching or exceeding the outcome of those using the technology?
I’m not in this group. But the closest analog to what you are saying is firing people for not using a specific IDE.
ashwinnair99 19 hours ago [-]
The companies quietly doing the firing will say they're doing the building.
The answer you get depends entirely on who you ask and what they're trying to justify.
GenerWork 22 hours ago [-]
>do you fire devs or build better products?
I think that it's more along the lines of "do you fire people" instead of just "do you fire devs". Fewer devs means less of a need for PMs, so they can be let go as well, and maybe with the rise of AI assisted design tools, you don't need as many UX people, so you let some of them go as well.
As for building better products, I feel like that's a completely different topic than using AI for productivity gains, but only because at the end of the day you need buy in from upper management in order to build the features/redo existing features/both that will make the product better. I should also mention I'm viewing this from the position of someone who works at an established company and not a startup, so it may differ.
scuderiaseb 17 hours ago [-]
I have found it very useful for pointing me in the right direction, exploring codebases, and writing boilerplate or other scaffolding. However, I do need to review and test, and at the end of the day it's me who's responsible for the code. So I only have it write parts of the code at a time, not one-shot a whole new GitHub company.
stephen_cagle 18 hours ago [-]
> you start off checking every diff like a hawk, expecting it to break things, but honestly, soon you see it's not necessary most of the time.
My own experience...
I've tried approaching vibe coding in at least 3 different ways. At first I wrote a system that had specs (markdown files) with a 1-to-1 mapping between each spec and a matching Python module. I only ever edited the spec, treating the code itself as an opaque thing that I ignored (though I defined the interfaces for it). It kind of worked, though I realized how distinct the difference between a spec that communicates intent and a spec that specifies detail really is.
From this, I felt that maybe I need to stay closer to the code, but just use the LLM as a bicycle of the mind. So I tried "write the code yourself, and integrate an LLM into Emacs so that you can have a discussion with the LLM about individual pieces of code, but use it for criticism and guidance, not to actually generate code". This also worked (though I never wrote anything more than small snippets of Elisp with it). I learned more doing things this way, though I have the nagging suspicion that I was actually moving slower than I theoretically could have. I think this is another valid way.
I'm currently experimenting with a 100% vibe coded project (https://boltread.com). I mostly just drive it through interaction on the terminal, with "specs" that kind of just act as intent (not specifications). I find the temptation to get out of the outside critic mode and into just looking at the code is quite strong. I have resisted it to date (I want to experiment with what it feels like to be a vibe coder who cannot program), to judge if I realistically need to be concerned about it. Just like LLM generated things in general, the project seems to get closer and closer to what I want, but it is like shaping mud, you can put detail into something, but it won't stay that way over time; its sharp detail will be reduced to smooth curves as you then switch to putting detail elsewhere. I am not 100% sure on how to deal with that issue.
My current thought is that we have failed to find a good way of switching between the "macro" (vibed) and the "micro" (hand-coded) view of LLM development. It's almost like we need modules (blast chambers?) for different parts of any software project, where we can switch to doing things by hand (or at least with more intent) when necessary, and do things by vibe when not. Striking the balance that nets the greatest output is quite challenging, and it may not even be that there is an optimal intersection; it may simply be that you are exchanging immediate change for future flexibility of the software.
_wire_ 1 days ago [-]
"I just got my third coffee and I'm feeling really good about the quality of this code. I don't even bother to look at it. Keeps my tests simple to not test at all. You know, 'in theory'. Of course when the architecture gets genuinely tangled... Just keep your IDE open so the code knows where to go. Whoa, too much caffeine, super sleepy..."
Terafab is suddenly making so much sense!
Throaway1975123 22 hours ago [-]
"ugh, what a hard day at work, I had to prompt the LLM for 8 hours!!"
lordkrandel 1 days ago [-]
Why do people keep talking about AI as if it actually worked? I still don't see ANY proof that it doesn't generate a totally unmaintainable, insecure mess that, since you didn't develop it, you don't know how to fix. Like running an F1 Ferrari on a countryside road: useless and dangerous.
tyleo 24 hours ago [-]
Because it's working for a lot of people. There are people getting value from these products right now. I'm getting value myself and I know several other folks at work who are getting value.
I'm not sure what your circumstances are but even if it's not true for you, it's true for many other people.
pydry 23 hours ago [-]
It's interesting that the people I encounter IRL who "get the most value" tend to be the devs who couldn't distinguish well-written code from slop in the first place.
People online with identical views to theirs all assure me that they're all highly skilled, though.
Meanwhile, I've been experimenting with using AI for shopping, and all of them so far are horrendous. They can't handle basic queries without tripping over themselves.
hermannj314 19 hours ago [-]
If you are a 2600-rated chess player, a bot that plays at 1800 is a horrendous chess player.
But you can understand why all the players rated 1700 and below say it is good, and that using it for evaluation is making them better?
Don't worry, AI will replace you one day, you are just smarter than most of us so you don't see it yet.
roncesvalles 18 hours ago [-]
The analogy of thinking of coding AI like it's chess AI is terrible. If chess AI was at the level of coding AI, it wouldn't win a single game.
This kind of thinking is actually a big reason why execs are being misinformed into overestimating LLM abilities.
LLM coding agents alone are not good enough to replace any single developer. They only make a developer x% faster. That dev who is now x% faster may then allow you to lay off another dev. That is a subtle yet critical difference.
lugu 14 hours ago [-]
I like the chess analogy because it answers the question: why can't I see those gains?
To address your point, let's try another analogy. Imagine secretarial assistants in the '80s, discussing their risk of being replaced by computers. They would think: someone still needs to type those letters, sit next to that phone, and make those appointments; I am safe. Computers won't replace me.
It is not that AI will do all of your tasks and replace you. It is that your role as a specialist in software development won't be necessary most of the time (someone will do that work, and that person won't call themselves a programmer).
roncesvalles 14 hours ago [-]
Secretarial assistant as a profession is still very alive, and the title has been inflated to stratospheric heights (and compensation): "Chief of Staff"
pydry 28 minutes ago [-]
I imagine if they tried to replace typists with keyboards that produced plausible looking words that were entirely wrong half the time then we'd probably still have plenty of typists.
I tend to find that the volume of automation predictions correlates inversely with how real they are.
When capitalists actually have the automation tech, they don't shout about it; they just do it quietly and collect the profits.
When, say, Bezos is worried about his unionizing workforce and wants to intimidate it - that's when the hot takes and splashy media articles about billions invested in automation "coming for yer jerb" get published.
AStrangeMorrow 21 hours ago [-]
Idk, basically everyone in my org has seen some good value out of it. We have people complaining about limitations, but they would still rather have the tooling than not.
For me, the main difference is that now some people can explain what their code does, while others can only explain what it was meant to achieve.
tyleo 22 hours ago [-]
> I've been experimenting using AI for shopping
This is an interesting choice for a first experiment. I wouldn't personally base AI's utility for all other things on its utility for shopping.
pydry 22 hours ago [-]
It's not a first experiment, it's experiment 50 or 60, and a reaction to AI gaslighting.
Most people don't really understand coding, but shopping is a far simpler task, so it's easier to see how and where it fails (i.e. with even mildly complex instructions).
tyleo 22 hours ago [-]
Do you mind sharing examples of the prompts you are using?
oro44 12 hours ago [-]
Agreed.
I'm very confident the experts in every field are not all that impressed by LLMs - relative, of course, to those who were 'meh' in the first place. Experts meaning those who actually understand the content, not those who simply regurgitate or repeat it on demand.
I'd even go as far as to say there are many out there who feel disdain for the experts and want to see LLMs flourish because of it.
giantg2 23 hours ago [-]
I see more value on the business side than the tech side. Ask the AI to transcribe images, write an email, parse some excel data, create a prototype, etc. Some of which you might have hired a tech resource to write a script for.
On the tech side I see it saving some time with stuff like mock data creation, writing boilerplate, etc. You still have to review it like it's a junior's work. You still have to think about the requirements and design to provide a detailed understanding to them (AI or junior).
I don't think either of these will provide 90% productivity gains. Maybe 25-50% depending on the job.
AStrangeMorrow 21 hours ago [-]
For me, the main thing is to never have it write anything based only on the goal (what the end result should look like and how it should behave), but rather on the implementation details (and the coding practices I like).
Sure, it is not as fast to understand as code I wrote myself. But at least I mostly need to confirm how it implemented what I asked, not figure out WHAT it even decided to implement in the first place.
And in my org, people move around projects quite a bit. It hasn't been uncommon for me to jump into projects with 50k+ lines of code a few times a year to help implement a tricky feature, or to help optimize things when something runs too slow. That's a lot of code to understand. Depending on who wrote it, sometimes it is simple: one or two files to understand, clean code. Sometimes it is an interconnected mess, and imho often way less organized than AI-generated code.
And the same goes for the review process: lots of having to understand new code. At least with AI you are fed the changes at a slower pace.
alexjplant 21 hours ago [-]
> Why do people keep ralking about AI as it actually worked?
Because it does.
> I still don't see ANY proof that it doesn't generate a total unmaintainable unsecure mess, that since you didn't develop, you don't know how to fix.
I wouldn't know since it's been years since I've tried but I'd imagine that Claude Code would indeed generate a half-baked Next.js monstrosity if one-shot and left to its own devices. Being the learned software engineer I am, however, I provide it plenty of context about architecture and conventions in a bootstrapped codebase and it (mostly) obeys them. It still makes mistakes frequently but it's not an exaggeration to say that I can give it a list of fields with validation rules and query patterns and it'll build me CRUD pages in a fraction of the time it'd take me to do so.
I can also give it a list of sundry small improvements to make and it'll do the same, e.g. I can iterate on domain stuff while it fixes a bunch of tiny UX bugs. It's great.
thefounder 23 hours ago [-]
You can launch a new product in one month instead of 12 months. I think this works best for startups where the risk tolerance is high, but less than ideally for companies such as Amazon where system failure has high costs.
dominotw 23 hours ago [-]
So where are these one-man products launched in a month?
Not talking about toys or vibecoded crap no one uses.
deaux 22 hours ago [-]
They're here, I made one. Not a toy or vibecoded crap, people got immediate value. Not planning to doxx myself by linking it. This was more than a year ago when models weren't even as good yet. A year later it has thousands of consistent monthly users, and it only keeps growing. It's nothing compared to VC startups but for a solo dev, made in a month? Again, it's not a toy, it offers new functionality that simply didn't exist yet and it improves people's lives. The reality is that there's no chance I would've done it without LLMs.
fragmede 23 hours ago [-]
Aside from the things people have launched that don't count because I'm right, where are the things that have been launched?
dominotw 22 hours ago [-]
apps with actual users is too high of a bar now?
Throaway1975123 22 hours ago [-]
probs look in the game space to see a bunch of muggles making money off of "vibe coded" software
UqWBcuFx6NV4r 21 hours ago [-]
This is delusional. Are you someone that tends to have your finger on the pulse of every piece of software ever released, as it’s being released, with knowledge about how it’s built?
You’re not.
Nobody is.
Perhaps nobody cares to “convince you” and “win you over”, because…why? Why do we all have to spoon feed this one to you while you kick and scream every step of the way?
If you don’t believe it, so be it.
20 hours ago [-]
dominotw 20 hours ago [-]
[flagged]
gamblor956 17 hours ago [-]
Real ones don't exist. Conveniently, nobody that has claimed to have created a functional AI product is willing to "doxx themselves" by linking to their app.
Weirdly, people who have actually created functional one-man products don't seem to have the same problem, as they welcome the business.
oro44 12 hours ago [-]
Exactly. This is f"cking hilarious.
"oooh Im afraid of doxxing myself", wtf? lmao!
UqWBcuFx6NV4r 21 hours ago [-]
Person that can’t use a hammer “hasn’t seen any proof” that hammers work.
MrScruff 17 hours ago [-]
I see a lot of people talk about 'insecure code' and while I don't doubt that's true, there's a lot of software development where security isn't actually a concern because there's no need for the software to be 'secure'. Maintainability is important I'll grant you.
lukev 23 hours ago [-]
I agree, but absence of evidence is not evidence of absence, and we currently have a lot of developers who feel very productive right now.
We are very much in need of an actual way to measure real economic impact of AI-assisted coding, over both shorter and longer time horizons.
There's been an absolute rash of vibecoded startups. Are we seeing better success rates or sales across the industry?
ThrowawayR2 22 hours ago [-]
> "absence of evidence is not evidence of absence"
It works. You’re just not doing it right if it doesn’t work for you. It’s hard to convince me otherwise at this point.
Kim_Bruning 23 hours ago [-]
Consider it this way as a reasoning step: We've invented a cross compiler that can handle the natural languages too. That's definitely useful; but it's still GIGO so you still need your brain.
K0balt 21 hours ago [-]
I’ve been using it to develop firmware in c++. Typically around 10-20 KLOC. Current projects use Sensors, wire protocols, RF systems , swarm networks, that kind of stuff integrated into the firmware.
If you use it correctly, you can get better quality, more maintainable code than 75% of devs will turn in on a PR. The “one weird trick” seems to be to specify, specify, specify. First you use the LLM to help you write a spec (or document an existing one). Make sure the spec is correct and matches the user story and edge cases; the LLM is good at helping here too. Then break down separations of concerns, APIs, and interfaces. Have it build a dependency graph. After each step, have it reevaluate the entire stack to make sure it is clear, clean, and self-consistent.
Every step of this is basically the AI doing the whole thing, just with guidance and feedback.
Once you’ve got the documentation needed to build an actual plan for implementation, have it do that. Each step, you go back as far as relevant to reevaluate. Compare the spec to the implementation plan, close the circle. Then have it write the bones, all the files and interfaces, without actual implementations. Then have it reevaluate the dependency graph and the plan and the file structure together. Then start implementing the plan, building testing jigs along the way.
You just build software the way you used to, but you use the LLM to do most of the work along the way. Every so often, you’ll run into something that doesn’t pass the smell test and you’ll give it a nudge in the right direction.
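Condensed, the workflow described above amounts to a checklist you could keep in a planning doc (the step names here are mine, not a standard):

```
1. Spec: draft it with the LLM; verify it against the user story and edge cases
2. Design: separations of concerns, APIs, interfaces, dependency graph
3. After each step: re-evaluate the whole stack for clarity and consistency
4. Skeleton: create files and interfaces only, no implementations yet
5. Cross-check: dependency graph vs. plan vs. file structure
6. Implement: follow the plan step by step, building test jigs as you go
```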
Think of it as a junior dev that graduated top of every class ever, and types 1000wpm.
Even after all of that, I’m turning out better code, better documentation, and better products, and doing what used to take 2 devs a month, in 3 or 4 days on my own.
On the app development side of our business, the productivity gain is also strong. I can't really speak to code quality there, but I can say we get updates in hours instead of days, and there are fewer bugs in the implementations. They say the code is better documented and easier to follow, because they're not under pressure to ship hacky prototype code as if it were production.
On the current project, our team size is 1/2 the size it would have been last year, and we are moving about 4x as fast. What doesn’t seem to scale for us is size. If we doubled our team size I think the gains would be very small compared to the costs. Velocity seems to be throttled more by external factors.
I really don’t understand where people are coming from saying it doesn’t work. I’m not sure if it’s because they haven’t tried a real workflow, or maybe tried it at all, or they are definitely “holding it wrong.” It works. But you still need seasoned engineers to manage it and catch the occasional bad judgment or deviation from the intention.
If you just let it loose, it will definitely go off the rails and you'll end up with a twisted mess that no one can debug. But if you use a system of writing the code incrementally through a specification-evaluation loop as you descend the abstraction from idea to implementation, you'll end up winning.
As a side note, and this is a little strange and I might be wrong because it’s hard to quantify and all vibes, but:
I have the AI keep a journal of its observations and general impressions - sort of the “meta” without the technical details. I frame this to it as a continuation of “awareness” for new sessions.
I have a short set of “onboarding” documents that describe the vision, ethos, and goals of the project. I have it read the journal and the onboarding docs at the beginning of each session.
I frame my work with the AI as working with it as a “collaborator” rather than a tool. At the end of the day, I remind it to update its journal of reflections about the days work. It’s total anthropomorphism, obviously, but it seems to inspire “trust” in the relationship, and it really seems to up-level the effort that the AI puts in. It kinda makes sense, LLMs being modelled on human activity.
FWIW, I’m not asserting anything here about the nature of machine intelligence, I’m targeting what seems to create the best result. Eventually we will have to grapple with this I imagine, but that’s not today.
When I have forgotten to warm-start the session, I find that I am rejecting much more of the work. I think this would be worth someone doing an actual study to see if it is real or some kind of irresistible cognitive bias.
I find that the work produced is much less prone to going off the rails or taking shortcuts when I have this in the context, and by reading the journal I get ideas on where and how to do a better job of steering and nudging to get better results. It’s like a review system for my prompting. The onboarding docs seem to help keep the model working towards the big picture? Idk.
This “system” with the journal and onboarding only seems to work with some models. GPT5 for example doesn’t seem to benefit from the journal and sometimes gets into a very creepy vibe. I think it might be optimized for creating some kind of “relationship” with the user.
systemsweird 18 hours ago [-]
This is one of the best descriptions of using AI effectively I’ve read. It becomes clear that using AI effectively is about planning, architecture, and directing another intelligent agent. It’s essential to get things right at each high level step before drilling in deeper as you clearly outlined.
I suspect you either already were or would've been great at leading real human developers, not just AI agents. Directing an AI towards good results is shockingly similar to directing people. I think that's a big thing separating those getting great results with AI from those claiming it simply does not work. Not everyone is good at high-level planning, architecture, and directing others. But those who already had those skills basically hit the ground running with AI.
There are many people working as software engineers who are just really great at writing code, but may be lacking in the other skills needed to effectively use AI. They’re the angry ones lamenting the loss of craft, and rightfully so, but their experience with AI doesn’t change the shift that’s happening.
the_real_cher 24 hours ago [-]
Yeah, it's great, but as the OP said, you have to watch every P.R. like a hawk.
FromTheFirstIn 23 hours ago [-]
OP said you stop doing that at some point
gedy 22 hours ago [-]
Productivity really means nothing across companies and the software industry. So we code faster, but honestly the problem I see is that companies ask us to code stupid or worthless things in many cases.
CTO is rewriting company platform (by himself with AI) and is convinced it's 100x productivity. But when you step back and look at the broader picture, he's rewriting what something like Rails, .NET, or Spring gave us 15-20 years ago? It's just in languages and code styles he is (only) familiar with. That's not 100x for the business, sorry...
iExploder 1 days ago [-]
yes you fire if you are a burned out company that only needs to maintain its product and slowly die...
you hire more if you are growing and have new ideas you just never had the chance to implement, as they were not practical or feasible at the previous level of tech (non-assisted humans typing code and taking sick leave)
dismalaf 13 hours ago [-]
In my experience the speed gains are real (kind of) but not in a "the AI writes code" kind of way, more of a "the AI is Google on steroids that can reference the docs very quickly". I've found it saves me a ton of time searching through documentation, googling and dealing with bad search results or ads, etc... But it doesn't write good code. So a productivity boost, not 10x, but a bit. And no, I wouldn't fire anyone. It's not that good.
KellyCriterion 19 hours ago [-]
Third option:
Not hiring someone?
varispeed 18 hours ago [-]
AI just removes the boring stuff. It's like having developers spend their valuable time on typing; then comes a product that removes the typing cost, and the dumb managers go "ha! we now need fewer developers!" while the smart ones go "great, we can shorten the feedback loop and use existing resources to make more improvements!"
j45 22 hours ago [-]
You have your devs be engineering managers over the tools.
I keep being told this and the tools keep falling at the first hurdle. This morning I asked Claude to use a library to load a toml file in .net and print a value. It immediately explained how it was an easy file format to parse and didn’t need a library. I undid, went back to plan mode and it picked a library, added it and claimed it was done. Except the code didn’t compile.
Three iterations later of trying to get Claude to make it compile (it changed random lines around the clear problematic line) I fixed it by following the example in the readme, and told Claude.
I then asked Claude to parse the rest of the toml file, whereby it blew away the compile fix I had made..
This isn’t an isolated experience - I hit these fundamental blocking issues with pretty much every attempt to use these tools that isn’t “implement a web page”, and even when it does that it’s not long before it gets tangled up in something or other…
The real magic of LLMs comes when they iterate until completion until the code compiles and the test passes, and you don't even bother looking at it until then.
Each step is pretty stupid, but the ability to very quickly doggedly keep at it until success quite often produces great work.
If you don't have linters that are checking for valid syntax and approved coding style, if you don't have tests to ensure the LLM doesn't screw up the code, you don't have good CI, you're going to have a bad time.
LLMs are just like extremely bright but sloppy junior devs - if you think about putting the same guardrails in place for your project you would for that case, things tend to work very well - you're giving the LLM a chance to check its work and self correct.
It's the agentic loop that makes it work, not the single-shot output of an LLM.
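As a sketch of what that loop amounts to: the harness repeatedly runs a fixed set of quality gates and feeds the first failure back to the model until everything passes. The gate commands below are placeholders, not any real project's; you would swap in your own compiler, test runner, and linter invocations.

```python
import subprocess

def first_failing_gate(gates):
    """Run each quality gate in order; return the first command that fails.

    An agent harness feeds the failing command's output back to the model
    and retries, looping until this returns None (the "definition of done").
    """
    for cmd in gates:
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        if result.returncode != 0:
            return cmd  # failing gate, to be fed back as context
    return None

# Placeholder gates; a real project would use e.g. "cargo build",
# "cargo test", "cargo clippy" or their equivalents here.
gates = ["true", "echo tests-pass", "true"]
print(first_failing_gate(gates))  # None means the work counts as "done"
```

The point is that each gate is dumb and mechanical; it's the loop around them that turns a sloppy single-shot output into something that at least compiles and passes tests.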
If you read my post, you’d see that Claude code didn’t do that, I had to intervene in the agent loop and when I did it undid my fixes.
In fact, I find that with a strict feedback loop set up (i.e. a lot of lint rules, a strict type checker and fast unit tests), it will almost always generate what I want.
As someone else said, each step might be pretty stupid, but if you have a fast iteration loop, it can run until everything passes cleanly. My recommendation is to specify what really counts as "done" in your AGENTS.md/CLAUDE.md.
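For concreteness, this is the kind of "definition of done" section that might live in an AGENTS.md; the exact commands are illustrative (a .NET project here), not prescriptive:

```
## Definition of done

A change is complete only when all of the following pass locally:

- `dotnet build` finishes with zero errors and zero new warnings
- `dotnet test` passes
- `dotnet format --verify-no-changes` reports a clean tree

Never report a task as done before running all three.
```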
There are techniques that can help deal with this but none of them work perfectly, and most of the time some direct oversight from me is required. And this really clips the potential productivity gains, because in order to effectively provide oversight you need to page in all the context of what's going on and how it ought to work, which is most of what the LLMs are in-theory helping you with.
LLMs are still very useful for certain tasks (bootstrapping in new unfamiliar domains, tedious plumbing or test fixture code), but the massive productivity gains people are claiming or alluding to still feel out of reach.
For instance, if you are working on a compiler and have a huge test database of code to compile that all has tests itself, "all sample code must compile and pass tests, ensuring your new optimizer code gets adequate branch coverage in the process" - the underlying task can be very difficult, but you have large amounts of test coverage that have a very good chance at catching errors there.
At the very least "LLM code compiles, and is formatted and documented according to lint rules" is pretty basic. If people are saying LLM code doesn't compile, then yes, you are using it very incorrectly, as you're not even beginning to engage the agentic loop at all, as compiling is the simplest step.
Sure, a lot of more complex cases require oversight or don't work.
But "the code didn't compile" is definitely in "you're holding it wrong" territory, and it's not even subtle.
But honestly I think sane code organization is the bigger hurdle, which is a lot harder to get right without manual oversight. Which of course leads to the temptation to give up on reviewing the code and just trusting whatever the LLM outputs. But I'm skeptical this is a viable approach. LLMs, like human devs, seem to need reasonably well-organized code to be able to work in a codebase, but I think the code they output often falls short of this standard.
(But yes agree that getting the LLM to iterate until CI passes is table-stakes.)
I think getting good code organization out of an LLM is one of the subtler things - I've learned quite a bit about what sort of things need to be specified, realizing that the LLM isn't actively learning my preferences particularly well, so there are some things about code organization I just have to be explicit about.
Which is more work, but less work than just writing the code myself to begin with.
I didn't give my Agent any tools, it hallucinated code.
...well, yes. It's like evaluating a programmer's skill by having one-shot a program on paper with zero syntax errors.
I don't know anything about DotNet, but I just fired up Claude Code in an empty directory and asked it to create an example dotnet program using the Tomlyn library. It chugged away, and ~5 minutes later I did a "grep Deserialize *" in the project and it came up with exactly the line you wanted it to produce (in your comment here): var model = TomlSerializer.Deserialize<TomlTable>(tomlContent)!;
The full results of what it produced are at https://github.com/linsomniac/tomlynexample
That includes the prompt I used, which is:
Please create a dotnet sample program that uses the library at https://github.com/xoofx/Tomlyn to parse the TOML file given on the command line. Please only use the Tomlyn library for parsing the TOML file. I don't have any dotnet tooling installed on my system, please let me know what is needed to compile this example when we get there. Please use an agent team consisting of a dotnet expert, a qa expert, a TOML expert a devils advocate and a dotnet on Linux expert.
I can't really comment on the code it produced (as I said, I don't use dotnet, I had to install dotnet on my system to try this), so I can't comment on the approach. 346 lines in Program.cs seems like a lot for an example TOML program, but I know Claude Code tends to do full error checking, etc, and it seems to have a lot of "pretty printing" code.
When I prompted, it went spelunking in the parent folder for a bunch of stuff, which I interrupted and told it to focus on its current folder. It came up with a sensible plan, which I let it execute. I then switched to a new tab and ran `dotnet run publishing/publish.cs`, like Claude says. And the script fails at the first hurdle. The agentic loop came up with a plan, claimed it was done, but _didn't execute the verification it told me it was going to execute_.
[0] https://gist.github.com/donalmacc/e91955be9792a983c375a098ba...
Legit this morning Claude was essentially unusable for me
I could explicitly state things it should adjust and it wouldn't do it.
Not even after specifying again, reverting everything eventually and reprompt from the beginning etc. Even super trivial frontend things like "extract [code] into a separate component"
After 30 minutes of that I relented and went on to read a book. After lunch I tried again and its intelligence was back to normal.
It's so uncanny to experience how much its performance changes. I strongly suspect Anthropic is doing something whenever its intelligence drops that much, especially because it's always temporary, yet repeatable across sessions while it's occurring... until it's normal again.
But ultimately just speculation, I'm just a user after all
And then the next step is to dynamically vary resources based on predictions of user stickiness. User is frustrated and thinking of trying a competitor -> allocate full resources. User is profiled as prone to gambling and will tolerate intermittent rewards -> can safely forward requests to gimped models. User is a resolute AI skeptic and unlikely to ever preach the gospel of vibecoding -> no need to waste resources on him.
It's quite likely this is already happening, buddy...
The 'random' degradation across all LLM-based services is obvious at this point.
Honestly, this is my experience. Every now and again it just completely self-implodes and gives up, and I’m left to pick up the pieces. Look at the other replies making sure I’m using the agentic loop/correct model/specific-enough prompt - I don’t know what they’re doing, but I would love to try the tools they’re using.
Maybe Anthropic is trying to cut costs a little and we are all just gaslighting ourselves into thinking its our problem.
Like when I was trying to find a physical store again with ChatGPT Pro 5.4 and asked it to prepare a list of candidates, but the shop just wasn't in the list, despite GPT claiming it to be exhaustive. When I then found it manually and asked GPT for advice on how I could improve my prompting in the future, it went full "aggressively agreeable" on me with "Excellent question! Now I can see exactly why my searches missed XY - this is a perfect learning opportunity. Here's what went wrong and what was missing: ..." and then 4 sections with 4 subsections each.
It's great to see the AI reflect on how it failed. But it's also kind of painful if you know that it'll forget all of this the moment the text is sent to me and that it will never ever learn from this mistake and do better in the future.
If 80% of the time they 10x my output, and the other 20% I can say "well they failed, I guess this one I have to do manually" - that's still an absolutely massive productivity boost.
Keep in mind that the AI is not reflecting, and it has no idea it made a mistake. It's just generating statistically-likely text for "apology" * "ai did not find results".
I wonder if it was getting blocked on searches or something, and just didn't tell you.
I try to give the model as little freedom as possible. That usually means it's not being used for novel work.
But that's the hard part! You can only eke out moderate productivity gains by automating the tedium of actually writing out the code, because it's a small fraction of software engineering.
Then it crawls around for awhile, does some web searches, fetches docs from here and there, whatever. Sometimes it'll ask me some questions. And then it'll finally spit out a plan. I'll read through it and just give it a massive dump of issues, big and small, more questions I have, whatever. (I'll also often be spinning off new planning sessions for pre-work or ancillary tasks that I thought of while reviewing that plan). No structure or anything, just brain dump. Maybe two rounds of that, but usually just one. And then I'll either have it start building, or I'll have it stash in the linear agent so I can kick it off later.
The code it needed to write was:
Which is in the readme of the repo. It could also have generated a class and deserialised into that. Instead it did something else (afraid I don’t have it handy, sorry).
I remember having to write code on paper for my CS exams, and they expected it to compile! It was hard, but I mostly got there. I definitely made a few small mistakes, though.
The only reason I wasted so much time was that given all of my other positive experiences, clearly I must have been prompting it incorrectly, and I should learn how to fix it.
After blowing four hours I just changed the colors by hand. I took away that it's almost completely unpredictable what will be easy and what will be hard for an LLM.
Friday afternoon I made a new directory and told Claude Code I wanted to make a Go proxy so I could have a request/callback HTTP API for a 3rd party service whose official API is only persistent websocket connections. I had it read the service’s API docs, engage in some back and forth to establish the architecture and library choices, and save out a phased implementation plan in plan mode. It implemented it in four phases with passing tests for each, then did live tests against the service in which it debugged its protocol mistakes using curl. Finally I had it do two rounds of code review with fresh context, and it fixed a race condition and made a few things cleaner. Total time, two hours.
I have noticed some people I work with have more trouble, and my vague intuition is it happens when they give Claude too much autonomy. It works better when you tell it what to do, rather than letting it decide. That can be at a pretty high level, though. Basically reduce the problem to a set of well-established subproblems that it’s familiar with. Same as you’d do with a junior developer, really.
Equating "junior developers" and "coding LLMs" is pretty lame. You handhold a junior developer so that, eventually, you don't have to handhold anymore. The junior developer is expected to learn enough, and be trusted enough, to operate more autonomously. "Junior developers" don't exist solely to do your bidding. It may be valuable to recognize similarities between a first junior-developer interaction and a first LLM interaction, but when every LLM interaction requires handholding, the value of the iterative nature of having a junior developer work alongside you is not at all equivalent.
I simply said the description of the problem should be broken down similar to the way you’d do it for a junior developer. As opposed to the way you’d express the problem to a more senior developer who can be trusted to figure out the right way to do it at a higher level.
What’s giving too much autonomy about
“Please load settings.toml using a library and print out the name key from the application table”? Even if it’s under specified, surely it should at least leave it _compiling_?
I’ve been posting comments like this monthly here, my experience has been consistently this with Claude, opencode, antigravity, cursor, and using gpt/opus/sonnet/gemini models (latest at time of testing). This morning was opus 4.6
Are you using Claude Code? Do you have it configured so that you are not allowing it to run the build? Because I've observed that Claude Code is extremely good at making sure the code compiles: it'll run a compile and address any compile errors as part of the work.
I just asked it to build a TOML example program in DotNet using Tomlyn, and when it was done I was able to run "./bin/Debug/net8.0/dotnettoml example.toml", it had already built it for me (I watched it run the build step as part of its work, as I mentioned it would do above).
> I’ve observed Claude code is extremely good at making sure the code compiles
My observation is that it’s fine until it’s absolutely not, and the agentic loop fails.
I don't know that it's useful to assign blame here.
It probably is to your benefit, if you are a coding professional, to understand why your results are so drastically different from what others are seeing. You started this thread saying "I keep getting told I'll be amazed at what it can do, but the tools keep failing at the first hurdle."
I'm telling you that something is wrong, that is why you are getting poor results. I don't know what is wrong, but I've given you an example prompt and an example output showing that Claude Code is able to produce the exact output you were looking for. This is why a lot of people are saying "you'll be amazed at what it can do", and it points to you having some issue.
I don't know if you are running an ancient version of Claude Code, not using Opus 4.6, or not using "high" effort (those are what I'm using to get the results I posted elsewhere in reply to your comment), but something is definitely wrong. Some of what may be wrong is that you don't have enough experience with the tooling, which I'd understand if you are getting poor results; you'd have little (immediate) incentive to get more proficient.
As I said, I was able to tell Claude Code to do something like the example you gave, and it did it and it built, without me asking, and produced a working program on the first try.
Oh - I’m blaming Claude not anyone else. I’ve tried again this evening and the same prompt (in the same directory on the same project) worked.
> i don’t know if you’re using an ancient version of Claude code,
I’m on a version from some time last week, and using opus 4.6
> This is why a lot of people are saying "you'll be amazed at what it can do", and it points to you having some issue.
If you look at my comments in these threads, I’ve had these issues and been posting about them for months. I’m still being told “you’re using the wrong model, or the wrong tool, or you’re holding it wrong”, but yet, here I am.
I’m using plan mode, clearly breaking down tasks, and this happens to me basically every time I use the damn tool. Speaking to my team at work and friends in other workplaces, I hear the same thing. But yet we’re just using it wrong or doing something wrong.
Honestly, I genuinely think the people who are not having these experiences just… don’t notice that they are.
For some reason I'm getting downvoted for trying to help, but regardless, if you can (and want to) post a transcript somewhere of some of these sessions that aren't working out, maybe some of us who are having more success can take a look and see what's up.
Fresh prompt in a codebase with a claude.md I asked it to do a simple task. I've shared the prompt, plan, output in the linked gist.
We’ve gone from “I’m baffled at your experience” to “well, yeah, it often fails” in two sentences here…
I also clearly said I didn’t allow it one output, I gave it the compile error message, it changed a different line, I told it it was at the affected line and to check the docs. Claude code then tried to query the DLL for the function, abandoned that and then did something else incorrect.
I’m literally asking it to install a package and copy the example from the readme
At my company, we are maintaining our hiring plan (I'm the decision maker). We have never been more excited at our permission to win against the incumbents in our market. At the same time, I've never been more concerned about other startups giving us a real run. I think we will see a bit of an arms race for the best talent as a result.
Productivity without clear vision, strategy and user feedback loops is meaningless. But those startups that are able to harness the productivity gains to deliver more complete and polished solutions that solve real problems for their users will be unstoppable.
We've always seen big gains by taking a team of, say, 8 and splitting it into 2 teams of 4. I think the major difference is that now we will probably split teams of 4 into 2 teams of 2 with clearer remits. I don't necessarily want them to deliver more features. But I do want them to deliver features with far fewer caveats at a higher quality, and then iterate more on those.
Humans that consume the software will become the bottlenecks of change!
They'd be unstoppable irrespective of LLMs. Why do you think Zuckerberg acquired Instagram? He literally tried copying it and failed. Instagram at the time was absolutely tiny in terms of pure labour, relative to Facebook.
Most people on Hacker News are missing the point. Productivity gains for the sake of perceived productivity gains are not what creates economic value. It's not the equivalent of a factory suddenly becoming more productive at producing more of the same stuff. Not comparable at all.
Ruby on Rails and its imitators blew away tons of boilerplate. Despite some hype at the time about a productivity revolution, it didn’t _really_ change that much.
> , libraries, build-tools,
Unsure what you mean by this; what bearing do our friends the magic robots have on these?
> and refactoring
Again, IntelliJ did not really cause a productivity revolution by making refactoring trivial about 20 years ago. Also, refactoring is kind of a solved problem, due to IntelliJ et al; what’s an LLM getting you there that decent deterministic tooling doesn’t?
the ones i've used come with defaults that you can then customize. here are some of the better ones:
- https://guides.rubyonrails.org/command_line.html#generating-...
- https://hexdocs.pm/phoenix/Mix.Tasks.Phx.Gen.html
- https://laravel.com/docs/13.x/artisan#stub-customization
- https://learn.microsoft.com/en-us/aspnet/core/fundamentals/t...
> my experience has been these get left behind as the service implementations change
yeah i've definitely seen this, ultimately it comes down to your culture / ensuring time is invested in devex. an approach that helps avoid drift is generating directly from an _actual_ project instead of using something like yeoman, but that's quite involved
> it comes down to ensuring time is invested in devex
That’s actually my point - the orgs haven’t invested in devex, but that didn’t matter because Copilot could figure out what to do!
And of course, we didn't see a massive layoff after the introduction of say, StackOverflow, or DreamWeaver, or jQuery vs raw JS, Twitter Bootstrap, etc.
If your dev group is spending 90% of their time on these... well, you'd probably be right to fire someone. Not most of the developers but whoever put in place a system where so much time is spent on overhead/retrograde activities.
Something that's getting lost in the new, low cost of generating code is that code is a burden, not an asset. There's an ongoing maintenance and complexity cost. LLMs lower maintenance cost, but if you're generating 10x code you aren't getting ahead. Meanwhile, the cost of unmanaged complexity goes up exponentially. LLMs or no, you hit a wall if you don't manage it well.
My company has 20 years of accumulated tech debt, and the LLMs have been pretty amazing at helping us dig out from under a lot of that.
You make valid points, but I'm just going to chime in with adding code is not the only thing that these tools are good at.
Dude that's everybody in charge. You're young, you build a system, you mold it to shifting customer needs, you strike gold, you assume greater responsibility.
You hire lots of people. A few years go by. Why can't these people get shit done? When I was in their shoes, I was flying, I did what was needed.
Maybe we hired second rate talent. Maybe they're slacking. Maybe we should lay them off, or squeeze them harder, or pray AI will magically improve the outcome.
Come on, look in the mirror.
Even the most enthusiastic AI boosters I've read don't seem to agree with this. From what I can tell, LLMs are still mostly useful at building in greenfield projects, and they are weak at maintaining brownfield projects
And I have more examples where I, personally, was on both sides of the fence: defined by my level of understanding of the same problem, not by the tooling.
If it is writing both the code and the tests, then you're going to find that its tests are remarkable: they just work. At least until you deploy to a live state and start testing for yourself; then you'll notice that it's mostly only testing the exact code that it wrote. It's not confrontational or trying to find errors, and it already assumes that it's going to work. It won't come up with the majority of breaking cases that a developer will by itself; you will need to guide it. Also, while fixing those, the odds of introducing other breaking changes are decent, and after enough prompts you are going to lose coherency no matter what you do.
It definitely makes a lot of boilerplate code easier, but what you don't notice is that it's just moving the difficult-to-find problems into hidden new areas. That fancy code it wrote maybe doesn't take the building blocks and lower levels, such as database optimization, into account. Even for a simple application, a half-decent developer can create something that will run quite a bit faster. If you start bringing these problems to it, then it might be able to optimize them, but the amount of time that's going to take is non-negligible.
It takes developers time to sit on code, to learn it along with the problem space and how to tie them together effectively. If you take that away there is no learning; you're just the monkey copy-pasting the produced output from the black box and hoping that you get a result that works. Even worse, every step you take doesn't bring you any closer to the solution; it's pretty much random.
So what is it good for? It can read, "understand", translate, write and explain things to a sufficient degree much faster than us humans. But if you are (at the moment) trusting it with anything past the method level for code, then you're just shooting yourself in the foot; you're just not feeling the pain until later. In a day you can have it generate, for example, a whole website, backend, db, etc. for your new business idea, but that's not a "product"; it might as well be a promotional video that you throw away once you've used it to impress the investors. For now that might still work, but people are already catching on and beginning to wise up.
Any sensible, experienced programmer will take into account possible future features spoken about around the office when deciding how to implement a feature. To me, that’s the key reason I can’t delegate to AI complex features that will grow. The same reason I inspect and clean the toilet when I’m done - out of respect for my colleagues.
I feel like most folks commenting uncritically here about the second coming of Jesus must work in some code sweatshops, churning out eshops or whatever is in vogue today, quickly moving on, never looking back, never maintaining and working on their codebases for a decade+. Where I existed my whole career, speed of delivery was never the primary concern; quality of delivery (which everybody unanimously complains about, one way or another, with LLMs) was much more critical, and that's where the money went. Maybe I just picked the right businesses, but then again I worked for an energy company, insurance, telco, government, army, municipality, 2 banks and so on, across 3 European states. Also, full working code delivered to production is a rather small part of any serious project; even a 5x speed increase would move the needle just a tiny bit overall.
If I were more junior, I would feel massive FOMO from reading all this (since I use it so far just as a quicker google/stackoverflow and for some simpler refactoring, but oh boy, is it sometimes hilariously and dangerously wrong). I am not, thus I couldn't care less. The Node.js craze, 10x stronger; not seeing the forest for the trees.
Just like financial markets, this phase is very volatile. But today’s price can look much different in 1, 2, 5, and 10 years.
Right now there’s a lot of opportunity to disrupt companies who are moving much slower since adoption can vary greatly.
But I think in a few years this meta will be overcrowded. The overall skill/productivity gap across companies will be reduced and the bar for productivity will be raised.
There’s room for taking profits now from the productivity gains but I don’t think it will last long.
If AI is really increasing productivity enough to reduce headcount right now, it won’t be in future. If all your competitors are using AI as effectively as you are, can you still do this?
The world’s demand for productivity is limitless. As of right now you still need someone to at least install Claude Code and run the binary.
The companies gutting engineering right now are going to realise too late that AI amplifies what you already have. If you fire the people with institutional knowledge and domain expertise, you're left with an AI that generates confident, well-formatted mediocrity. Agree or not?
It's not a direct multiplier, because even non-programmers can get stuff done with it just from an idea.
And it's not exactly an exponent either, even though experienced programmers can get absolutely massive gains when they learn to work with it.
"Replace tools, not jobs" is the best AI slogan I've found yet. Use the VC funded practically free massive language models to quickly write every single small utility, tool, connector and dashboard you've ever wanted.
You know, the ones you've always wanted to do but could never justify the time spent on them compared to the utility, specifically in corporate environments. Stuff that used to require a project manager, a programmer and a QA person dedicated to some tiny app for a week or two can now be done in two days by a single person with zero coding experience and an idea, to a degree where you can evaluate whether it's worth continuing or not.
Another way of increasing profit is to simply reduce your headcount by 90% while keeping the same profit.*
Hence, I think some companies will keep downsizing. Some companies will hire. It depends a lot.
*Assuming 90% productivity increase.
Is it the same with tech? Facebook has 3 billion monthly active users. No amount of tech will bring that up to 6 billion. If you were to double the amount of time someone spends on Facebook, or double the ads they see or double the click through rate, what does that really mean?
Taking the example of Facebook. They are in social media, messaging, AI, VR/AR hardware and software, a few other things, meta universe whatever that was, now left with the name. Facebook isn't delivering or successful on all its ventures, it knows that, it keeps investing in other segments.
More productivity would mean at least diversifying, they have some of the best engineers, it would make no sense to not simply attempt to hit the jackpot by playing more machines.
What fewer people talk about is that the entire tech industry is tertiary services: ads, entertainment, communication, etc. If/when hard industries take a hit, tertiary takes a hit. If it isn't clear to you that the overall economy has already started to take some irreversible dents, and that those will accelerate, know that capital is well aware.
Or we can continue wishful thinking and seek comfort that monetary tightening is just temporary, investments will flow more into tangent ventures and growth is around the corner, the U.S still is and will remain the world's strongest economy.
Even if it succeeded, having more buttons everywhere - marketplace, reels, stories, blogs, articles, groups, etc. - makes the whole experience more crowded.
Productivity here has an odd effect, sort of like larger roads causing more traffic jams.
I think most companies are making the right call by downsizing instead of staying same size. Let people go to where there is more potential for growth.
Smart organizations will not just deliver better products but likely start products that they were hesitant to start before because the cost of starting is a lot closer to zero. Smart engineering leadership will encourage developers into delivering value and not self-serving, endless iterations of tooling enhancements, etc.
If I was a CTO and my competitor Y fired 90% of their devs, I'd try to secure funding to hire their top talent and retain them. The vitriol alone could fuel some interesting creations and when competitor Y realizes things later, their top talent will have moved on.
>> Smart organizations will not just deliver better products but likely start products [...]
This is not the 90s anymore when low hanging fruit was everywhere ready to be picked. We have everything under the sun now and more.
The problem with bullshit apps is not that they took you 5 months to build. What you build now in 5 minutes is still bullshit. Most of the remaining work is bullshit jobs: spinning up useless "features" and frameworks that nobody needs and shoving them down the throats of customers who never asked for them. Now it's possible to dig holes and fill them back in (do pointless work) at a much improved pace, thanks to AI.
I have a vague memory of a hotel reservation "app" that was just putting lines in a Google Sheet where the founders grabbed the info, called the hotels themselves and reserved the room.
When they got too many requests to keep up, they slowly automated the bits they could one by one.
For greenfield development you don't need as many software engineers. Some developers (the top 10%) are still needed to guide AI and make architectural decisions, but the remaining 90% will work on the lifecycle management task mentioned above.
The productivity gains can be used to produce more software, and if you are able to sell the software you produce should result in a revenue boost. But if you produce more than you can sell then some people will be laid off.
Seriously, all managers I know (2, somehow) that have done this are now in sprint hell because they didn't realize an LLM cannot do the work their devs did. Bit of schadenfreude.
Not saying this can't be done, some projects are easy enough that you can throw an LLM at it
I was working with it just yesterday and found it mixed up class attributes and dict keys when checking whether something existed. As a result, it always thought the key was missing. Glancing through the code, it would have been easy to overlook, as it was checking for something... just doing it the wrong way. The code also appeared to still work in the larger context, just not as well or as efficiently as it was supposed to. It was only my obsessive testing and attention to detail that caught it.
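The bug pattern being described - a hypothetical reconstruction, not the actual code from that session - is checking for a dict key via attribute access, so the check never finds it:

```python
class Config:
    """Settings live in a dict held by the object, not as attributes."""
    def __init__(self, values: dict):
        self.values = values

cfg = Config({"timeout": 30})

# The wrong-way check: hasattr() looks for an attribute named "timeout",
# which doesn't exist on Config, so the key always appears to be missing...
if not hasattr(cfg, "timeout"):
    timeout = 10  # ...and the code silently falls back to a default

# The correct check looks in the dict where the key actually lives
timeout = cfg.values.get("timeout", 10)
print(timeout)  # -> 30
```

The buggy branch still "works" in the larger program, since falling back to a default is a valid code path; that is exactly why this class of mistake is easy to overlook in review.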
I find it much faster and easier to write things myself, using AI to help me along the way, rather than trying to delegate everything to the AI and hoping for the best. Writing code is much more interesting than reviewing code, and it puts me close enough to it that those kind of issues don't happen in the first place.
I see it as necessary ALL the time. The AI-generated code can be used as scaffolding, but it never gets close to real production quality. That's the experience from a small startup with a team of 5 developers. I review and approve all PRs, and not one has ever been able to pass AI code review on the first iteration.
After the combine harvester, we produced the same food with fewer people.
At the moment, it seems like hardware is the constraint. Companies don't have access to enough machines or tokens to keep all their devs occupied, so they let some go. Maybe that changes, maybe we already have too much software?
Personally, I think we already had too much software before LLMs. Even without them, many devs would have found themselves jobless as startups selling to startups selling to startups failed, and we realized (again) that food, shelter, security, education, etc. are 'real' industries - but software isn't one if it's not actively helping one of those.
Unfortunately this kind of software needs specialised domain knowledge to produce that AI doesn't have yet, but when (if) it arrives I hope we see strides forwards in hardware engineering productivity.
Companies without the same constraints are well equipped to keep who they've got, pivot them into managing/overseeing agents to scale, and build better products from the outset.
So this'll be a good opportunity for smaller companies (or not-for-profits like co-ops and credit unions) to eat the lunches of bigger companies that'll be slow to adapt.
Remember sometimes the most productive thing to have is not money or people but time with your ideas.
The more grounded reality is that AI coding can be a productivity multiplier in the right hands, and a significant hindrance in the wrong hands.
Somewhere there exists a happy medium between vibe coding without ever looking at the code, and hand-writing every single line.
Seniors can adjust, but eg. junior frontend-only devs might be doomed in both situations, as they might not be able to contribute enough to business-critical features to justify their costs and most frontend-related tasks will be taken over by the "10x" seniors.
The main effect is that you build things a lot faster.
The secondary effect is that, since you can code a lot faster, you are no longer scared of large refactors, ports to other languages, and all that kind of tedious stuff.
That means that if you design something, ship it, and realize it's bad, you can completely change it.
The "oh, we should have done it differently, but now it would take years to change, so we have to deal with legacy garbage" argument doesn't really make sense anymore.
Like if HN had LLM agents, we wouldn't have had to wait for years for them to solve the "click here for more comments" problem. And iterating with users is a lot faster.
Situation a/ LLMs increase developer productivity: you hire more developers as you cash in the profit. If you don't, your competitor will.
b/ LLMs don't increase productivity: you keep cruising. You rejoice seeing some competitors lay people off.
Reality shows dissonance with these supposedly only possible scenarios. Absurd decision making, a mistake? No mistake. Many tech companies are facing difficulties; they need to lose weight to remain profitable and appease the shareholders' demand for bigger margins.
How to do this without a backlash? AI is replacing developers; Anthropic's CEO said engineers don't write code anymore, the role obsolete in 6 months. It naturally makes sense that we have to let some of them go. And if the prophecy doesn't come true, well, nobody ever got fired for buying IBM.
Amazon has demonstrated that it takes just as long, or longer, to have senior devs review LLM output than it would to just have the senior devs do the programming in the first place. But now your senior devs are wasted on reviewing instead of developing or engineering. Amazon, Microsoft, Google, Salesforce, and Palantir have all suffered multiple losses in the tens of millions (or more) due to AI output issues. Now that Microsoft has finally realized how bad LLMs really are at generating useful output, they've begun removing AI functionality from Windows.
Product quality matters more than time to market. Especially in tech, the first-to-market is almost never the company that dominates, so it's truly bizarre that VCs are always so focused on their investments trying to be first to market instead of best to market.
If Competitor Y just fired 90% of their developers, I would have a toast with my entire human team. And a few months later, we'd own the market with our superior product.
If it is a big company the answer is and will always be: whatever makes the stock price rise the most.
This is a crazy take. Even if said people are matching or exceeding the outcome of those using the technology?
I’m not in this group. But the closest analog to what you are saying is firing people for not using a specific IDE.
I think that it's more along the lines of "do you fire people" instead of just "do you fire devs". Fewer devs means less of a need for PMs, so they can be let go as well, and maybe with the rise of AI assisted design tools, you don't need as many UX people, so you let some of them go as well.
As for building better products, I feel like that's a completely different topic than using AI for productivity gains, but only because at the end of the day you need buy in from upper management in order to build the features/redo existing features/both that will make the product better. I should also mention I'm viewing this from the position of someone who works at an established company and not a startup, so it may differ.
My own experience...
I've tried approaching vibe coding in at least 3 different ways. At first I wrote a system that had specs (markdown files) with a 1-to-1 mapping between each spec and a matching Python module. I only ever edited the spec, treating the code itself as an opaque thing that I ignored (though I defined the interfaces for it). It kind of worked, though I realized how distinct the difference between a spec that communicates intent and a spec that specifies detail really is.
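As a concrete sketch of that 1-to-1 mapping, a small checker can flag drift between the spec files and the generated modules. The directory names here are made up for illustration; the commenter doesn't describe their actual layout:

```python
from pathlib import Path

SPEC_DIR = Path("specs")    # one markdown spec per module (hypothetical layout)
SRC_DIR = Path("src/app")   # the LLM-generated Python package (hypothetical layout)

def check_mapping():
    """Return (specs with no matching module, modules with no matching spec)."""
    specs = {p.stem for p in SPEC_DIR.glob("*.md")}
    modules = {p.stem for p in SRC_DIR.glob("*.py")} - {"__init__"}
    return sorted(specs - modules), sorted(modules - specs)

if __name__ == "__main__":
    missing_code, missing_spec = check_mapping()
    for name in missing_code:
        print(f"spec without module: {name}.md")
    for name in missing_spec:
        print(f"module without spec: {name}.py")
```

Running something like this before each regeneration pass keeps the "spec is the source of truth" discipline honest, since the code side is treated as opaque.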
From this, I felt that maybe I needed to stay closer to the code, but just use the LLM as a bicycle of the mind. So I tried "write the code yourself, and integrate an LLM into Emacs so that you can have a discussion with it about individual pieces of code, but use it for criticism and guidance, not to actually generate code". It also worked (though I never wrote anything more than small snippets of Elisp with it). I learned more doing things this way, though I have the nagging suspicion that I was actually moving slower than I theoretically could have. I think this is another valid way.
I'm currently experimenting with a 100% vibe coded project (https://boltread.com). I mostly just drive it through interaction on the terminal, with "specs" that kind of just act as intent (not specifications). I find the temptation to get out of the outside critic mode and into just looking at the code is quite strong. I have resisted it to date (I want to experiment with what it feels like to be a vibe coder who cannot program), to judge if I realistically need to be concerned about it. Just like LLM generated things in general, the project seems to get closer and closer to what I want, but it is like shaping mud, you can put detail into something, but it won't stay that way over time; its sharp detail will be reduced to smooth curves as you then switch to putting detail elsewhere. I am not 100% sure on how to deal with that issue.
My current thought is that we have failed to find a good way of switching between the "macro" (vibed) and the "micro" (hand-coded) view of LLM development. It's almost like we need modules (blast chambers?) for different parts of any software project, where we can switch to doing things by hand (or at least with more intent) when necessary, and by vibe when not. Striking the balance that nets the greatest output is quite challenging, and it may not even be that there is an optimal intersection; it may simply be that you are exchanging immediate change for future flexibility of the software.
Terafab is suddenly making so much sense!
I'm not sure what your circumstances are but even if it's not true for you, it's true for many other people.
People online with identical views to them all assure me that they're all highly skilled, though.
Meanwhile I've been experimenting with using AI for shopping, and all of them so far are horrendous. They can't handle basic queries without tripping over themselves.
But you can understand why all the 1700 and below chess players say it is good and it is making them better using it for eval?
Don't worry, AI will replace you one day, you are just smarter than most of us so you don't see it yet.
This kind of thinking is actually a big reason why execs are being misinformed into overestimating LLM abilities.
LLM coding agents alone are not good enough to replace any single developer. They only make a developer x% faster. That dev who is now x% faster may then allow you to lay off another dev. That is a subtle yet critical difference.
To address your point, let's try another analogy. Imagine secretarial assistants in the 80s, discussing their risk of being replaced by computers. They would think: someone still needs to type those letters, sit next to that phone, and make those appointments. I am safe. Computers won't replace me.
It is not that AI will do all of your tasks and replace you. It is that your role as a specialist in software development won't be necessary most of the time (someone will do that, and that person won't call themselves a programmer).
I tend to find that the volume of automation predictions inversely correlates to how real they are.
When capitalists actually have the automation tech, they don't shout about it; they just deploy it quietly and collect the profits.
When, say, Bezos is worried about his unionizing workforce and wants to intimidate it, that's when the hot takes and splashy media articles about billions invested in automation "coming for yer jerb" get published.
For me the main difference is that now some people can explain what their code does, while others can only explain what they want it to achieve.
This is an interesting choice for a first experiment. I wouldn't personally base AI's utility for all other things on its utility for shopping.
Most people don't really understand coding, but shopping is a far simpler task, so it's easier to see how and where AI fails (i.e. with even mildly complex instructions).
I'm very confident the experts in every field are not all that impressed by LLMs, relative, of course, to those who were 'meh' in the first place. Experts meaning those who actually understand the content, not those who simply regurgitate or repeat it on demand.
I'd even go as far as to say there are many out there who feel disdain for experts and want to see LLMs flourish because of it.
On the tech side I see it saving some time with stuff like mock data creation, writing boilerplate, etc. You still have to review it like it's a junior. You still have to think about the requirements and design to provide a detailed understanding to them (AI or junior).
I don't think either of these will provide 90% productivity gains. Maybe 25-50% depending on the job.
Sure, it is not as fast to understand as code I wrote myself. But at least I mostly need to confirm how it implemented what I asked, not figure out WHAT it even decided to implement in the first place.
And in my org, people move around projects quite a bit. It hasn't been uncommon for me to jump into projects with 50k+ lines of code a few times a year to help implement a tricky feature, or to help optimize things when they run too slow. Lots of code to understand then. Depending on who wrote it, sometimes it is simple: one or two files to understand, clean code. Sometimes it is an interconnected mess, and IMHO often way less organized than AI-generated code.
And the same goes for the review process: lots of having to understand new code. At least with AI you are fed the changes at a slower pace.
Because it does.
> I still don't see ANY proof that it doesn't generate a total unmaintainable unsecure mess, that since you didn't develop, you don't know how to fix.
I wouldn't know since it's been years since I've tried but I'd imagine that Claude Code would indeed generate a half-baked Next.js monstrosity if one-shot and left to its own devices. Being the learned software engineer I am, however, I provide it plenty of context about architecture and conventions in a bootstrapped codebase and it (mostly) obeys them. It still makes mistakes frequently but it's not an exaggeration to say that I can give it a list of fields with validation rules and query patterns and it'll build me CRUD pages in a fraction of the time it'd take me to do so.
I can also give it a list of sundry small improvements to make and it'll do the same, e.g. I can iterate on domain stuff while it fixes a bunch of tiny UX bugs. It's great.
not talking about toys or vibecoded crap no one uses.
Nobody is.
Perhaps nobody cares to “convince you” and “win you over”, because…why? Why do we all have to spoon feed this one to you while you kick and scream every step of the way?
If you don’t believe it, so be it.
Weirdly, people who have actually created functional one-man products don't seem to have the same problem, as they welcome the business.
"oooh Im afraid of doxxing myself", wtf? lmao!
We are very much in need of an actual way to measure real economic impact of AI-assisted coding, over both shorter and longer time horizons.
There's been an absolute rash of vibecoded startups. Are we seeing better success rates or sales across the industry?
That's the same false argument that the religious have offered for their beliefs and was debunked by Bertrand Russell's teapot argument: https://en.wikipedia.org/wiki/Russell%27s_teapot
If you use it correctly, you can get better quality, more maintainable code than 75% of devs will turn in on a PR. The "one weird trick" seems to be to specify, specify, specify. First you use the LLM to help you write a spec (or document one, if the code is pre-existing). Make sure the spec is correct and matches the user story and edge cases. The LLM is good at helping here too. Then break down separations of concerns, APIs, and interfaces. Have it build a dependency graph. After each step, have it reevaluate the entire stack to make sure it is clear, clean, and self-consistent.
Every step of this is basically the AI doing the whole thing, just with guidance and feedback.
Once you’ve got the documentation needed to build an actual plan for implementation, have it do that. Each step, you go back as far as relevant to reevaluate. Compare the spec to the implementation plan, close the circle. Then have it write the bones, all the files and interfaces, without actual implementations. Then have it reevaluate the dependency graph and the plan and the file structure together. Then start implementing the plan, building testing jigs along the way.
You just build software the way you used to, but you use the LLM to do most of the work along the way. Every so often, you’ll run into something that doesn’t pass the smell test and you’ll give it a nudge in the right direction.
Think of it as a junior dev that graduated top of every class ever, and types 1000wpm.
Even after all of that, I’m turning out better code, better documentation, and better products, and doing what used to take 2 devs a month, in 3 or 4 days on my own.
On the app development side of our business, the productivity gain is also strong. I can't really speak to code quality there, but I can say we get updates in hours instead of days, and there are fewer bugs in the implementations. They say the code is better documented and easier to follow, because they're not under pressure to ship hacky prototype code as if it were production.
On the current project, our team is half the size it would have been last year, and we are moving about 4x as fast. What doesn't seem to scale for us is size: if we doubled our team I think the gains would be very small compared to the costs. Velocity seems to be throttled more by external factors.
I really don’t understand where people are coming from saying it doesn’t work. I’m not sure if it’s because they haven’t tried a real workflow, or maybe tried it at all, or they are definitely “holding it wrong.” It works. But you still need seasoned engineers to manage it and catch the occasional bad judgment or deviation from the intention.
If you just let it run, it will definitely go off the rails and you'll end up with a twisted mess that no one can debug. But if you write the code incrementally through a specification-evaluation loop as you descend the abstraction from idea to implementation, you'll end up winning.
As a side note, and this is a little strange and I might be wrong because it’s hard to quantify and all vibes, but:
I have the AI keep a journal about its observations and general impressions, sort of the "meta" without the technical details. I frame this to it as a continuation of "awareness" for new sessions.
I have a short set of "onboarding" documents that describe the vision, ethos, and goals of the project. I have it read the journal and the onboarding docs at the beginning of each session.
I frame my work with the AI as working with it as a "collaborator" rather than a tool. At the end of the day, I remind it to update its journal of reflections about the day's work. It's total anthropomorphism, obviously, but it seems to inspire "trust" in the relationship, and it really seems to up-level the effort the AI puts in. It kinda makes sense, LLMs being modelled on human activity.
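A minimal sketch of that journal-plus-onboarding loop, with made-up file names (the comment doesn't name its actual documents): one helper assembles the warm-start preamble for a new session, and another appends the end-of-day reflection to the journal.

```python
from pathlib import Path

# Hypothetical document layout; substitute your own project's files.
ONBOARDING = [Path("docs/vision.md"), Path("docs/ethos.md"), Path("docs/goals.md")]
JOURNAL = Path("docs/journal.md")

def session_preamble() -> str:
    """Concatenate the onboarding docs plus the running journal into one
    context block to paste (or pipe) into a fresh agent session."""
    parts = []
    for doc in ONBOARDING + [JOURNAL]:
        if doc.exists():
            parts.append(f"## {doc.name}\n{doc.read_text()}")
    return "\n\n".join(parts)

def append_journal_entry(text: str) -> None:
    """End-of-day step: ask the model for its reflections, then save them
    so the next session's preamble carries the accumulated 'awareness'."""
    JOURNAL.parent.mkdir(parents=True, exist_ok=True)
    with JOURNAL.open("a") as f:
        f.write(text.rstrip() + "\n\n")
```

Whether the "relationship" framing itself helps is anecdotal, but mechanically this is just persistent context carried across stateless sessions.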
FWIW, I’m not asserting anything here about the nature of machine intelligence, I’m targeting what seems to create the best result. Eventually we will have to grapple with this I imagine, but that’s not today.
When I have forgotten to warm-start the session, I find that I am rejecting much more of the work. I think this would be worth someone doing an actual study to see if it is real or some kind of irresistible cognitive bias.
I find that the work produced is much less prone to going off the rails or taking shortcuts when I have this in the context, and by reading the journal I get ideas on where and how to do a better job of steering and nudging to get better results. It’s like a review system for my prompting. The onboarding docs seem to help keep the model working towards the big picture? Idk.
This “system” with the journal and onboarding only seems to work with some models. GPT5 for example doesn’t seem to benefit from the journal and sometimes gets into a very creepy vibe. I think it might be optimized for creating some kind of “relationship” with the user.
I suspect you either already were or would've been great at leading real human developers, not just AI agents. Directing an AI towards good results is shockingly similar to directing people. I think that's a big thing separating those getting great results with AI from those claiming it simply does not work. Not everyone is good at doing high-level planning, architecture, and directing others. But those that already had those skills basically just hit the ground running with AI.
There are many people working as software engineers who are just really great at writing code, but may be lacking in the other skills needed to effectively use AI. They’re the angry ones lamenting the loss of craft, and rightfully so, but their experience with AI doesn’t change the shift that’s happening.
CTO is rewriting company platform (by himself with AI) and is convinced it's 100x productivity. But when you step back and look at the broader picture, he's rewriting what something like Rails, .NET, or Spring gave us 15-20 years ago? It's just in languages and code styles he is (only) familiar with. That's not 100x for the business, sorry...
You hire more if you are growing and have new ideas you just never had the chance to implement, because they were not practical or feasible at that level of tech (non-assisted humans clicking out code and taking sick leave).
Not hiring someone?