I believe it was February of 2025 when I first started using Cursor on personal projects. I wanted to use it on an idea I had been sitting on for some time: Porcupine. Until AI tools were good enough, I did not want to spend my time outside of work coding; it would have been too much coding in my life.
I had used VS Code's Copilot when it was in beta, and the tab completion was great. I would also add comments in the code to essentially proto-prompt the AI. But that was in an existing, large, legacy codebase, not a greenfield project.
When Cursor came out, it was a game changer. It could generate large amounts of code quickly, right in your editor, and was fairly good at it.
I started using Cursor to build Porcupine. At first I tried doing it in Rust, and then Go, figuring that if an AI was writing the code, why not use new languages with strong typing and compile-time checks? It turned out the tooling was not good enough at the time for that.
I ended up pivoting to SolidJS, TypeScript and all, and since it was a web app only, I figured I would just keep it in a single JavaScript monolith. That way, the AI could reference files in the same language and lean on the same framework everywhere, which makes the tools work better.
I also started using Claude Code. Originally, I believe you had to pay for credits. It was actually scary how expensive that could get with vibe coding; I didn't think it would take off because you could easily burn through 20 bucks in one sitting. At this point I was still a little new to vibe coding, and the models were not as good as they are right now.
I had what I would call two "false starts" with vibe coding. My routine was essentially to break work into iterative chunks, as if I were feeding work to an intern. Claude would go off and write code for those chunks, and I would "review" the code by QAing it in the actual web app. I also figured it would be better to start by prototyping: no "real backend" or database, just mocked-out calls. I told it to use mock data in JSON files or in-memory structures.
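To give a sense of what that looked like, here is a minimal sketch of the kind of mocked data layer I was asking for; the Task shape and function names are hypothetical, not Porcupine's actual API.

```typescript
// Hypothetical mock data layer: the "database" is just a module-level array,
// sometimes seeded from a JSON file, and resets on every reload.
export interface Task {
  id: string;
  title: string;
  done: boolean;
}

const tasks: Task[] = [
  { id: "1", title: "First mock task", done: false },
];

// Async signatures so the frontend code looks the same once a real backend exists.
export async function listTasks(): Promise<Task[]> {
  return tasks;
}

export async function createTask(title: string): Promise<Task> {
  const task: Task = { id: crypto.randomUUID(), title, done: false };
  tasks.push(task);
  return task;
}
```

Keeping the functions async was the one deliberate choice, so swapping in real backend calls later would not change every call site.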
This worked for a while as a prototype, but it was actually incredibly hard to get Claude to undo the prototyped code and refactor it into something real. So instead I just deleted the entire project and started over.
The second time, I had a similar strategy but had it use a real database and real backend calls from the start. This was better, but as the project reached five digits of lines of code, it was essentially broken because the code quality was so poor and inconsistent. To be clear, this is not how anyone should build for production. I was truly vibe coding and not reading any lines of code; I might take cursory glances at the Claude Code output, but I was essentially prompting and forgetting, watching TV or movies and checking on Claude here and there.
After the second false start, I decided to take checkpoints and prune the code for some of the really bad patterns. By pruning, I mean I would actually look at the code and then tell Claude to fix certain things; I still did not write any of the code myself. The most inconsistent pattern early on was that Claude would use in-memory data whenever it felt like it instead of making real calls to the real database, something like the sketch below.
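Here is a hypothetical before/after of that drift; it is not Porcupine's real code, though it assumes a Prisma schema with a User model, which the project did have.

```typescript
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// What Claude would quietly sneak in: a module-level array pretending to be the database.
const usersInMemory: { id: string; email: string }[] = [];

export async function getUserBad(id: string) {
  return usersInMemory.find((u) => u.id === id) ?? null;
}

// What I actually wanted everywhere: a real query against the real database.
export async function getUser(id: string) {
  return prisma.user.findUnique({ where: { id } });
}
```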
Surprisingly, I got very close to what I would call an MVP set of features for Porcupine. I would vibe code, prune, vibe code, and so on. Once the MVP was working, I asked Claude to analyze the codebase for inconsistent patterns, bad architecture, and the like. These were also truly lazy prompts, nothing like those essay-length "system prompts" that people update in GitHub repos every week.
Then the largest part of the project was spent fixing up the codebase, or what I was calling "AI optimizing" it. First, I had to get rid of the redefined types all over the backend and the frontend. There was one user type defined in my Prisma schema, but somehow there were something like 30 different user types being recreated throughout the codebase. Once we were using autogenerated types based off the DB everywhere, the LLM got noticeably better at writing code. The next leap was linting, as strict and custom as possible, to prevent the LLM from writing slop like chains of six logical conditions or endless try/catch blocks. Lastly, I spent a lot of time writing tests using a test pyramid structure; writing end-to-end Playwright tests was the most time-consuming and manual part.
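To make the type cleanup concrete, here is a minimal sketch of what "one source of truth" looks like with Prisma's generated types; the User fields and the function are assumptions for illustration, not Porcupine's actual schema.

```typescript
// Minimal sketch: derive everything from Prisma's generated User type instead of
// redefining user interfaces by hand (assumes a User model with id and email fields).
import { PrismaClient, type User } from "@prisma/client";

const prisma = new PrismaClient();

// Narrower shapes come from the schema-generated type, so they cannot drift from the DB.
export type PublicUser = Pick<User, "id" | "email">;

export async function getPublicUser(id: string): Promise<PublicUser | null> {
  return prisma.user.findUnique({
    where: { id },
    select: { id: true, email: true },
  });
}
```

The linting had the same spirit: core ESLint rules like complexity, max-depth, and no-restricted-syntax are the sort of hard guardrails I mean, though I am not claiming that exact configuration here.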
I also started to play around with specialized Claude agents and skills. I found the agents, and the fact that you can run many of them at once, to be the most productive thing Claude has built so far. You can do a bunch of work in parallel using the same or different agents, or build an agent pipeline that feeds one parent prompt through a series of agents. For example, you could implement a new feature by first planning it, then using a testing agent for TDD-style programming, then having a frontend agent implement the frontend and a backend agent implement the backend, and finally having Claude synthesize the results, run final testing and linting, get a human QA pass at the end, and merge. Total time to ship a feature is most definitely under an hour depending on its size, and could easily be under 20 minutes for something that might have taken a week start to finish: planning, implementing, QAing, testing, reviewing, and merging.
Overall, vibe coding is incredibly powerful for a lot of work. AI tools are definitely "here" and are increasing productivity. They do have limits, largely based on the type of work you are doing and your skill level. Even if LLMs hit some sort of cap, where Claude Opus 4.5 ends up only marginally worse than a future Claude Opus 6, Anthropic and other companies still have a lot of classic software engineering work they can do to improve the tooling around the models: code-writing workflows, memory management, skills, agents, and so on.
In 2026, I am going to look more into ways to use AI safely and productively at work. My team primarily works in a large legacy Ruby on Rails codebase, but migrating our backend code to Go services gives us a lot of opportunity for velocity improvement: not only a fresh codebase, a typed language, and no frontend to worry about, but also the chance to AI optimize our Go services and increase code throughput substantially.
At Justworks, we are domain heavy, so I want to play around a bit more with skills, potentially keeping them in sync with our internal knowledge base so that Claude skills can function as relevant business experts. Combined with Claude agents that are experts in Go, database design, API design, and so on, my engineers could focus on the harder problems: scaling, good architecture design, implementation trade-offs, and code quality.