We expect so much from our developers. The thought that one developer could sit idle with nothing to do is one of a manager's worst nightmares.

Now, with AI tools, this has intensified: we expect our developers not only to handle system design, coding, testing, shipping, and maintenance, but to do it all 10x faster.

The One-Key-Press Developer

Early in my career, I worked for Neoris (I’m still recovering from the PTSD of that experience). One of their big clients was Cemex.

My PM used to laugh at the fact that they hired one engineer whose job was to press the Y key in the middle of an ETL process.

I’m guessing it was a decision so critical that it required human supervision.

Truth be told, they were pretty bad at hiring good engineers. That process could have been easily automated.

Are You Automating Your Decisions?

Automation has gone wild in today’s world. If you are hacky enough, you could automate your entire development team.

But here is the risk: due to the non-deterministic nature of LLM output (the same prompt won’t yield the same output twice), you need human supervision.

I’m not talking about Prompt Engineering; I’m talking about Engineers whose job is to review and approve outputs.

The One-Time Shot Trap

For almost two years, I’ve been building my own LLM Prompt-based tool to write articles.

I’ve used all my battle-tested prompts to produce articles that get me there 80-90% of the time (this was before the Search and Deep Research features in ChatGPT).

My main takeaway? Avoid the one-time shot trap.

Because of the AI hype, we tend to believe that if we ask a question or make a request, we will get the perfect answer on the first try. That’s rarely the case.

The reality is that once you have fine-tuned your prompt, you need to run it multiple times and pick the best output.
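Here is a minimal sketch of what that looks like in practice. The call_model() helper, the model name, and the prompt are assumptions standing in for whatever LLM client and models you actually use:

```python
def call_model(model: str, prompt: str) -> str:
    """Placeholder for your LLM client call (an assumption, not a real API)."""
    # Replace this stub with an actual request to `model`.
    return f"[{model} output for: {prompt[:40]}...]"

def sample_outputs(model: str, prompt: str, runs: int = 5) -> list[str]:
    # Same prompt, multiple runs: the non-deterministic output is the point.
    return [call_model(model, prompt) for _ in range(runs)]

# A human still picks the winner from the candidates.
candidates = sample_outputs("gpt-4.5", "Write 5 headlines about prompt engineering.")
for i, text in enumerate(candidates, 1):
    print(f"--- candidate {i} ---\n{text}")
```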

Natural Selection Prompting

It is like a natural selection process: you compare two models and test different prompts to validate the quality of the outputs. Then you run the prompts multiple times, collect different outputs, and pick the winners.

Here is the quick process – let’s say you are prompting for good article headlines (a code sketch follows the list):

  1. Pick two models (for example, GPT-4.5 and o3)
  2. Prepare your prompt: use your own recipes here
  3. Run the prompt multiple times against each model
  4. Pick the best outputs/headlines
  5. Use them in real-world articles to validate results
  6. Gather the top performers and use them as examples in your prompts (step #2)
  7. Repeat
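
A hedged sketch of that loop, assuming the same hypothetical call_model() helper as before. The build_prompt() recipe and the model names are placeholders for your own prompts and model IDs, and the final "pick the winners" step is deliberately naive because in practice that pick is made by a human or by real-world performance data (step 5):

```python
from itertools import product

def call_model(model: str, prompt: str) -> str:
    """Placeholder for your LLM client call (an assumption, not a real API)."""
    # Replace this stub with an actual request to `model`.
    return f"[{model} headline for: {prompt[:30]}...]"

def build_prompt(topic: str, winning_examples: list[str]) -> str:
    # Step 2: your own recipe, optionally seeded with past winners (step 6).
    examples = "\n".join(f"- {e}" for e in winning_examples)
    return f"Write one article headline about {topic}.\nGood past headlines:\n{examples}"

def natural_selection_round(topic: str, winners: list[str],
                            models=("gpt-4.5", "o3"), runs=3) -> list[str]:
    # Steps 3-4: run the same prompt several times against each model,
    # then hand every candidate to a human for review.
    prompt = build_prompt(topic, winners)
    return [call_model(m, prompt) for m, _ in product(models, range(runs))]

winners: list[str] = []            # top performers fed back in (step 6)
for round_no in range(2):          # step 7: repeat
    candidates = natural_selection_round("prompt engineering", winners)
    print(f"Round {round_no + 1} candidates:", *candidates, sep="\n  ")
    # Step 5 happens outside this script: publish, measure, and replace
    # this naive pick with the headlines that actually performed.
    winners = candidates[:2]
```

The code is the easy part; the judgment calls in steps 4-6 are where the human reviewer earns their keep.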

Does this sound like something you are doing? Does this sound like too much work for you?

I’ve got you covered!

Yep, that’s why I keep saying that we need new Engineering Roles for this new way of building software.

I don’t have a name for this role, but I do have engineers trained to do it.

I would love to hear where you stand with this. Drop me a human-generated line!

Leo Celis