
My AI Agent - Two Weeks Later — and a workflow worth sharing

It’s been two weeks since my last update. I’m writing this for documentation, as a reality check, and as motivation to come back. There’s also a workflow I’ve been using at work that’s worth documenting.

What shipped

I added proactivity through Google Cloud Tasks. After some trouble with AWS EventBridge — specifically, it doesn’t support callbacks to arbitrary URLs directly, which is exactly what I needed — it was clear that Cloud Tasks was a better fit for my use case. Easy implementation, less friction. Good call.
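For reference, here is a minimal sketch of what a scheduled callback could look like as a Cloud Tasks request body. It is not the code from my repo. It only builds the JSON for the REST `tasks.create` call (an HTTP target plus a `scheduleTime`), and the function name, callback URL, and payload fields are made up for illustration. Actually sending it needs an authenticated request to the queue, which is left out here.

```python
import base64
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_follow_up_task(
    callback_url: str,
    payload: dict,
    delay_seconds: int,
    now: Optional[datetime] = None,
) -> dict:
    """Build the JSON body for a Cloud Tasks `tasks.create` REST call.

    `now` must be timezone-aware; it defaults to the current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    run_at = (now + timedelta(seconds=delay_seconds)).astimezone(timezone.utc)

    # The REST API expects the HTTP body as a base64-encoded string.
    encoded_body = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")

    return {
        "task": {
            # RFC 3339 timestamp: when the callback should fire.
            "scheduleTime": run_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "httpRequest": {
                "url": callback_url,  # hypothetical agent endpoint
                "httpMethod": "POST",
                "headers": {"Content-Type": "application/json"},
                "body": encoded_body,
            },
        }
    }


if __name__ == "__main__":
    # Hypothetical follow-up: ping the agent again in one hour.
    task = build_follow_up_task(
        "https://example.com/agent/follow-up",
        {"user_id": "u1", "reason": "check on task progress"},
        delay_seconds=3600,
    )
    print(json.dumps(task, indent=2))
```

The appeal for this use case is that the task itself carries the URL to call back, so the agent only needs an HTTP endpoint that accepts the payload.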

The agent has been down for a few days now. I suspect I’ve run out of some service’s free tier, but I haven’t gone looking for why.

What didn’t go as expected

Even with proactivity wired up, the agent isn’t as smart about it as I expected. In the first implementation it only scheduled follow-ups when told to directly. I tried to improve this through prompt adjustments, but they didn’t help much. Proactive task tracking also needs to be refined. This is a crucial thing to work on.

Life got in the way

These last two weeks I’ve been very busy for many reasons. I got approved for a master’s abroad, which is dope, and have been deep in documents and tracking the visa application process. I’ve been to the consulate twice and will go a third time next week. Work has also been intense (shoutout to mavs.ai). And my back pain came back.

All of that pushed me away from programming in my very limited free hours, which I’ve been spending mostly chilling. I know, despicable. But it’s the truth.

Interesting stuff from the last few weeks

I’ve been diving deeper into the structured AI development space and found some interesting frameworks:

  • GSD
  • AIOX — actually Brazilian
  • Ralph Loop — more lightweight
  • Han — very light, interested to check this one out

All of that left me itching to try a more structured framework. One thing I’ve internalized is that in this era, the cost of iteration is much lower. You don’t need to make old decisions work with new requirements when you can rebuild a small codebase with tests and human validation in a day or two. So I’m actually excited to potentially rebuild the repo using one of these frameworks. And why not try all of them? The idea is to stay out of the IDE entirely and just manage GitHub issues and pull requests, while autonomous agents pick them up and implement. Something like the Ralph Loop pattern.

A workflow I’ve been using at work

I still use SpDD a lot at work. For more complex tasks, I’ve been running a two-agent workflow: one agent is the reviewer, the other is the worker, which implements using a simpler, faster model.

The implementation plan is divided into phases. Each phase has specific automated testing instructions and human validation scripts. The worker implements, and then the reviewer and I discuss the implementation, decide whether to pivot, and validate the changes. All changes and plan updates are saved to a changelog file. The worker clears its context after every phase. The reviewer holds the strategy and goals throughout.
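Since I still run this by hand, here is a rough sketch of how the loop might look if automated. Everything in it is hypothetical: `Phase`, `Review`, and `run_plan` are names invented for illustration, the worker and reviewer are plain callables standing in for the two agents (plus me), and the changelog is an in-memory list instead of a file.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Phase:
    name: str
    instructions: str
    test_instructions: str   # automated checks the worker must run
    validation_script: str   # steps the human follows to validate


@dataclass
class Review:
    approved: bool
    notes: str
    # When set, this is a pivot: it replaces all remaining phases.
    revised_plan: Optional[List[Phase]] = None


def run_plan(
    plan: List[Phase],
    worker: Callable[[Phase], str],
    review: Callable[[Phase, str, List[str]], Review],
) -> List[str]:
    """Run phases in order and return the changelog entries."""
    changelog: List[str] = []
    remaining = list(plan)
    while remaining:
        phase = remaining.pop(0)
        # The worker sees only the current phase, which stands in for
        # clearing its context between phases.
        result = worker(phase)
        # The reviewer (with the human) sees the full changelog, so the
        # strategy and goals persist across phases.
        verdict = review(phase, result, changelog)
        pivoted = verdict.revised_plan is not None
        changelog.append(
            f"{phase.name}: approved={verdict.approved} pivot={pivoted} notes={verdict.notes}"
        )
        if pivoted:
            remaining = list(verdict.revised_plan)
        elif not verdict.approved:
            break  # stop and let the human decide what happens next
    return changelog


if __name__ == "__main__":
    demo_plan = [
        Phase("schema", "add the table", "run unit tests", "inspect the migration"),
        Phase("api", "expose the endpoint", "run API tests", "call it by hand"),
    ]
    log = run_plan(
        demo_plan,
        worker=lambda phase: f"implemented {phase.name}",
        review=lambda phase, result, history: Review(True, "looks good"),
    )
    print("\n".join(log))
```

The design choice that matters is the asymmetry: the worker call is stateless, while the review step gets the whole history.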

I think this works well for a few reasons:

  1. It curbs hallucination during implementation without losing track of the original goals
  2. Pivoting is welcome — the plan can mutate and the changelog captures everything
  3. The worker being more lightweight and focused makes implementation faster
  4. Human validation at each step gives more confidence in the process and makes it easier to spot the need to pivot early, since each change is accepted or rejected before moving on
  5. Phased deliveries are also much easier to review than a big chunk with many features at once

For simpler, more closed-scope tasks I wouldn’t recommend it due to the added overhead. And I’m still doing all of this manually, which gets confusing when working on multiple tasks at the same time — which I’m pushing myself to do more and more. I’d like to automate this eventually.

Anyway, that’s it. Hope to be back next week.

This post is licensed under CC BY 4.0 by the author.