My AI Journey: A Timeline of Tinkering, Models, and Machines
TL;DR: I spent 20 years in investment banking risk roles, but I never stopped coding on the side. I started building AI agents in April 2023, long before the current hype cycle. I built my own coding agent with a custom DSL, self-hosted LLMs on a 4090+3090 rig, and created voice and browser agents. I left banking in mid-2025 to build a startup leveraging the "AI agent multiplier." My thesis then, which is becoming obvious now, is that what’s happening in software engineering is coming for every white-collar role.

I spent 20 years in investment banking risk roles. But on evenings and weekends, I was building.
A lot of what's becoming mainstream now in AI agents is something I was using much earlier. It wasn't always pretty. The UI/UX was terrible and I was the only user, but the capabilities were there.
Before GPT: The Foundations
I studied Computing at Imperial College London, and the early part of my career was developing software in the financial industry during the dotcom era. What happened next is what happens to a lot of engineers who move into senior corporate roles. For the last 15 to 20 years, I've been in risk management roles in investment banks, positions where you're managing teams and frameworks, not writing code day-to-day. But I never stopped. On evenings and weekends, I kept tinkering, building side projects, entering hackathons, getting involved in startups, and keeping up with the technology.
I went deep into Haskell. It was a natural fit for someone with a quant and CS background. That led me into the Nix ecosystem, as any good Haskeller would. I drank the pure functional Kool-Aid and never quite put the glass down. I still use Nix today, even though I barely use Haskell anymore. The languages have shifted to Python and JavaScript, entirely because of the AI ecosystem, but the Nix infrastructure remains.
I built an algorithmic trading system in Haskell using functional reactive programming during the Bitcoin and crypto hype. I was also involved in startups and tried to build a FinTech product when I worked in Copenhagen. That introduced me to Flutter for mobile development. This turned out to be remarkably pertinent for my current apps.
My home setup is admittedly esoteric. My main server runs WSL2 and Nix. I'm a Windows guy because the corporate job demands it. Excel is non-negotiable for anyone in finance, and anyone who deals with trading platforms (Interactive Brokers, NinjaTrader) needs a Windows machine. So my world is Windows on the outside, Nix on the inside. I also have a Mac Mini, essential for iOS development, and I'm slowly moving more towards the Apple ecosystem, as all AI tinkerers seem to be doing these days.
The ChatGPT Moment
Then GPT-3.5 landed, and ChatGPT showed the world something most people hadn't considered: AIs can understand language and speak back coherently. For most people, this was the awakening. For me, it was confirmation that the trajectory I'd been watching was accelerating faster than anyone expected.
But the real spark came shortly after.
March/April 2023: The Awakening: AutoGPT and BabyAGI
Around the release of GPT-4, the open-source community produced something that grabbed my attention far more than chatbots.
AutoGPT hit 100,000 GitHub stars in weeks. BabyAGI showed that an AI could plan and execute multi-step tasks autonomously. These were among the first ReAct agents, precursors of the coding agents and tools we have now. The concept was correct, but the model capabilities weren't there yet. GPT-4 was the best available, but it cost an arm and a leg in those days.
I started tinkering with AutoGPT, burning tokens on exercises that often went nowhere. Infinite loops, hallucinated plans, wasted API credits. But with sufficient guardrails and enough tokens burnt, you could get productive output, particularly in coding.
I forked AutoGPT and started tinkering with it directly. I modified prompts, added guardrails, and tried to make it actually useful rather than just impressive.
Key models: GPT-4 (March 2023). Expensive. Dominant. Nothing else came close.
July 2023: The Birth of AI Assistant
Within months, I wasn't just running other people's agents. I was building my own.
In July 2023, ai_assistant was born. For context: Claude Code didn't exist. Cursor was just getting started. Codex was an API, not a CLI. There was no "AI coding agent" category yet. I was building something that wouldn't have a name for another year.
At first it was a hodgepodge of all my AI ReAct-based tooling: a documentation summariser, a web browser, and a code generator. But it grew fast.
Most of the code was eventually abstracted into separate repositories:
llm_utils: my custom LiteLLM-like library to abstract across providers
shin-web-agent: an autonomous web browser agent, inspired by projects like BrowserUse
This was my height of reading the seminal prompt engineering papers of the time. The AI assistant had implementations of the key techniques: Tree of Thoughts, Chain of Thought, ReAct. I also built RAG querying using ChromaDB, adding a planning module inspired by BabyAGI.
Early on, the assistant also had its own context management system. This involved commands to add files, URLs, and goals to the working context, essentially manual context engineering before the term existed. It could also generate git commit messages from diffs and automate GitHub issue-to-PR workflows in a single command.
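The shape of that context management system can be sketched in a few lines. This is a minimal illustration, not the actual implementation; the class and method names here are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class WorkingContext:
    """Sketch of manual context engineering: collect files, URLs,
    and goals, then render them into a single prompt block."""
    files: dict = field(default_factory=dict)   # path -> contents
    urls: list = field(default_factory=list)
    goals: list = field(default_factory=list)

    def add_file(self, path: str, contents: str) -> None:
        self.files[path] = contents

    def add_url(self, url: str) -> None:
        self.urls.append(url)

    def add_goal(self, goal: str) -> None:
        self.goals.append(goal)

    def render(self) -> str:
        # Assemble everything into one prompt for the model.
        parts = ["## Goals"] + [f"- {g}" for g in self.goals]
        parts += ["## Reference URLs"] + [f"- {u}" for u in self.urls]
        for path, contents in self.files.items():
            parts += [f"## File: {path}", contents]
        return "\n".join(parts)
```

The point is that the context was an explicit, inspectable object rather than an opaque chat history, which made it easy to add, remove, and replay pieces of it.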
RAG was the big paradigm at the time. Retrieval-Augmented Generation was everywhere in mid-2023, and for good reason. I had a clear use case: querying large regulatory documents at home to facilitate my work research. Cross-jurisdictional analysis on things like the CRR (Capital Requirements Regulation) involved massive documents, dense legal language, and requirements that cross-reference each other across jurisdictions. These were genuinely challenging RAG problems. Today the conversation has moved to agentic RAG, but back then, getting basic retrieval to work reliably on financial regulation was hard enough.
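The retrieval half of that pipeline is conceptually simple. The real system used ChromaDB's vector embeddings, but a dependency-free sketch of the same shape, with a toy bag-of-words "embedding" standing in for a real model, looks like this:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use
    # dense vectors from an embedding model via ChromaDB.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank document chunks by similarity to the query, keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

The retrieved chunks are then stuffed into the prompt alongside the question. Everything hard about regulatory RAG lived in the details this sketch glosses over: chunking dense legal text sensibly and resolving cross-references.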
Key models: GPT-4 dominant, Claude 2 just launched from Anthropic. Early function calling. Everyone building LangChain wrappers.
Summer 2023: Self-Hosted LLMs
The self-hosting era began that same summer. I bought a new machine and two GPUs, a 4090 and a 3090, and started running models locally.

I used ExLlamaV2 as the inference platform and TabbyAPI for an OpenAI-compatible endpoint. I tinkered with every new model that dropped: Llama 2, Mistral 7B, Mixtral 8x7B. I was quantising, optimising, and doing everything I could to squeeze performance out of consumer hardware.
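The nice thing about an OpenAI-compatible endpoint is that switching between local and hosted models is just a base URL change. A stdlib-only sketch of building such a request (the port and model name here are deployment-specific assumptions, not TabbyAPI defaults I can vouch for):

```python
import json
import urllib.request

def chat_request(base_url: str, prompt: str,
                 model: str = "local-model") -> urllib.request.Request:
    # Any OpenAI-compatible server (TabbyAPI, llama.cpp, vLLM, ...)
    # accepts this request shape on /v1/chat/completions.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# To actually send it (requires a running server):
# with urllib.request.urlopen(chat_request("http://localhost:5000", "hello")) as r:
#     print(json.load(r)["choices"][0]["message"]["content"])
```

Because the agent code only ever spoke this protocol, swapping Llama 2 for GPT-4, or a quantised Mixtral for Claude, never touched the orchestration layer.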
The appeal was obvious: privacy, no API costs, full control, no rate limits. The reality was more nuanced. Local models in mid-to-late 2023 were good enough for simple tasks, but they couldn't handle the complex multi-step reasoning I needed for serious agent work.
I learned a lot about quantisation, GGUF formats, and the practical limits of local inference. Eventually I went API-first for production use cases because the quality gap was too large. But I still believe the "bitter lesson" will close that gap.
Key models: Llama 2, Mistral 7B, Mixtral 8x7B. The quantisation revolution. The self-hosted AI movement in full swing.
December 2023: The Aider-Inspired Era: AI Coding Gets Real
Inspired by the Aider project, I decided to unify all my hodgepodge of utilities into a coherent system.
As an avid Vim user, I built an ex-based interface. Instead of slash commands, I used colon commands, just as any vimmer would. I could run any LLM orchestration imperatively from the command line. I also added a Gradio web frontend for when I wanted a visual interface.
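The colon-command idea reduces to a small dispatch loop. A hypothetical sketch (the command names are illustrative, not the real registry):

```python
def make_dispatcher(commands: dict):
    """Ex-style dispatcher: ':name args...' maps to a handler function."""
    def dispatch(line: str) -> str:
        if not line.startswith(":"):
            return "not a command"
        name, _, arg = line[1:].partition(" ")
        handler = commands.get(name)
        return handler(arg) if handler else f"unknown command: {name}"
    return dispatch

# Example registry: each colon command wraps an LLM workflow.
dispatch = make_dispatcher({
    "ai_fix_test": lambda arg: f"running fix workflow on {arg}",
    "ai_review": lambda arg: f"reviewing {arg}",
})
```

Like ex commands in Vim, this gives every workflow a stable, scriptable name, which is what later made it natural to compose them into a DSL.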
All this time, the system was building itself. People might call it vibe coding, but the better term is vibe engineering. This means end-to-end guardrails to ensure built-in quality at all times: testing, linting, build validation, and CI pipelines. The AI wrote the code, but the engineering discipline was mine.
The system also introduced escalating bug-fixing strategies. I used a simple fix agent for straightforward failures and a mixture-of-agents approach for tricky bugs that used multiple models to reach consensus on the fix. Interactive and automated code review workflows rounded it out.
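The escalation logic is worth making concrete. A minimal sketch, with the agents stubbed as plain callables (in practice each would be a different model behind an API):

```python
from collections import Counter
from typing import Callable, Optional

def fix_with_escalation(bug: str,
                        simple_fix: Callable[[str], Optional[str]],
                        agents: list[Callable[[str], str]]) -> str:
    # Level 1: try the cheap single-agent fix first.
    fix = simple_fix(bug)
    if fix is not None:
        return fix
    # Level 2: mixture of agents. Ask several models for a fix
    # and take the majority answer as the consensus.
    proposals = [agent(bug) for agent in agents]
    winner, _ = Counter(proposals).most_common(1)[0]
    return winner
```

Most failures never reach level 2, so the expensive multi-model consensus only pays its cost on the bugs that actually need it.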
Key models: GPT-4 Turbo, early Claude 3 previews. Models getting cheaper and faster. Aider proving the concept of LLM-driven code editing.
2024: Multi-Modality Changes Everything
GPT-4o and the Gemini series brought multi-modality. This became a big deal, particularly from November 2024 onward as costs dropped.
Voice Agent for My Son: Using GPT-4o and LiveKit, I created a personal AI voice friend for my son, complete with a mobile app as a POC. It was pure joy to build.
Voice TODO App: I also built a voice-based TODO app to manage my own task list by speaking to it.
Web Browser Agent: In late 2024, the Gemini 2 series of models made multi-modal inference more cost-effective. I built a Playwright-based browser agent that used the accessibility tree, screenshots, and naive Monte Carlo tree search for web navigation. Most of my time was spent trying to implement stealth techniques to avoid bot detection.
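The "naive MCTS" part boils down to a selection rule for choosing which page element to try next. A stdlib sketch of UCB1 selection, with hypothetical action names (the real agent scored actions derived from the accessibility tree):

```python
import math

def ucb1_select(stats: dict, total_visits: int, c: float = 1.4) -> str:
    """Pick the next action (e.g. a clickable element), balancing
    exploitation of known-good actions against exploration of new ones.
    stats maps action name -> (visits, total_reward)."""
    def score(action: str) -> float:
        visits, reward = stats[action]
        if visits == 0:
            return float("inf")  # always try unexplored actions first
        return reward / visits + c * math.sqrt(math.log(total_visits) / visits)
    return max(stats, key=score)
```

Rewards came from whether a click moved the agent closer to its navigation goal; over repeated rollouts, the tree converges on the promising paths through a site.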
Key models: GPT-4o (multi-modal breakthrough), Gemini 2.0 Flash (cost-effective multi-modal). The era of seeing, hearing, and speaking.
January to April 2025: Reasoning Models and the DSL Revolution
OpenAI's o1 and o3 reasoning models changed what was possible in complex planning and code generation.
In January 2025, I started building a full DSL for the AI assistant, just as any Haskeller would. This allowed me to craft precise coding workflows: use an expensive model (like o1) to plan and architect, then have cheaper models execute the plan. Central agent plans, satellite agents execute.
Here's a small snippet of the DSL. These are some of the commands from init.lm, the entrypoint that wired everything together. The full init.lm is available as a gist if you want to see the complete picture:
# Feature development: from small edits to large multi-file features
user_command "ai_quick_edit" "llm_scripts/workflows/ai_quick_edit.lm"
user_command "ai_add_feature" "llm_scripts/workflows/ai_add_feature.lm"
user_command "ai_add_large_feature" "llm_scripts/workflows/ai_add_large_feature.lm"
# Bug fixing: escalating levels of sophistication
user_command "ai_fix_test" "llm_scripts/agents/smart_fix_test_agent.lm"
user_command "ai_fix_error" "llm_scripts/workflows/fix_error.lm"
user_command "ai_hard_test_fix" "llm_scripts/workflows/hard_test_fix.lm"
# Code review: interactive or automated
user_command "ai_interactive_review" "llm_scripts/workflows/interactive_code_review.lm"
user_command "ai_code_review" "llm_scripts/agents/code_review_agent.lm"
# GitHub integration: issue to PR in one command
user_command "ai_gh_issue_pr" "llm_scripts/workflows/ai_gh_issue_pr.lm"
# Research, memory, and context management
user_command "ai_research" "llm_scripts/workflows/ai_research_topic.lm"
user_command "create_memory" "llm_scripts/create_memory.lm"
user_command "extract_knowledge" "llm_scripts/extract_knowledge.lm"
# Mixture of Agents: consensus-driven problem solving
user_command "ai_moa" "llm_scripts/agents/mixture_of_agents.lm"
Every command mapped to a workflow script. Every workflow had guardrails, context management, and model selection built in. This was Claude Code before Claude Code, but written in a custom DSL instead of TypeScript.
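The planner/executor split behind those workflows can be sketched in a few lines of Python. The model callables here are stubs; in the real system each would be a different model behind the provider abstraction:

```python
from typing import Callable

def plan_then_execute(task: str,
                      planner: Callable[[str], list],
                      executor: Callable[[str], str]) -> list:
    # The expensive model produces the plan once; the cheap model
    # runs once per step, which is where most of the tokens go.
    steps = planner(task)
    return [executor(step) for step in steps]
```

The economics are the point: one call to the expensive reasoning model amortised over many cheap execution calls, instead of paying top-tier prices for every edit.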
Around this time, my role shifted to review only. I had stopped coding directly. I would brainstorm with the AI, review its plans, approve the diffs, and decide whether to continue or address issues. The AI assistant continued to evolve. It now supported autopilot mode for applying series of code changes without approval and added support for new models as they came out.
The assistant transitioned to multi-agent orchestration using tmux and git worktrees. This allowed multiple agents to work on different branches simultaneously, coordinated from a single terminal. It also gained a persistent memory system, the ability to save and recall context across sessions, and task continuity. I could pause work, come back the next day, and pick up exactly where I left off.
Gemini 2.5 Pro was the standout model of this period, particularly for its ability to consume large chunks of a codebase and produce coherent plans.
Key models: o1/o3 (reasoning leap), Gemini 2.5 Pro (massive context + planning), Claude 3.5 Sonnet (coding quality breakthrough).
Mid-2025: The Leap
Big changes were happening at my corporate job at the same time as all this tinkering. I decided I could give it a go and build something of my own. By the time I left in mid-2025, the 'AI agent multiplier' had become obvious. I saw that one person, with the right orchestration, could operate at a completely different scale.
In June 2025, I went to the AI Engineer World's Fair in San Francisco. My eyes were opened. It wasn't just the technology. I'd been deep in that for two years. I saw the ecosystem, the community, and the sheer pace of what was coming.
I left Barclays. My finance career wasn't over. I just needed to be at the forefront of this change. It felt like the dotcom era all over again. Back then, I knew the internet would reshape the world and the future of work. This felt the same, but bigger.
The AI Assistant's Legacy
The ai_assistant is, without question, a casualty of the bitter lesson. Commercial tools like Claude Code, Codex, and Cursor have caught up and surpassed what one person could maintain alone. I'm admittedly a little jealous that everyone now has the same capabilities I've had for a while, but that's how progress works. I'll be open-sourcing it soon once I've cleaned it up, so people can see the approach in detail.
However, its precise AI coding orchestration is still unmatched in some ways, I believe. The DSL, the guardrails, and the workflow automation were all designed for my specific way of working. But the gap has closed enough that it's no longer necessary, at least for my current projects.
The real legacy isn't the code. It's the thinking. Every principle that went into the AI assistant, from structured workflows to human checkpoints and quality gates, lives on in what I'm building now. The fact that these ideas are now showing up in mainstream tools tells me the instincts were right, even if the implementation was rough around the edges.
The Bigger Picture
Here's what most people miss: coding was just the first domain. AI agents took hold in software development first because code is easy to verify. You run the tests, and either the build passes or it doesn't. The feedback loop is tight and the rewards are clear.
But the same trajectory is coming for every white-collar discipline. Compliance analysis. Risk reporting. Document review. Financial modelling. Project management. Any structured knowledge work where you can define what "good" looks like is on the same path that coding just went through.
I know this because I'm already living it. Think about what happens in a large corporation: management layers orchestrate the work below them. A CEO sets direction. A COO runs operations. VPs plan delivery. Directors review quality. That's exactly what my AI empire does, except the management layers are agents. A Chief of Staff plans product delivery. An XO runs operations. A Critic reviews quality. Workers execute. I sit at the top, approving the key decisions.
It's not fully there yet. The system is still building itself, continuously improving. But the shape of it is unmistakable. One person, operating like a CEO of a company, with AI agents filling the roles that would normally require a team of 50.
For anyone coming from a corporate background, and I spent 20 years in that world, this is the shift to pay attention to. AI isn't coming for your job tomorrow. Instead, the people who learn to orchestrate agents effectively will operate at a scale that's simply not possible manually. That's not a tech story, it's an operating model.
I'll go deeper on Mission Control and the HITL App in upcoming posts.
What's Next
Agent orchestration is the TODO app of 2026. Everyone will build one. I've already rolled my own, Mission Control, which powers everything I do today. I'll cover the architecture and philosophy in a future post.
I wrote this post because the timeline matters. What's happening in coding right now is a preview of what's coming everywhere else. I've seen the trajectory from the inside. I've gone from the first ReAct agents that could barely hold a conversation to a system that runs an entire product empire. The pattern is the same. Only the domain changes.
This is the first post on my personal blog. If you're interested in what happens when two decades of corporate experience meets the agentic revolution, and what it means for the future of work, stick around. There's a lot more to tell.