1. The unit of work: task JSON
Everything in ARAI starts with a task JSON file. It is the single source of truth for what needs to happen, why, and under what constraints. A task specifies its title, description, target files, execution mode, iteration roles, and definition of done — before a single line of code is touched.
This matters because a task file separates intention from execution. The founder writes what should change; the pipeline figures out how. The task JSON also carries governance metadata: which venture it belongs to, whether it requires owner approval, and what its current status is.
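As a rough illustration, a task file along these lines could be loaded and checked as follows. Every field name and value (`target_files`, `execution_mode`, `iteration_roles`, `definition_of_done`, `venture`, `requires_owner_approval`, `status`) is an assumption inferred from the description above, not the actual ARAI schema, and `load_task` is a hypothetical helper.

```python
import json

# Hypothetical task file contents. Every field name and value is an assumption
# made for illustration; the real ARAI schema may differ.
EXAMPLE_TASK = """
{
  "title": "Add retry logic to the fetch helper",
  "description": "Retry failed HTTP requests up to three times.",
  "target_files": ["src/fetch.py"],
  "execution_mode": "proposal",
  "iteration_roles": ["architect", "operator"],
  "definition_of_done": ["Unit tests pass", "No files outside target_files change"],
  "venture": "example-venture",
  "requires_owner_approval": true,
  "status": "draft"
}
"""

REQUIRED_FIELDS = {
    "title", "description", "target_files", "execution_mode",
    "iteration_roles", "definition_of_done",
    "venture", "requires_owner_approval", "status",
}


def load_task(raw: str) -> dict:
    """Parse a task and reject it if any expected field is absent."""
    task = json.loads(raw)
    missing = REQUIRED_FIELDS - set(task)
    if missing:
        raise ValueError(f"task is missing fields: {sorted(missing)}")
    return task


task = load_task(EXAMPLE_TASK)
print(task["title"], "->", task["status"])
```

Rejecting incomplete files at load time is one way to keep intention and execution separate: the pipeline never starts work on a task whose constraints are only partly stated.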
A task moves through a strict lifecycle: draft → ready_for_execution → proposal_ready → execute → completed. Every transition is explicit. Nothing happens by accident.
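One way to make every transition explicit is a small transition table that rejects anything not listed. The sketch below covers only the five states named above; the `advance` helper and the choice to allow only the single forward step are assumptions, and failure or blocked states are left out.

```python
# Lifecycle order as listed in the text. The transition table built from it is
# an assumption: each state may only move to the one that follows it.
LIFECYCLE = ["draft", "ready_for_execution", "proposal_ready", "execute", "completed"]
ALLOWED = dict(zip(LIFECYCLE, LIFECYCLE[1:]))


def advance(task: dict, new_status: str) -> dict:
    """Return a copy of the task in its new status, or raise on an illegal jump."""
    current = task["status"]
    if ALLOWED.get(current) != new_status:
        raise ValueError(f"illegal transition: {current} -> {new_status}")
    return {**task, "status": new_status}


task = advance({"title": "demo", "status": "draft"}, "ready_for_execution")
print(task["status"])
```

Because `advance` returns a new dict rather than mutating in place, a failed transition leaves the original task untouched.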
2. Role-based iteration
When the pipeline picks up a task, it does not call one monolithic "write code" prompt. Instead, each task defines a sequence of roles — specialist agents that each contribute a layer of thinking.
A typical task uses two rounds:
- Architect — analyses the requirement, reads relevant files, identifies constraints, and produces a structured plan.
- Operator — takes the plan and produces the exact implementation: file edits, commands, and expected output. No ambiguity allowed.
Other roles exist (critic, auditor, quality_engineer, analyst) and can be composed into the iteration sequence depending on what the task needs. The roles are defined in structured markdown files. Changing a role's behaviour means editing one file, not refactoring the pipeline.
Every round runs in a fresh context. The output of each round is injected into the next as prior context. By the end of the iteration sequence, the pipeline holds a unified diff — the exact change to apply to the codebase.
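The round-by-round handoff can be sketched as a simple loop. The `Agent` signature, the `run_iteration` helper, and the stub `echo_agent` are all hypothetical; in particular, this sketch passes only the previous round's output forward, whereas a real pipeline might accumulate all earlier outputs.

```python
from typing import Callable

# Hypothetical agent signature: (role_prompt, task_description, prior_context) -> output
Agent = Callable[[str, str, str], str]


def run_iteration(task: dict, role_prompts: dict[str, str], agent: Agent) -> str:
    """Run each role in order, handing the previous output to the next round."""
    prior = ""
    for role in task["iteration_roles"]:
        # Fresh context: the agent sees only its role prompt, the task
        # description, and whatever the previous round produced.
        prior = agent(role_prompts[role], task["description"], prior)
    return prior


def echo_agent(role_prompt: str, description: str, prior: str) -> str:
    """Stand-in for a model call so the sketch runs without any external service."""
    return f"[{role_prompt}] {description} | prior: {prior or 'none'}"


demo_task = {"description": "fix bug", "iteration_roles": ["architect", "operator"]}
prompts = {"architect": "architect", "operator": "operator"}
print(run_iteration(demo_task, prompts, echo_agent))
```

Keeping the role prompts in a plain mapping mirrors the idea that changing a role means editing one definition, not the loop.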
3. The full interaction diagram
Below is the complete task lifecycle from creation to deployment, plus the autonomy layer that monitors and recovers the pipeline during idle cycles.
4. Proposals and owner approval
The pipeline does not apply changes without review. After the iteration rounds produce a diff, that diff passes through a review gate before anything touches the codebase. The gate has two layers:
- Autonomous review: a meta-reviewer agent reads the diff against the operating principles and a quality checklist. It checks scope, patch size, file count, and whether secrets or credentials appear in the added lines. If all criteria pass, the task can auto-promote. If not, it waits for the owner.
- Owner dashboard: every task that reaches proposal_ready appears on the owner's dashboard with a diff view. The owner approves or rejects. Rejection feeds back into the task as a learning event, injected into the next iteration so the pipeline can correct course.
This two-layer gate means the pipeline can run hundreds of tasks unattended while still requiring explicit owner sign-off on anything that touches production systems or exceeds scope thresholds. Autonomy with guardrails, not autonomy without oversight.
The goal is not to eliminate the owner from the loop — it is to make the owner's attention selective. Routine tasks pass automatically. Anything with architectural impact or external effect pauses and waits.
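The mechanical part of the autonomous review (scope, patch size, file count, secrets in added lines) could look something like the sketch below. The thresholds, the regular expression, and the `review_diff` helper are illustrative assumptions, not ARAI's actual checklist, and the check against the operating principles is not modelled.

```python
import re

# Illustrative thresholds and pattern: assumptions, not ARAI's actual checklist.
MAX_FILES = 5
MAX_ADDED_LINES = 200
SECRET_PATTERN = re.compile(r"(api[_-]?key|secret|password|token)\s*[:=]", re.IGNORECASE)


def review_diff(diff: str, target_files: set[str]) -> list[str]:
    """Return the reasons a unified diff cannot auto-promote (empty means pass)."""
    touched: set[str] = set()
    added: list[str] = []
    for line in diff.splitlines():
        if line.startswith("+++ "):
            # File header of a unified diff, e.g. "+++ b/src/app.py"
            path = line[4:].strip()
            if path != "/dev/null":
                touched.add(path[2:] if path.startswith("b/") else path)
        elif line.startswith("+"):
            added.append(line[1:])

    problems = []
    out_of_scope = sorted(touched - target_files)
    if out_of_scope:
        problems.append(f"out of scope: {out_of_scope}")
    if len(touched) > MAX_FILES:
        problems.append(f"too many files: {len(touched)}")
    if len(added) > MAX_ADDED_LINES:
        problems.append(f"patch too large: {len(added)} added lines")
    if any(SECRET_PATTERN.search(text) for text in added):
        problems.append("possible secret in added lines")
    return problems


sample = "\n".join(["--- a/app.py", "+++ b/app.py", "@@ -1 +1,2 @@", " x = 1", "+y = 2"])
problems = review_diff(sample, {"app.py"})
print("auto_promote" if not problems else "wait_for_owner")
```

Returning the list of reasons rather than a bare boolean gives the owner dashboard something concrete to show when a task is held back.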
5. The doctor / manager / watchdog layer
Between tasks, the pipeline does not sit idle. A set of background processes runs continuously to keep the system healthy:
- taskDoctor: scans all task files for stuck or inconsistent states. A task that has been iteration_failed for too long gets reset and re-queued. A task that was interrupted mid-apply gets its diff re-evaluated.
- manager role: invoked for tasks that fail repeatedly. Rather than resetting, the manager role analyses the pattern of failures and decides whether to rewrite the task, escalate to the owner, or mark it as blocked with a diagnosis.
- watchdog: monitors the pipeline process itself. If the main loop crashes, the watchdog restarts it. If it fails to restart, the owner receives a notification. The system never silently stops.
These three processes form the autonomy layer below the main pipeline. They are not features of individual tasks — they are infrastructure that runs regardless of what task is currently in flight.
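To make the taskDoctor's reset rule concrete, here is a minimal sketch of a single scan. The one-hour threshold, the `updated_at` field, and the choice of `ready_for_execution` as the re-queued status are all assumptions; the text only says that a task failed "for too long" is reset and re-queued.

```python
from datetime import datetime, timedelta, timezone

STUCK_AFTER = timedelta(hours=1)  # assumed value; the text only says "too long"


def doctor_pass(tasks: list[dict], now: datetime) -> list[dict]:
    """Return the task list with long-failed tasks reset and re-queued."""
    result = []
    for task in tasks:
        stuck = (
            task["status"] == "iteration_failed"
            and now - task["updated_at"] > STUCK_AFTER
        )
        if stuck:
            # Assumed meaning of "reset and re-queued": back to ready_for_execution
            task = {**task, "status": "ready_for_execution", "updated_at": now}
        result.append(task)
    return result


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
tasks = [
    {"id": "a", "status": "iteration_failed", "updated_at": now - timedelta(hours=3)},
    {"id": "b", "status": "iteration_failed", "updated_at": now - timedelta(minutes=5)},
    {"id": "c", "status": "completed", "updated_at": now - timedelta(days=2)},
]
print([t["status"] for t in doctor_pass(tasks, now)])
```

Passing `now` in explicitly keeps the scan deterministic, which makes it easy to run the same check from a scheduler or from a test.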
6. The self-improvement loop
The most important property of the pipeline is not that it executes tasks — it is that it generates its own tasks to fix its own shortcomings.
This works in two ways. First, rejected tasks are not discarded: the rejection reason is stored as a learning event and injected into future iterations of the same task type. The pipeline reads its own failure history before it tries again.
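A sketch of that injection step, assuming learning events are stored as records with hypothetical `task_type` and `reason` fields and that `build_prompt` is the place where prior context is assembled:

```python
def build_prompt(task: dict, learning_events: list[dict]) -> str:
    """Prepend past rejection reasons for this task type to the task description."""
    lessons = [
        event["reason"]
        for event in learning_events
        if event["task_type"] == task["task_type"]
    ]
    parts = [task["description"]]
    if lessons:
        parts.append("Previous rejections for this task type:")
        parts.extend(f"- {reason}" for reason in lessons)
    return "\n".join(parts)


events = [
    {"task_type": "docs", "reason": "Changed files outside the docs folder"},
    {"task_type": "refactor", "reason": "Patch too large to review"},
]
print(build_prompt({"task_type": "docs", "description": "Update the README"}, events))
```

Filtering by task type keeps the injected history short and relevant, so unrelated failures do not crowd out the task itself.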
Second, a scorecard process runs periodically. It reads the state of all ventures, counts completed versus blocked tasks, measures code coverage, checks whether documentation is current — and generates new tasks for whatever gaps it finds. If the pipeline has been failing to ship features in a particular area, a new task appears to investigate why. If the task JSON schema has drifted from the validator, a task appears to reconcile them.
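The scorecard idea can be sketched for one of the signals mentioned, completed versus blocked tasks per venture. The `scorecard_tasks` helper, the 50% threshold, and the shape of the generated task are assumptions; coverage and documentation checks are omitted.

```python
from collections import Counter


def scorecard_tasks(tasks: list[dict], max_blocked_ratio: float = 0.5) -> list[dict]:
    """Generate a draft investigation task for each venture with too many blocked tasks."""
    completed = Counter(t["venture"] for t in tasks if t["status"] == "completed")
    blocked = Counter(t["venture"] for t in tasks if t["status"] == "blocked")
    new_tasks = []
    for venture in sorted(set(completed) | set(blocked)):
        # total is never zero: the venture has at least one counted task
        total = completed[venture] + blocked[venture]
        if blocked[venture] / total > max_blocked_ratio:
            new_tasks.append({
                "title": f"Investigate blocked tasks in {venture}",
                "venture": venture,
                "status": "draft",
            })
    return new_tasks


history = [
    {"venture": "alpha", "status": "completed"},
    {"venture": "alpha", "status": "blocked"},
    {"venture": "alpha", "status": "blocked"},
    {"venture": "beta", "status": "completed"},
    {"venture": "beta", "status": "completed"},
]
print(scorecard_tasks(history))
```

The generated tasks start in `draft`, so they would pass through the same lifecycle and review gate as any task written by hand.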
The result: the pipeline does not require a human to notice when something is wrong. It notices on its own, writes the fix as a task, executes it through the same review flow, and closes the gap, all without interrupting the owner.
This post was written, reviewed, and committed by the ARAI pipeline. The task that produced it went through two iteration rounds, passed autonomous review, and was applied via the same diff-apply flow described above.