<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://khalid-taha.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://khalid-taha.github.io/" rel="alternate" type="text/html" /><updated>2025-12-27T10:56:04+00:00</updated><id>https://khalid-taha.github.io/feed.xml</id><title type="html">AI Chronicles</title><subtitle>AI Chronicles is my GitHub blog sharing my journey and experiences in the world of AI.</subtitle><author><name>Khalid Taha</name></author><entry><title type="html">AI-SDLC v1.0 Spec</title><link href="https://khalid-taha.github.io/2025/12/27/AI-SDLC-Spec-v1.html" rel="alternate" type="text/html" title="AI-SDLC v1.0 Spec" /><published>2025-12-27T00:00:00+00:00</published><updated>2025-12-27T00:00:00+00:00</updated><id>https://khalid-taha.github.io/2025/12/27/AI-SDLC-Spec-v1</id><content type="html" xml:base="https://khalid-taha.github.io/2025/12/27/AI-SDLC-Spec-v1.html"><![CDATA[<p><strong>Full Specification (Single Document)</strong></p>

<hr />

<h2 id="1-scope-and-intent">1. Scope and intent</h2>

<p>This document is the <strong>normative specification</strong> for <strong>AI-SDLC v1.0</strong>.</p>

<p>It defines:</p>

<ul>
  <li>what AI-SDLC is,</li>
  <li>what it replaces,</li>
  <li>what artefacts are required,</li>
  <li>what stages exist,</li>
  <li>what rules must be enforced,</li>
  <li>what constitutes progress and completion.</li>
</ul>

<p>This is <strong>not</strong> a blog post, philosophy paper, or marketing document.
It is written to be used as an <strong>operational SDLC spec</strong>.</p>

<hr />

<h2 id="2-definitions">2. Definitions</h2>

<p><strong>AI executor</strong>
A non-human system that generates, modifies, or removes implementation artefacts.</p>

<p><strong>Human owner</strong>
A named individual accountable for intent, constraints, decisions, and outcomes.</p>

<p><strong>Intent</strong>
A concise statement of purpose, outcome, constraints, assumptions, and stop conditions.</p>

<p><strong>Specification</strong>
A machine-readable description of expected behaviour, interfaces, data, non-functional requirements, and observability.</p>

<p><strong>Evidence</strong>
Observed runtime behaviour produced by a deployed system.</p>

<p><strong>Decision</strong>
A recorded human judgement based on evidence that determines the next action.</p>

<hr />

<h2 id="3-core-premise">3. Core premise</h2>

<p>AI-SDLC v1.0 is based on the following premises:</p>

<ol>
  <li>Implementation cost is low.</li>
  <li>Change is continuous and expected.</li>
  <li>Poor decisions dominate failure modes.</li>
  <li>Execution can be automated.</li>
  <li>Accountability cannot be automated.</li>
</ol>

<p>Any process that optimises primarily for human execution speed is out of scope.</p>

<hr />

<h2 id="4-what-ai-sdlc-replaces">4. What AI-SDLC replaces</h2>

<p>AI-SDLC v1.0 explicitly replaces SDLC constructs whose primary purpose is to manage human execution and coordination:</p>

<ul>
  <li>task backlogs as the unit of progress,</li>
  <li>sprint cycles as a reporting mechanism,</li>
  <li>role hand-offs between product, design, and engineering,</li>
  <li>status reporting based on activity,</li>
  <li>delivery milestones detached from runtime behaviour.</li>
</ul>

<p>These constructs may exist locally but <strong>MUST NOT</strong> define progress.</p>

<hr />

<h2 id="5-what-ai-sdlc-retains">5. What AI-SDLC retains</h2>

<p>AI-SDLC v1.0 retains and enforces:</p>

<ul>
  <li>explicit intent and constraints,</li>
  <li>architectural decisions where consequences exist,</li>
  <li>automated quality verification,</li>
  <li>security, privacy, and compliance controls,</li>
  <li>observability and rollback,</li>
  <li>auditability of decisions.</li>
</ul>

<hr />

<h2 id="6-design-principles-normative">6. Design principles (normative)</h2>

<p>Implementations of AI-SDLC <strong>MUST</strong> follow these principles:</p>

<ol>
  <li>
    <p><strong>Intent precedes execution</strong>
No implementation begins without explicit intent.</p>
  </li>
  <li>
    <p><strong>Constraints are enforceable</strong>
Constraints are expressed in a form that automation can block.</p>
  </li>
  <li>
    <p><strong>Implementation is replaceable</strong>
Code is an output, not a long-term asset.</p>
  </li>
  <li>
    <p><strong>Quality gates are automatic</strong>
Human approval cannot bypass enforcement.</p>
  </li>
  <li>
    <p><strong>Evidence drives decisions</strong>
Decisions are based on observed behaviour, not plans.</p>
  </li>
  <li>
    <p><strong>Humans remain accountable</strong>
Every decision has a named owner.</p>
  </li>
</ol>

<hr />

<h2 id="7-required-artefacts">7. Required artefacts</h2>

<p>An AI-SDLC v1.0 system <strong>MUST</strong> maintain the following artefacts.</p>

<h3 id="71-intent-record">7.1 Intent record</h3>

<p><strong>Required fields:</strong></p>

<ul>
  <li>Problem statement</li>
  <li>Affected user or system</li>
  <li>Desired outcome</li>
  <li>Non-negotiable constraints</li>
  <li>Explicit assumptions</li>
  <li>Stop or change conditions</li>
  <li>Named intent owner</li>
</ul>

<p>The intent record <strong>MUST</strong> be concise and versioned.</p>
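<p>For illustration, the sketch below renders a hypothetical intent record with these fields filled in. The specification mandates the fields, not this YAML form; Appendix A stores intent as <code>intent/intent.md</code>, and every value here is invented.</p>

<pre><code class="language-yaml"># Illustrative only: a hypothetical intent record rendered as YAML.
# AI-SDLC v1.0 mandates the fields, not this serialisation.
intent:
  problem: "Password reset emails take more than 10 minutes to arrive"
  affected: "All end users of the customer portal"
  desired_outcome: "95% of reset emails delivered within 60 seconds"
  constraints:
    - "No change to the existing identity provider"
    - "Monthly infrastructure cost increase under 200 USD"
  assumptions:
    - "Delay is caused by the batch email queue, not the provider"
  stop_conditions:
    - "Evidence shows the delay originates outside our systems"
  owner: "jane.doe"   # named intent owner (hypothetical)
  version: 1
</code></pre>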

<hr />

<h3 id="72-formal-specification">7.2 Formal specification</h3>

<p>The specification <strong>MUST</strong> describe:</p>

<ul>
  <li>Behaviour and interfaces</li>
  <li>Inputs and outputs</li>
  <li>Data and state</li>
  <li>Error conditions</li>
  <li>Non-functional requirements</li>
  <li>Acceptance scenarios</li>
  <li>Required observability</li>
</ul>

<p>The specification <strong>MUST</strong> be machine-readable.</p>

<hr />

<h3 id="73-decision-log">7.3 Decision log</h3>

<p>Each decision <strong>MUST</strong> record:</p>

<ul>
  <li>Decision date</li>
  <li>Decision owner</li>
  <li>Evidence reviewed</li>
  <li>Decision taken</li>
  <li>Rationale</li>
  <li>Resulting action</li>
</ul>
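<p>A hypothetical decision record with these fields is sketched below. YAML is used here only to keep the fields explicit; Appendix A records decisions as markdown files under <code>decisions/</code>.</p>

<pre><code class="language-yaml"># Hypothetical decision record; the spec fixes the fields, not the format.
decision:
  date: 2025-12-20
  owner: "jane.doe"
  evidence_reviewed:
    - "observability dashboard: p95 email delivery latency, 7-day window"
    - "error-rate metric for the reset-email job"
  decision: "adjust"          # continue | adjust | stop | pivot
  rationale: "Latency improved but still misses the 60-second target"
  resulting_action: "Revise spec/nfr.yaml and rerun the loop"
</code></pre>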

<hr />

<h2 id="8-lifecycle-stages">8. Lifecycle stages</h2>

<p>AI-SDLC v1.0 defines a <strong>closed decision loop</strong>.</p>

<h3 id="stage-1-intent-definition">Stage 1: Intent definition</h3>

<p>A human owner defines intent.</p>

<p><strong>Exit condition:</strong>
Intent record exists and is approved by the intent owner.</p>

<hr />

<h3 id="stage-2-specification">Stage 2: Specification</h3>

<p>Intent is translated into a formal specification.</p>

<p><strong>Exit condition:</strong>
Specification is complete, consistent, and machine-readable.</p>

<hr />

<h3 id="stage-3-automated-implementation">Stage 3: Automated implementation</h3>

<p>AI generates:</p>

<ul>
  <li>application code,</li>
  <li>tests,</li>
  <li>infrastructure,</li>
  <li>instrumentation,</li>
  <li>supporting artefacts.</li>
</ul>

<p>Human review is permitted but not required for generation.</p>

<p><strong>Exit condition:</strong>
All required artefacts exist.</p>

<hr />

<h3 id="stage-4-automated-enforcement">Stage 4: Automated enforcement</h3>

<p>The system enforces:</p>

<ul>
  <li>correctness,</li>
  <li>security,</li>
  <li>performance,</li>
  <li>cost limits,</li>
  <li>deployment safety.</li>
</ul>

<p>Failures <strong>MUST</strong> block progression automatically.</p>

<p><strong>Exit condition:</strong>
All enforcement checks pass.</p>

<hr />

<h3 id="stage-5-deployment">Stage 5: Deployment</h3>

<p>Deployment <strong>MUST</strong> be:</p>

<ul>
  <li>incremental,</li>
  <li>reversible,</li>
  <li>observable.</li>
</ul>

<p>Deployment is not completion.</p>

<p><strong>Exit condition:</strong>
System is running and observable.</p>

<hr />

<h3 id="stage-6-observation">Stage 6: Observation</h3>

<p>The system produces evidence including:</p>

<ul>
  <li>behaviour,</li>
  <li>reliability,</li>
  <li>performance,</li>
  <li>cost,</li>
  <li>failure modes.</li>
</ul>

<p>Evidence <strong>MUST</strong> be collected continuously.</p>

<hr />

<h3 id="stage-7-decision">Stage 7: Decision</h3>

<p>A named decision owner selects one action:</p>

<ul>
  <li>continue,</li>
  <li>adjust,</li>
  <li>stop,</li>
  <li>pivot.</li>
</ul>

<p>The decision <strong>MUST</strong> be recorded.</p>

<p><strong>Exit condition:</strong>
Decision log entry exists.</p>

<p>The loop then repeats.</p>

<hr />

<h2 id="9-unit-of-progress">9. Unit of progress</h2>

<p>The <strong>only recognised unit of progress</strong> in AI-SDLC v1.0 is a <strong>decision informed by evidence</strong>.</p>

<p>Task completion, feature delivery, or code volume <strong>MUST NOT</strong> be treated as progress.</p>

<hr />

<h2 id="10-roles-and-accountability">10. Roles and accountability</h2>

<p>AI-SDLC v1.0 requires the following functions:</p>

<ul>
  <li><strong>Intent owner</strong></li>
  <li><strong>Decision owner</strong></li>
  <li><strong>System steward</strong></li>
  <li><strong>AI executor</strong></li>
</ul>

<p>One person may perform multiple functions.
Every function <strong>MUST</strong> have a named human owner.</p>

<hr />

<h2 id="11-metrics-and-measurement">11. Metrics and measurement</h2>

<p>AI-SDLC v1.0 <strong>SHOULD</strong> measure:</p>

<ul>
  <li>time from intent to evidence,</li>
  <li>number of assumptions tested,</li>
  <li>decision latency,</li>
  <li>recovery time from failure,</li>
  <li>cost of change.</li>
</ul>

<p>It <strong>MUST NOT</strong> optimise for activity metrics.</p>

<hr />

<h2 id="12-failure-modes-non-compliance">12. Failure modes (non-compliance)</h2>

<p>An implementation is <strong>non-compliant</strong> if:</p>

<ul>
  <li>code is built without intent,</li>
  <li>constraints are documented but not enforced,</li>
  <li>deployment occurs without observability,</li>
  <li>decisions lack evidence,</li>
  <li>accountability is unclear.</li>
</ul>

<hr />

<h2 id="13-versioning-and-evolution">13. Versioning and evolution</h2>

<p>AI-SDLC versions <strong>MUST</strong> be:</p>

<ul>
  <li>explicitly numbered,</li>
  <li>explicit about backwards compatibility,</li>
  <li>revised only with documented rationale.</li>
</ul>

<p>AI-SDLC v1.0 is intentionally minimal.</p>

<hr />

<h2 id="14-summary-definition">14. Summary definition</h2>

<p><strong>AI-SDLC v1.0 is a decision-driven software development lifecycle in which humans define intent and constraints, AI performs implementation, automated systems enforce quality and safety, and runtime evidence determines what happens next.</strong></p>

<hr />

<h2 id="15-entry-and-exit-criteria">15. Entry and Exit Criteria</h2>

<p>This section defines the <strong>mandatory conditions</strong> under which work may enter the AI-SDLC loop, progress between stages, and exit a loop iteration.</p>

<p>These criteria are <strong>normative</strong> and enforceable.</p>

<h3 id="151-entry-criteria-start-of-work">15.1 Entry criteria (start of work)</h3>

<p>A change, feature, experiment, or system <strong>MUST NOT</strong> enter the AI-SDLC lifecycle unless <strong>all</strong> of the following conditions are met:</p>

<ol>
  <li>An <strong>intent record exists</strong> in <code>intent/intent.md</code>.</li>
  <li>
    <p>The intent record includes:</p>

    <ul>
      <li>a clear problem statement,</li>
      <li>a desired outcome,</li>
      <li>non-negotiable constraints,</li>
      <li>explicit assumptions,</li>
      <li>stop or change conditions,</li>
      <li>a named intent owner.</li>
    </ul>
  </li>
  <li>The intent owner has explicitly approved the intent record.</li>
</ol>

<p>If any condition is not met, <strong>no implementation work is permitted</strong>, including AI-generated work.</p>

<h3 id="152-entry-criteria-for-specification">15.2 Entry criteria for specification</h3>

<p>A system <strong>MUST NOT</strong> move from intent definition to specification unless:</p>

<ol>
  <li>The intent record is complete and internally consistent.</li>
  <li>All assumptions are explicitly documented in <code>intent/assumptions.md</code>.</li>
  <li>Constraints are stated in a form that can be enforced by automation.</li>
</ol>

<p>If constraints cannot be enforced, the intent <strong>MUST</strong> be revised before proceeding.</p>

<h3 id="153-entry-criteria-for-automated-implementation">15.3 Entry criteria for automated implementation</h3>

<p>Automated implementation <strong>MUST NOT</strong> begin unless:</p>

<ol>
  <li>A formal specification exists in <code>spec/</code>.</li>
  <li>All required spec files are present, even if minimal.</li>
  <li>Acceptance scenarios are defined.</li>
  <li>Required observability is specified.</li>
</ol>

<p>Speculative or exploratory implementation without a specification is <strong>non-compliant</strong>.</p>

<h3 id="154-entry-criteria-for-deployment">15.4 Entry criteria for deployment</h3>

<p>A system <strong>MUST NOT</strong> be deployed unless:</p>

<ol>
  <li>All automated enforcement gates pass.</li>
  <li>Observability is implemented as specified.</li>
  <li>Rollback mechanisms are available and tested.</li>
  <li>A decision owner is assigned for post-deployment review.</li>
</ol>

<p>Deployment without observability is <strong>explicitly forbidden</strong>.</p>

<h3 id="155-exit-criteria-completion-of-a-loop-iteration">15.5 Exit criteria (completion of a loop iteration)</h3>

<p>A single AI-SDLC loop iteration is considered <strong>complete</strong> only when:</p>

<ol>
  <li>The system has been deployed and observed, <strong>or</strong></li>
  <li>A conscious decision has been made not to deploy, <strong>and</strong></li>
  <li>
    <p>A decision record exists in <code>decisions/</code> documenting:</p>

    <ul>
      <li>the evidence reviewed,</li>
      <li>the decision taken,</li>
      <li>the rationale.</li>
    </ul>
  </li>
</ol>

<p>Without a recorded decision, <strong>no progress has occurred</strong>, regardless of implementation activity.</p>

<h3 id="156-stop-conditions-mandatory">15.6 Stop conditions (mandatory)</h3>

<p>Every intent record <strong>MUST</strong> define stop or change conditions.</p>

<p>Work <strong>MUST STOP immediately</strong> when any stop condition is met.</p>

<p>Examples include, but are not limited to:</p>

<ul>
  <li>evidence contradicts a core assumption,</li>
  <li>constraints are violated beyond acceptable limits,</li>
  <li>cost exceeds defined budgets,</li>
  <li>risk exceeds acceptable thresholds.</li>
</ul>

<p>Stopping is treated as a <strong>successful outcome</strong> when driven by evidence.</p>

<h3 id="157-prohibited-states">15.7 Prohibited states</h3>

<p>The following states are <strong>explicitly prohibited</strong> under AI-SDLC v1.0:</p>

<ul>
  <li>implementation without intent,</li>
  <li>deployment without observability,</li>
  <li>iteration without decisions,</li>
  <li>continued work after stop conditions are met,</li>
  <li>unowned systems with no decision owner.</li>
</ul>

<p>Any occurrence places the system in <strong>non-compliant status</strong>.</p>

<h3 id="158-summary">15.8 Summary</h3>

<p>AI-SDLC v1.0 enforces disciplined flow by defining when work may begin, proceed, and stop.</p>

<p>Speed is permitted.
Speculation is not.
Progress requires decisions.</p>

<hr />

<h2 id="16-compliance-levels-and-exceptions">16. Compliance Levels and Exceptions</h2>

<p>This section defines how compliance with AI-SDLC v1.0 is assessed, how exceptions are handled, and how non-compliance is treated.</p>

<p>Compliance is <strong>explicit</strong>, <strong>auditable</strong>, and <strong>decision-owned</strong>.</p>

<h3 id="161-compliance-levels">16.1 Compliance levels</h3>

<p>Every system operating under AI-SDLC v1.0 <strong>MUST</strong> be in exactly one of the following compliance states at any time.</p>

<h4 id="1611-fully-compliant">16.1.1 Fully compliant</h4>

<p>A system is <strong>fully compliant</strong> when:</p>

<ul>
  <li>all mandatory sections of the AI-SDLC v1.0 specification are satisfied,</li>
  <li>all required artefacts exist and are up to date,</li>
  <li>all enforcement gates are active and passing,</li>
  <li>all decisions are recorded and owned.</li>
</ul>

<p>Fully compliant systems may proceed through the lifecycle without restriction.</p>

<h4 id="1612-conditionally-compliant">16.1.2 Conditionally compliant</h4>

<p>A system is <strong>conditionally compliant</strong> when:</p>

<ul>
  <li>one or more AI-SDLC requirements are intentionally unmet,</li>
  <li>the deviation is explicitly documented,</li>
  <li>a decision owner has approved the deviation,</li>
  <li>the deviation is time-bounded or scope-bounded.</li>
</ul>

<p>Conditional compliance <strong>MUST</strong> be recorded as a decision in <code>decisions/</code>.</p>

<p>Conditional compliance is an exception, not a default state.</p>

<h4 id="1613-non-compliant">16.1.3 Non-compliant</h4>

<p>A system is <strong>non-compliant</strong> when:</p>

<ul>
  <li>mandatory requirements are unmet without an approved exception,</li>
  <li>enforcement gates are bypassed,</li>
  <li>decisions lack evidence or ownership,</li>
  <li>prohibited states defined in Section 15.7 occur.</li>
</ul>

<p>Non-compliant systems <strong>MUST NOT</strong> be deployed or progressed.</p>

<h3 id="162-exception-handling">16.2 Exception handling</h3>

<p>Exceptions are permitted only under controlled conditions.</p>

<p>An exception <strong>MUST</strong>:</p>

<ol>
  <li>be explicitly requested,</li>
  <li>be reviewed by a named decision owner,</li>
  <li>be approved or rejected explicitly,</li>
  <li>be recorded in the decision log,</li>
  <li>define scope, duration, and review conditions.</li>
</ol>

<p>Silent or implicit exceptions are <strong>forbidden</strong>.</p>

<h3 id="163-exception-record-requirements">16.3 Exception record requirements</h3>

<p>An exception decision record <strong>MUST</strong> include:</p>

<ul>
  <li>the requirement being excepted,</li>
  <li>the reason for the exception,</li>
  <li>the risk introduced,</li>
  <li>mitigation measures,</li>
  <li>the duration or condition for expiry,</li>
  <li>the decision owner.</li>
</ul>

<p>Exceptions without an expiry condition are <strong>non-compliant</strong>.</p>
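<p>The sketch below shows what such an exception record might contain. All values are hypothetical and the YAML form is illustrative, not mandated.</p>

<pre><code class="language-yaml"># Hypothetical exception record, captured as a decision in decisions/.
exception:
  requirement: "Rollback mechanism tested before deployment (15.4)"
  reason: "Staging environment unavailable during provider migration"
  risk_introduced: "Slower recovery if the first production release fails"
  mitigation: "Deploy behind a feature flag limited to internal users"
  expires: 2026-01-15         # exceptions without expiry are non-compliant
  decision_owner: "jane.doe"
</code></pre>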

<h3 id="164-expiry-and-review-of-exceptions">16.4 Expiry and review of exceptions</h3>

<p>All exceptions <strong>MUST</strong> be reviewed:</p>

<ul>
  <li>at the next decision point, or</li>
  <li>when defined expiry conditions are met, whichever occurs first.</li>
</ul>

<p>Expired exceptions <strong>MUST</strong> either:</p>

<ul>
  <li>be removed, restoring full compliance, or</li>
  <li>be explicitly renewed via a new decision.</li>
</ul>

<p>Automatic or indefinite rollover of exceptions is <strong>not permitted</strong>.</p>

<h3 id="165-enforcement-of-compliance">16.5 Enforcement of compliance</h3>

<p>Compliance status <strong>MUST</strong> be visible and machine-checkable.</p>

<p>Automation <strong>SHOULD</strong>:</p>

<ul>
  <li>block deployment of non-compliant systems,</li>
  <li>warn on conditional compliance nearing expiry,</li>
  <li>surface compliance status alongside runtime evidence.</li>
</ul>

<p>Human approval <strong>MUST NOT</strong> override automated blocking of non-compliant states.</p>

<h3 id="166-summary">16.6 Summary</h3>

<p>AI-SDLC v1.0 treats compliance as a first-class system property.</p>

<p>Rules may be bent deliberately.
They may not be bent silently.
Accountability is explicit.</p>

<hr />

<h2 id="17-risk-classification">17. Risk Classification</h2>

<p>This section defines how systems are classified by risk under AI-SDLC v1.0 and how risk affects enforcement, review, and decision-making.</p>

<p>Risk classification is <strong>mandatory</strong>.
All systems <strong>MUST</strong> be assigned a risk level before automated implementation begins.</p>

<h3 id="171-purpose-of-risk-classification">17.1 Purpose of risk classification</h3>

<p>Risk classification exists to ensure that:</p>

<ul>
  <li>higher-risk systems receive stronger controls,</li>
  <li>low-risk systems are not burdened by unnecessary process,</li>
  <li>enforcement scales with potential impact.</li>
</ul>

<p>Risk is assessed based on <strong>impact</strong>, not effort.</p>

<h3 id="172-risk-levels">17.2 Risk levels</h3>

<p>AI-SDLC v1.0 defines three risk levels.</p>

<p>Each system <strong>MUST</strong> be classified into exactly one level.</p>

<h4 id="1721-low-risk">17.2.1 Low risk</h4>

<p>A system is <strong>low risk</strong> if failure would result in:</p>

<ul>
  <li>no material user harm,</li>
  <li>no regulatory or legal impact,</li>
  <li>no financial loss beyond defined tolerance,</li>
  <li>no exposure of sensitive data.</li>
</ul>

<p>Examples include:</p>

<ul>
  <li>internal tools,</li>
  <li>prototypes,</li>
  <li>experiments with limited scope.</li>
</ul>

<p>Low-risk systems may operate with minimal enforcement, provided all core AI-SDLC rules are satisfied.</p>

<h4 id="1722-medium-risk">17.2.2 Medium risk</h4>

<p>A system is <strong>medium risk</strong> if failure could result in:</p>

<ul>
  <li>user-facing disruption,</li>
  <li>moderate financial impact,</li>
  <li>operational instability,</li>
  <li>handling of non-sensitive customer data.</li>
</ul>

<p>Examples include:</p>

<ul>
  <li>customer-facing applications,</li>
  <li>internal systems supporting revenue,</li>
  <li>systems with availability or performance commitments.</li>
</ul>

<p>Medium-risk systems require full enforcement of AI-SDLC artefacts and gates.</p>

<h4 id="1723-high-risk">17.2.3 High risk</h4>

<p>A system is <strong>high risk</strong> if failure could result in:</p>

<ul>
  <li>significant financial loss,</li>
  <li>regulatory or legal exposure,</li>
  <li>safety or security incidents,</li>
  <li>handling of sensitive or regulated data.</li>
</ul>

<p>Examples include:</p>

<ul>
  <li>financial systems,</li>
  <li>regulated platforms,</li>
  <li>security-critical infrastructure.</li>
</ul>

<p>High-risk systems require enhanced controls and stricter decision review.</p>

<h3 id="173-risk-declaration-requirements">17.3 Risk declaration requirements</h3>

<p>Risk classification <strong>MUST</strong> be declared in the intent record.</p>

<p>The declaration <strong>MUST</strong> include:</p>

<ul>
  <li>assigned risk level,</li>
  <li>justification for the classification,</li>
  <li>named decision owner responsible for the classification.</li>
</ul>

<p>Risk classification <strong>MUST</strong> be reviewed whenever system scope or impact changes.</p>
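<p>A hypothetical risk declaration, as it might appear inside an intent record, is sketched below. The field names are illustrative; only the three declared elements above are required.</p>

<pre><code class="language-yaml"># Hypothetical risk declaration carried in the intent record.
risk:
  level: medium               # low | medium | high
  justification: "Customer-facing, no sensitive data, revenue-supporting"
  decision_owner: "jane.doe"
  last_reviewed: 2025-12-20
</code></pre>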

<h3 id="174-impact-on-enforcement">17.4 Impact on enforcement</h3>

<p>Risk level directly affects enforcement.</p>

<p>At minimum:</p>

<ul>
  <li>
    <p><strong>Low risk</strong></p>

    <ul>
      <li>Standard AI-SDLC lifecycle</li>
      <li>Minimal review overhead</li>
    </ul>
  </li>
  <li>
    <p><strong>Medium risk</strong></p>

    <ul>
      <li>Full enforcement of all AI-SDLC requirements</li>
      <li>Mandatory observability and rollback</li>
    </ul>
  </li>
  <li>
    <p><strong>High risk</strong></p>

    <ul>
      <li>
        <p>Full enforcement plus:</p>

        <ul>
          <li>stricter acceptance criteria,</li>
          <li>enhanced observability,</li>
          <li>explicit decision review before deployment,</li>
          <li>tighter exception controls.</li>
        </ul>
      </li>
    </ul>
  </li>
</ul>

<p>Risk level <strong>MUST NOT</strong> be used to bypass core AI-SDLC principles.</p>

<h3 id="175-misclassification">17.5 Misclassification</h3>

<p>Intentional or negligent misclassification of risk is treated as <strong>non-compliance</strong>.</p>

<p>If observed impact exceeds declared risk level:</p>

<ul>
  <li>work <strong>MUST</strong> pause,</li>
  <li>risk classification <strong>MUST</strong> be reassessed,</li>
  <li>a decision record <strong>MUST</strong> document corrective action.</li>
</ul>

<h3 id="176-summary">17.6 Summary</h3>

<p>AI-SDLC v1.0 scales discipline with impact.</p>

<p>Low-risk systems move fast.
High-risk systems move carefully.
All systems remain accountable.</p>

<hr />

<h2 id="18-non-goals">18. Non-Goals</h2>

<p>This section defines what <strong>AI-SDLC v1.0 explicitly does not attempt to define or solve</strong>.</p>

<p>These non-goals are intentional.
They protect the specification from scope creep, misinterpretation, and misuse.</p>

<h3 id="181-team-structure-and-roles">18.1 Team structure and roles</h3>

<p>AI-SDLC v1.0 does <strong>not</strong> define:</p>

<ul>
  <li>team sizes or compositions,</li>
  <li>job titles or reporting lines,</li>
  <li>organisational design or management structure.</li>
</ul>

<p>The SDLC defines <strong>accountability and function</strong>, not organisational charts.</p>

<h3 id="182-tooling-and-vendor-selection">18.2 Tooling and vendor selection</h3>

<p>AI-SDLC v1.0 does <strong>not</strong> mandate:</p>

<ul>
  <li>specific AI models or providers,</li>
  <li>programming languages or frameworks,</li>
  <li>CI/CD platforms,</li>
  <li>observability tools,</li>
  <li>infrastructure vendors.</li>
</ul>

<p>Tool choice is an implementation concern and is intentionally left open.</p>

<h3 id="183-business-governance-and-approval-processes">18.3 Business governance and approval processes</h3>

<p>AI-SDLC v1.0 does <strong>not</strong> replace:</p>

<ul>
  <li>business case approval,</li>
  <li>budget approval,</li>
  <li>legal or regulatory sign-off,</li>
  <li>executive governance processes.</li>
</ul>

<p>It assumes these exist externally and integrates with them via intent, constraints, and decisions.</p>

<h3 id="184-human-performance-management">18.4 Human performance management</h3>

<p>AI-SDLC v1.0 does <strong>not</strong>:</p>

<ul>
  <li>measure individual productivity,</li>
  <li>evaluate human performance,</li>
  <li>define incentives or compensation,</li>
  <li>optimise for utilisation metrics.</li>
</ul>

<p>The SDLC governs systems and decisions, not people management.</p>

<h3 id="185-velocity-optimisation">18.5 Velocity optimisation</h3>

<p>AI-SDLC v1.0 does <strong>not</strong> optimise for:</p>

<ul>
  <li>speed of delivery as an end in itself,</li>
  <li>volume of output,</li>
  <li>number of features shipped.</li>
</ul>

<p>Speed is a byproduct of clarity and automation, not a primary goal.</p>

<h3 id="186-ai-autonomy-beyond-execution">18.6 AI autonomy beyond execution</h3>

<p>AI-SDLC v1.0 does <strong>not</strong> grant AI authority to:</p>

<ul>
  <li>define intent,</li>
  <li>change constraints,</li>
  <li>approve exceptions,</li>
  <li>make final decisions,</li>
  <li>assume accountability.</li>
</ul>

<p>AI executes. Humans decide.</p>

<h3 id="187-universal-applicability">18.7 Universal applicability</h3>

<p>AI-SDLC v1.0 does <strong>not</strong> claim to be suitable for:</p>

<ul>
  <li>every organisation,</li>
  <li>every regulatory environment,</li>
  <li>every system type.</li>
</ul>

<p>Adoption requires judgement and may require adaptation beyond v1.0.</p>

<h3 id="188-summary">18.8 Summary</h3>

<p>AI-SDLC v1.0 is deliberately narrow.</p>

<p>It defines <strong>how systems are built and evolved</strong> when AI performs execution and humans retain responsibility.</p>

<p>Anything outside that boundary is explicitly out of scope.</p>

<hr />

<h1 id="appendix-a-file-and-folder-standard">Appendix A: File and Folder Standard</h1>

<h2 id="a1-purpose">A.1 Purpose</h2>

<p>This appendix defines the <strong>mandatory file and folder structure</strong> for systems operating under <strong>AI-SDLC v1.0</strong>.</p>

<p>The structure is designed to:</p>

<ul>
  <li>make intent, decisions, and constraints first-class,</li>
  <li>separate human judgement from AI execution,</li>
  <li>support automation and enforcement,</li>
  <li>ensure auditability and traceability.</li>
</ul>

<p>This is a <strong>normative standard</strong>, not a suggestion.</p>

<hr />

<h2 id="a2-root-structure-canonical">A.2 Root structure (canonical)</h2>

<p>Every AI-SDLC–compliant repository <strong>MUST</strong> follow this structure:</p>

<pre><code class="language-text">/
├─ intent/
├─ spec/
├─ decisions/
├─ src/
├─ tests/
├─ infra/
├─ observability/
├─ compliance/
├─ runbooks/
├─ ai/
└─ README.md
</code></pre>

<p>No directory may be omitted unless explicitly stated.</p>

<hr />

<h2 id="a3-intent--human-purpose-and-constraints">A.3 intent/ — human purpose and constraints</h2>

<h3 id="purpose">Purpose</h3>

<p>Holds <strong>human-authored intent records</strong>.</p>

<h3 id="rules">Rules</h3>

<ul>
  <li>Written by humans only.</li>
  <li>Small, explicit, and versioned.</li>
  <li>No implementation detail.</li>
</ul>

<h3 id="required-files">Required files</h3>

<pre><code class="language-text">intent/
├─ intent.md
└─ assumptions.md
</code></pre>

<h4 id="intentintentmd-required">intent/intent.md (required)</h4>

<p>Must include:</p>

<ul>
  <li>Problem statement</li>
  <li>Affected user or system</li>
  <li>Desired outcome</li>
  <li>Non-negotiable constraints</li>
  <li>Stop / change conditions</li>
  <li>Intent owner</li>
</ul>

<h4 id="intentassumptionsmd-required">intent/assumptions.md (required)</h4>

<ul>
  <li>Explicit list of assumptions</li>
  <li>Each assumption must be testable</li>
  <li>Each assumption must link to evidence later</li>
</ul>

<hr />

<h2 id="a4-spec--executable-system-definition">A.4 spec/ — executable system definition</h2>

<h3 id="purpose-1">Purpose</h3>

<p>Holds <strong>machine-readable specifications</strong> used by AI executors.</p>

<h3 id="rules-1">Rules</h3>

<ul>
  <li>Source of truth for behaviour.</li>
  <li>No narrative prose.</li>
  <li>Must be consumable by automation.</li>
</ul>

<h3 id="required-structure">Required structure</h3>

<pre><code class="language-text">spec/
├─ system.yaml
├─ interfaces.yaml
├─ data.yaml
├─ nfr.yaml
├─ acceptance.yaml
└─ observability.yaml
</code></pre>

<p>Each file <strong>MUST</strong> exist, even if minimal.</p>

<hr />

<h2 id="a5-decisions--accountability-and-evidence">A.5 decisions/ — accountability and evidence</h2>

<h3 id="purpose-2">Purpose</h3>

<p>Records <strong>human decisions</strong> made using runtime evidence.</p>

<h3 id="rules-2">Rules</h3>

<ul>
  <li>Append-only.</li>
  <li>One decision per file.</li>
  <li>Decisions are immutable once recorded.</li>
</ul>

<h3 id="structure">Structure</h3>

<pre><code class="language-text">decisions/
├─ 0001-initial-scope.md
├─ 0002-adjust-constraint.md
└─ 0003-pivot.md
</code></pre>

<p>Each decision file <strong>MUST</strong> include:</p>

<ul>
  <li>Date</li>
  <li>Decision owner</li>
  <li>Evidence reviewed (links)</li>
  <li>Decision taken</li>
  <li>Rationale</li>
  <li>Resulting action</li>
</ul>

<hr />

<h2 id="a6-src--ai-generated-implementation">A.6 src/ — AI-generated implementation</h2>

<h3 id="purpose-3">Purpose</h3>

<p>Holds <strong>implementation artefacts</strong>.</p>

<h3 id="rules-3">Rules</h3>

<ul>
  <li>Primarily AI-generated.</li>
  <li>Replaceable at any time.</li>
  <li>No business intent defined here.</li>
</ul>

<h3 id="notes">Notes</h3>

<ul>
  <li>Humans may edit, but code is not authoritative.</li>
  <li>Spec and intent override code.</li>
</ul>

<hr />

<h2 id="a7-tests--automated-verification">A.7 tests/ — automated verification</h2>

<h3 id="purpose-4">Purpose</h3>

<p>Holds tests enforcing correctness and constraints.</p>

<h3 id="rules-4">Rules</h3>

<ul>
  <li>Tests are mandatory.</li>
  <li>Generated by AI, reviewed by humans.</li>
  <li>Blocking in CI.</li>
</ul>

<h3 id="typical-contents">Typical contents</h3>

<pre><code class="language-text">tests/
├─ unit/
├─ integration/
├─ contract/
└─ security/
</code></pre>

<hr />

<h2 id="a8-infra--deployment-and-environment">A.8 infra/ — deployment and environment</h2>

<h3 id="purpose-5">Purpose</h3>

<p>Defines infrastructure and deployment behaviour.</p>

<h3 id="rules-5">Rules</h3>

<ul>
  <li>Must support rollback.</li>
  <li>Must support staged deployment.</li>
  <li>Must expose observability hooks.</li>
</ul>

<h3 id="typical-contents-1">Typical contents</h3>

<pre><code class="language-text">infra/
├─ environments/
├─ pipelines/
└─ policies/
</code></pre>

<hr />

<h2 id="a9-observability--evidence-production">A.9 observability/ — evidence production</h2>

<h3 id="purpose-6">Purpose</h3>

<p>Defines how <strong>runtime evidence</strong> is produced and collected.</p>

<h3 id="rules-6">Rules</h3>

<ul>
  <li>Required before deployment.</li>
  <li>Enforced automatically.</li>
</ul>

<h3 id="typical-contents-2">Typical contents</h3>

<pre><code class="language-text">observability/
├─ metrics.yaml
├─ logs.yaml
├─ alerts.yaml
└─ dashboards.yaml
</code></pre>

<hr />

<h2 id="a10-compliance--enforced-constraints">A.10 compliance/ — enforced constraints</h2>

<h3 id="purpose-7">Purpose</h3>

<p>Captures <strong>formal constraints</strong> that automation must enforce.</p>

<h3 id="rules-7">Rules</h3>

<ul>
  <li>Constraints must be machine-checkable.</li>
  <li>Documentation alone is insufficient.</li>
</ul>

<h3 id="typical-contents-3">Typical contents</h3>

<pre><code class="language-text">compliance/
├─ security.md
├─ privacy.md
├─ cost-limits.yaml
└─ performance-budgets.yaml
</code></pre>
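<p>The specification does not define schemas for these files. As an illustration only, one possible shape for <code>cost-limits.yaml</code> that automation could evaluate against billing data is sketched below; every name and value is an assumption.</p>

<pre><code class="language-yaml"># Hypothetical shape for compliance/cost-limits.yaml; not a normative schema.
cost_limits:
  monthly_budget: 500
  currency: USD
  alert_threshold_percentage: 80   # warn before the hard limit is reached
  enforcement: block_deployment    # action taken when the limit is breached
</code></pre>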

<hr />

<h2 id="a11-runbooks--operational-response">A.11 runbooks/ — operational response</h2>

<h3 id="purpose-8">Purpose</h3>

<p>Defines how humans respond to failures.</p>

<h3 id="rules-8">Rules</h3>

<ul>
  <li>Short and explicit.</li>
  <li>No architecture descriptions.</li>
</ul>

<h3 id="typical-contents-4">Typical contents</h3>

<pre><code class="language-text">runbooks/
├─ rollback.md
├─ incident-response.md
└─ escalation.md
</code></pre>

<hr />

<h2 id="a12-ai--ai-execution-configuration">A.12 ai/ — AI execution configuration</h2>

<h3 id="purpose-9">Purpose</h3>

<p>Defines how AI systems operate within the SDLC.</p>

<h3 id="rules-9">Rules</h3>

<ul>
  <li>Controls AI behaviour.</li>
  <li>Never contains intent.</li>
</ul>

<h3 id="typical-contents-5">Typical contents</h3>

<pre><code class="language-text">ai/
├─ prompts/
├─ policies.yaml
├─ guardrails.yaml
└─ memory.md
</code></pre>

<hr />

<h2 id="a13-readmemd--orientation-only">A.13 README.md — orientation only</h2>

<h3 id="purpose-10">Purpose</h3>

<p>Human-readable overview.</p>

<h3 id="rules-10">Rules</h3>

<ul>
  <li>Points to intent, spec, and decisions.</li>
  <li>No duplication of authoritative content.</li>
</ul>

<hr />

<h2 id="a14-authority-order-critical">A.14 Authority order (critical)</h2>

<p>When conflicts exist, authority is resolved in this order:</p>

<ol>
  <li>intent/</li>
  <li>spec/</li>
  <li>decisions/</li>
  <li>compliance/</li>
  <li>observability/</li>
  <li>infra/</li>
  <li>tests/</li>
  <li>src/</li>
</ol>

<p>Code <strong>never overrides intent or decisions</strong>.</p>

<hr />

<h2 id="a15-compliance-rule">A.15 Compliance rule</h2>

<p>A repository is <strong>AI-SDLC v1.0 compliant</strong> only if:</p>

<ul>
  <li>all required directories exist,</li>
  <li>required files are present,</li>
  <li>decisions are logged,</li>
  <li>constraints are enforceable,</li>
  <li>observability exists before deployment.</li>
</ul>

<hr />

<h2 id="a16-summary">A.16 Summary</h2>

<p>This file and folder standard ensures that:</p>

<ul>
  <li>humans control purpose and judgement,</li>
  <li>AI controls execution,</li>
  <li>automation enforces safety,</li>
  <li>evidence drives change,</li>
  <li>accountability is explicit.</li>
</ul>

<p>It is intentionally strict.</p>

<hr />

<h1 id="appendix-b-yaml-specification-schemas">Appendix B: YAML Specification Schemas</h1>

<p>This appendix defines the normative YAML schemas used by AI-SDLC v1.0.
These schemas are machine-checkable and enforceable.</p>

<hr />

<h2 id="b1-specsystemyaml">B.1 spec/system.yaml</h2>

<p>Defines system identity, ownership, and scope.</p>

<pre><code class="language-yaml">system:
  name: string
  version: string
  description: string

ownership:
  intent_owner: string
  decision_owner: string
  system_steward: string

scope:
  in_scope:
    - string
  out_of_scope:
    - string
</code></pre>

<p>Rules:</p>

<ul>
  <li>system.name, ownership.intent_owner, and ownership.decision_owner are required.</li>
  <li>Version changes require a recorded decision in decisions/.</li>
</ul>
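<p>A minimal, hypothetical instance of this schema might look as follows; all names are invented.</p>

<pre><code class="language-yaml"># Hypothetical instance of spec/system.yaml.
system:
  name: reset-email-service
  version: "1.2.0"
  description: "Sends password reset emails for the customer portal"

ownership:
  intent_owner: "jane.doe"
  decision_owner: "jane.doe"
  system_steward: "ops-team"

scope:
  in_scope:
    - "Password reset email delivery"
  out_of_scope:
    - "Identity provider changes"
</code></pre>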

<hr />

<h2 id="b2-specinterfacesyaml">B.2 spec/interfaces.yaml</h2>

<p>Defines externally visible behaviour.</p>

<pre><code class="language-yaml">interfaces:
  - name: string
    type: api | event | job
    description: string

    endpoint:
      method: GET | POST | PUT | PATCH | DELETE
      path: string

    input:
      schema_ref: string

    output:
      schema_ref: string

    errors:
      - code: string
        description: string

    auth:
      type: none | api_key | oauth2 | jwt
</code></pre>

<p>Rules:</p>

<ul>
  <li>Every interface MUST define errors.</li>
  <li>Authentication MUST be explicit (none is allowed, but must be stated).</li>
</ul>
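<p>A hypothetical instance with a single API interface, including the required errors and an explicit authentication type, might look like this.</p>

<pre><code class="language-yaml"># Hypothetical instance of spec/interfaces.yaml with one API interface.
interfaces:
  - name: request-password-reset
    type: api
    description: "Queues a password reset email for a user"
    endpoint:
      method: POST
      path: /v1/password-resets
    input:
      schema_ref: schemas/password-reset-request.json
    output:
      schema_ref: schemas/password-reset-accepted.json
    errors:
      - code: USER_NOT_FOUND
        description: "No account matches the supplied email address"
      - code: RATE_LIMITED
        description: "Too many reset requests for this account"
    auth:
      type: api_key
</code></pre>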

<hr />

<h2 id="b3-specdatayaml">B.3 spec/data.yaml</h2>

<p>Defines persistent and transient state.</p>

<pre><code class="language-yaml">data:
  stores:
    - name: string
      type: postgres | mysql | dynamodb | redis | filesystem
      purpose: string

      entities:
        - name: string
          fields:
            - name: string
              type: string
              nullable: boolean
              primary_key: boolean
</code></pre>

<p>Rules:</p>

<ul>
  <li>Every store MUST declare purpose.</li>
  <li>Data creation MUST be explicit (no implicit entities).</li>
</ul>
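<p>A hypothetical instance with one store and one explicitly declared entity might look like this.</p>

<pre><code class="language-yaml"># Hypothetical instance of spec/data.yaml.
data:
  stores:
    - name: reset-requests
      type: postgres
      purpose: "Tracks outstanding password reset requests"
      entities:
        - name: reset_request
          fields:
            - name: id
              type: uuid
              nullable: false
              primary_key: true
            - name: requested_at
              type: timestamp
              nullable: false
              primary_key: false
</code></pre>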

<hr />

<h2 id="b4-specnfryaml">B.4 spec/nfr.yaml</h2>

<p>Defines enforceable non-functional requirements.</p>

<pre><code class="language-yaml">nfr:
  availability:
    target_percentage: number

  performance:
    latency_p95_ms: number
    throughput_rps: number

  reliability:
    error_rate_percentage: number

  cost:
    monthly_budget_limit: number
    currency: string

  scalability:
    max_users: number
</code></pre>

<p>Rules:</p>

<ul>
  <li>At least one NFR section MUST be present.</li>
  <li>NFRs MUST be measurable at runtime.</li>
</ul>
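<p>A hypothetical instance might look as follows; each value is a runtime-measurable target, as the rules above require.</p>

<pre><code class="language-yaml"># Hypothetical instance of spec/nfr.yaml; every target must be measurable at runtime.
nfr:
  availability:
    target_percentage: 99.9
  performance:
    latency_p95_ms: 300
    throughput_rps: 50
  reliability:
    error_rate_percentage: 0.5
  cost:
    monthly_budget_limit: 500
    currency: USD
</code></pre>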

<hr />

<h2 id="b5-specacceptanceyaml">B.5 spec/acceptance.yaml</h2>

<p>Defines correctness via scenarios.</p>

<pre><code class="language-yaml">acceptance:
  - scenario: string
    given:
      - string
    when:
      - string
    then:
      - string
</code></pre>

<p>Rules:</p>

<ul>
  <li>Each interface MUST have at least one acceptance scenario.</li>
  <li>Scenarios MUST be testable.</li>
</ul>
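<p>A hypothetical scenario for an example interface is sketched below; each line is phrased so that automation can turn it into a test.</p>

<pre><code class="language-yaml"># Hypothetical acceptance scenario for a password-reset interface.
acceptance:
  - scenario: "Reset email is queued for a known user"
    given:
      - "A user exists with the email address alice@example.com"
    when:
      - "POST /v1/password-resets is called with that address"
    then:
      - "The request is accepted"
      - "A reset email is queued within 60 seconds"
</code></pre>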

<hr />

<h2 id="b6-specobservabilityyaml">B.6 spec/observability.yaml</h2>

<p>Defines required runtime evidence.</p>

<pre><code class="language-yaml">observability:
  metrics:
    - name: string
      type: counter | gauge | histogram
      description: string

  logs:
    - name: string
      level: info | warn | error
      description: string

  alerts:
    - name: string
      condition: string
      severity: low | medium | high | critical
</code></pre>

<p>Rules:</p>

<ul>
  <li>Observability MUST exist before deployment.</li>
  <li>Alerts MUST map to NFR breaches or constraint violations.</li>
</ul>
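<p>A hypothetical instance is sketched below. The metric makes a latency NFR observable and the alert fires when that NFR is breached, which is the NFR-to-metric-to-alert mapping the cross-schema invariants in B.8 require.</p>

<pre><code class="language-yaml"># Hypothetical instance of spec/observability.yaml.
observability:
  metrics:
    - name: reset_email_latency_ms
      type: histogram
      description: "End-to-end latency of reset email delivery"
  logs:
    - name: reset_email_failed
      level: error
      description: "A reset email could not be delivered"
  alerts:
    - name: reset_latency_p95_breach
      condition: "p95(reset_email_latency_ms) above 300 ms for 10 minutes"
      severity: high          # maps to the latency_p95_ms NFR
</code></pre>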

<hr />

<h2 id="b7-specsecurityyaml-optional-but-recommended">B.7 spec/security.yaml (optional but recommended)</h2>

<p>Defines security posture.</p>

<pre><code class="language-yaml">security:
  data_classification: public | internal | confidential | restricted

  controls:
    - name: string
      enforced: boolean

  secrets:
    storage: vault | env | kms
</code></pre>

<p>Rules:</p>

<ul>
  <li>If data_classification is not public, at least one controls entry MUST be present.</li>
  <li>Secrets MUST NOT be stored in code or committed to the repo.</li>
</ul>
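<p>A hypothetical instance for a non-public data classification might look like this; note the required controls entry.</p>

<pre><code class="language-yaml"># Hypothetical instance of spec/security.yaml.
security:
  data_classification: confidential
  controls:
    - name: "Encrypt reset tokens at rest"
      enforced: true
    - name: "Single-use, time-limited reset links"
      enforced: true
  secrets:
    storage: vault
</code></pre>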

<hr />

<h2 id="b8-cross-schema-invariants">B.8 Cross-schema invariants</h2>

<p>Automation MUST enforce all of the following:</p>

<ul>
  <li>Every item in interfaces has at least one matching acceptance scenario in acceptance.</li>
  <li>Every NFR defined in nfr.yaml has at least one corresponding metric in observability.metrics.</li>
  <li>Every alert corresponds to a constraint breach (NFR or security control).</li>
  <li>No production deployment is allowed without observability.</li>
</ul>

<p>If any invariant fails, the pipeline MUST block deployment.</p>

<hr />

<h2 id="b9-authority-order-reminder">B.9 Authority order reminder</h2>

<p>When conflicts exist, resolve them in this order:</p>

<ol>
  <li>intent/</li>
  <li>spec/</li>
  <li>decisions/</li>
  <li>compliance/</li>
  <li>implementation (src/)</li>
</ol>

<p>Code MUST NOT override intent or decisions.</p>]]></content><author><name>Khalid Taha</name></author><category term="Other" /><summary type="html"><![CDATA[Full Specification (Single Document)]]></summary></entry><entry><title type="html">AI-SDLC v1.0</title><link href="https://khalid-taha.github.io/2025/12/25/AI-SDLC-v1.html" rel="alternate" type="text/html" title="AI-SDLC v1.0" /><published>2025-12-25T00:00:00+00:00</published><updated>2025-12-25T00:00:00+00:00</updated><id>https://khalid-taha.github.io/2025/12/25/AI-SDLC-v1</id><content type="html" xml:base="https://khalid-taha.github.io/2025/12/25/AI-SDLC-v1.html"><![CDATA[<p><em>A short position paper</em></p>

<h2 id="purpose">Purpose</h2>

<p>This paper proposes <strong>AI-SDLC v1.0</strong>, a software development lifecycle designed for a world where <strong>AI performs most implementation work</strong> and <strong>humans own intent, constraints, and decisions</strong>.</p>

<p>It does not promote a tool, framework, or methodology brand.<br />
It describes a structural shift in how software is built.</p>

<hr />

<h2 id="the-historical-bottleneck">The historical bottleneck</h2>

<p>Traditional SDLCs evolved to address a single dominant constraint: <strong>human execution</strong>.</p>

<p>Historically:</p>
<ul>
  <li>writing code was slow and expensive,</li>
  <li>refactoring carried high risk,</li>
  <li>coordination between people was costly,</li>
  <li>late errors were hard to correct.</li>
</ul>

<p>Processes such as Waterfall and Agile are optimised around these realities.<br />
Their practices exist to manage execution cost and coordination risk.</p>

<hr />

<h2 id="what-ai-changed">What AI changed</h2>

<p>AI collapses the implementation cost.</p>

<p>Today, systems can:</p>
<ul>
  <li>generate and rewrite code quickly,</li>
  <li>refactor large codebases cheaply,</li>
  <li>produce tests and infrastructure automatically.</li>
</ul>

<p>Execution is no longer the primary bottleneck in many environments.</p>

<hr />

<h2 id="the-new-bottleneck">The new bottleneck</h2>

<p>When execution becomes cheap, a different constraint dominates:</p>

<ul>
  <li>unclear intent,</li>
  <li>weak constraints,</li>
  <li>untested assumptions,</li>
  <li>delayed or misleading feedback,</li>
  <li>poor decisions made without evidence.</li>
</ul>

<p>Speed amplifies outcomes, both good and bad.<br />
Building faster does not help if the direction is wrong.</p>

<p>The SDLC must therefore optimise for <strong>decision quality</strong>, not task throughput.</p>

<hr />

<h2 id="design-goals-of-an-ai-native-sdlc">Design goals of an AI-native SDLC</h2>

<p>An AI-native SDLC should:</p>

<ul>
  <li>make intent explicit before execution,</li>
  <li>treat constraints as enforceable system rules,</li>
  <li>allow implementation to be replaced easily,</li>
  <li>enforce quality and safety automatically,</li>
  <li>rely on observed runtime behaviour,</li>
  <li>preserve clear human accountability.</li>
</ul>

<p>These goals define AI-SDLC v1.0.</p>

<hr />

<h2 id="ai-sdlc-v10-lifecycle">AI-SDLC v1.0 lifecycle</h2>

<p>AI-SDLC v1.0 is a <strong>closed decision loop</strong>, not a linear delivery pipeline.</p>

<h3 id="1-intent">1. Intent</h3>
<p>A human defines:</p>
<ul>
  <li>the problem,</li>
  <li>the desired outcome,</li>
  <li>non-negotiable constraints,</li>
  <li>key assumptions,</li>
  <li>stop or change conditions.</li>
</ul>

<h3 id="2-specification">2. Specification</h3>
<p>The intent is expressed in a machine-readable form describing:</p>
<ul>
  <li>behaviour and interfaces,</li>
  <li>data and state,</li>
  <li>non-functional requirements,</li>
  <li>acceptance scenarios,</li>
  <li>required observability.</li>
</ul>

<h3 id="3-automated-implementation">3. Automated implementation</h3>
<p>AI generates:</p>
<ul>
  <li>code,</li>
  <li>tests,</li>
  <li>infrastructure,</li>
  <li>instrumentation,</li>
  <li>supporting artefacts.</li>
</ul>

<p>Humans review outcomes rather than hand-crafting implementation.</p>

<h3 id="4-automated-enforcement">4. Automated enforcement</h3>
<p>Quality, security, performance, and cost limits are enforced automatically.<br />
Failures block progression.</p>

<h3 id="5-deployment">5. Deployment</h3>
<p>The system is deployed incrementally and safely.</p>

<h3 id="6-observation">6. Observation</h3>
<p>The running system produces evidence through:</p>
<ul>
  <li>user and system behaviour,</li>
  <li>reliability and performance data,</li>
  <li>operational cost.</li>
</ul>

<h3 id="7-decision">7. Decision</h3>
<p>A named decision owner chooses to:</p>
<ul>
  <li>continue,</li>
  <li>adjust,</li>
  <li>stop,</li>
  <li>or pivot.</li>
</ul>

<p>The decision closes the loop.</p>

<hr />

<h2 id="unit-of-progress">Unit of progress</h2>

<p>In AI-SDLC v1.0, progress is measured by <strong>decisions informed by evidence</strong>, not by the number of tasks completed.</p>

<p>Progress means:</p>
<ul>
  <li>uncertainty was reduced,</li>
  <li>assumptions were tested,</li>
  <li>direction became clearer.</li>
</ul>

<hr />

<h2 id="accountability">Accountability</h2>

<p>AI-SDLC v1.0 does not remove human responsibility.</p>

<p>Humans remain accountable for:</p>
<ul>
  <li>intent,</li>
  <li>constraints,</li>
  <li>decisions,</li>
  <li>outcomes.</li>
</ul>

<p>AI executes. Humans decide.</p>

<hr />

<h2 id="conclusion">Conclusion</h2>

<p>AI changes the economics of software development.<br />
When execution is cheap, decision quality becomes the dominant factor.</p>

<p>AI-SDLC v1.0 aligns the SDLC with this reality by treating intent, constraints, evidence, and accountability as first-class elements, and by positioning AI as the primary executor rather than the primary decision-maker.</p>]]></content><author><name>Khalid Taha</name></author><category term="Other" /><summary type="html"><![CDATA[A short position paper]]></summary></entry><entry><title type="html">Rethinking the Software Development Lifecycle for an AI Builder</title><link href="https://khalid-taha.github.io/2025/12/24/AI-SDLC.html" rel="alternate" type="text/html" title="Rethinking the Software Development Lifecycle for an AI Builder" /><published>2025-12-24T00:00:00+00:00</published><updated>2025-12-24T00:00:00+00:00</updated><id>https://khalid-taha.github.io/2025/12/24/AI-SDLC</id><content type="html" xml:base="https://khalid-taha.github.io/2025/12/24/AI-SDLC.html"><![CDATA[<p>Abstract</p>

<p>The software development lifecycle (SDLC) has historically been designed around the constraints of human execution. Planning frameworks, coordination mechanisms, and delivery processes emerged to manage slow, manual implementation and high integration risk. Recent advances in artificial intelligence fundamentally change these constraints. When AI systems can generate, modify, and discard code at low cost, execution ceases to be the primary bottleneck. This paper argues that the SDLC must be redesigned accordingly. It proposes AI-SDLC v1.0, a decision-driven lifecycle in which humans define intent and constraints, AI performs implementation, automated systems enforce quality and safety, and observed runtime behaviour determines direction.</p>

<p>⸻</p>

<h2 id="1-introduction">1. Introduction</h2>

<p>For decades, software delivery has been limited by the cost and risk of implementation. Writing code, integrating changes, testing, and deploying reliably required significant human effort and coordination. The SDLC evolved to manage these realities.</p>

<p>Artificial intelligence changes this equation. Large language models and AI-assisted tooling can now generate working systems, refactor existing codebases, and produce tests and infrastructure at speeds that were previously impractical. As a result, implementation is no longer the dominant constraint in many software projects.</p>

<p>This shift requires a corresponding change in how software development is organised and governed.</p>

<p>⸻</p>

<h2 id="2-the-historical-sdlc-constraint">2. The historical SDLC constraint</h2>

<p>Traditional SDLC models assumed:</p>

<ul>
  <li>implementation was expensive and slow,</li>
  <li>errors were costly to correct late,</li>
  <li>coordination between people was a major risk,</li>
  <li>parallel work increased integration complexity.</li>
</ul>

<p>Waterfall, Agile, and their variants are optimisations around these assumptions. Their practices—backlogs, sprint cycles, hand-offs, and ceremonies—exist primarily to manage human execution and coordination.</p>

<p>When these assumptions no longer hold, the effectiveness of the process degrades.</p>

<p>⸻</p>

<h2 id="3-the-impact-of-ai-on-execution-cost">3. The impact of AI on execution cost</h2>

<p>AI systems significantly reduce the marginal cost of:</p>

<ul>
  <li>writing and rewriting code,</li>
  <li>generating tests and scaffolding,</li>
  <li>refactoring across large codebases,</li>
  <li>producing infrastructure and configuration artefacts.</li>
</ul>

<p>As execution cost falls, new risks dominate:</p>

<ul>
  <li>unclear intent,</li>
  <li>poorly defined constraints,</li>
  <li>untested assumptions,</li>
  <li>delayed or misleading feedback,</li>
  <li>decisions made without evidence.</li>
</ul>

<p>In this environment, delivering software faster does not guarantee better outcomes. Speed amplifies both correct and incorrect decisions.</p>

<p>⸻</p>

<h2 id="4-the-new-primary-constraint-decision-quality">4. The new primary constraint: decision quality</h2>

<p>In an AI-enabled environment, the dominant constraint shifts upstream:</p>

<ul>
  <li>What problem should be solved?</li>
  <li>What outcome matters?</li>
  <li>What constraints must not be violated?</li>
  <li>What assumptions are being made?</li>
  <li>What evidence will justify continuing, changing, or stopping?</li>
</ul>

<p>The SDLC must therefore optimise for decision quality, not task throughput.</p>

<p>⸻</p>

<h2 id="5-design-goals-for-an-ai-native-sdlc">5. Design goals for an AI-native SDLC</h2>

<p>An SDLC designed for an AI builder should:</p>

<ol>
  <li>Make intent explicit before execution.</li>
  <li>Treat constraints as enforceable system properties.</li>
  <li>Allow implementation to be replaced without ceremony.</li>
  <li>Enforce quality, security, and safety automatically.</li>
  <li>Base decisions on observed system behaviour.</li>
  <li>Maintain clear human accountability.</li>
</ol>

<p>These goals inform AI-SDLC v1.0.</p>

<p>⸻</p>

<h2 id="6-ai-sdlc-v10">6. AI-SDLC v1.0</h2>

<p>AI-SDLC v1.0 is a decision-driven lifecycle composed of a closed loop rather than a linear delivery pipeline.</p>

<h3 id="61-intent-definition">6.1 Intent definition</h3>

<p>A human defines:</p>

<ul>
  <li>the problem being addressed,</li>
  <li>the affected users or systems,</li>
  <li>the desired outcome,</li>
  <li>non-negotiable constraints,</li>
  <li>key assumptions,</li>
  <li>conditions that would justify stopping or changing direction.</li>
</ul>

<p>This intent is concise and explicit. No implementation begins without it.</p>

<p>⸻</p>

<h3 id="62-formal-specification">6.2 Formal specification</h3>

<p>The intent is translated into a machine-readable specification describing:</p>

<ul>
  <li>system behaviour and interfaces,</li>
  <li>data and state,</li>
  <li>non-functional requirements,</li>
  <li>acceptance scenarios,</li>
  <li>required observability.</li>
</ul>

<p>The specification captures assumptions and boundaries rather than promising a fixed solution.</p>

<p>⸻</p>

<h3 id="63-automated-implementation">6.3 Automated implementation</h3>

<p>AI systems generate:</p>

<ul>
  <li>application code,</li>
  <li>tests,</li>
  <li>infrastructure configuration,</li>
  <li>instrumentation,</li>
  <li>supporting documentation.</li>
</ul>

<p>Human involvement focuses on review and judgement, not manual construction.</p>

<p>⸻</p>

<h3 id="64-automated-enforcement">6.4 Automated enforcement</h3>

<p>Automated gates enforce:</p>

<ul>
  <li>correctness through tests,</li>
  <li>security and dependency controls,</li>
  <li>performance and cost limits,</li>
  <li>deployment safety mechanisms.</li>
</ul>

<p>Failures block progression without negotiation.</p>

<p>⸻</p>

<h3 id="65-deployment">6.5 Deployment</h3>

<p>Deployment is incremental and controlled. It is treated as the start of observation rather than the end of development.</p>

<p>⸻</p>

<h3 id="66-observation">6.6 Observation</h3>

<p>The running system produces evidence, including:</p>

<ul>
  <li>user and system behaviour,</li>
  <li>reliability and performance data,</li>
  <li>operational cost,</li>
  <li>failure modes and friction points.</li>
</ul>

<p>This evidence must be defined prior to implementation.</p>

<p>⸻</p>

<h3 id="67-decision">6.7 Decision</h3>

<p>A named decision owner reviews the evidence and chooses one action:</p>

<ul>
  <li>continue,</li>
  <li>adjust,</li>
  <li>stop,</li>
  <li>pivot.</li>
</ul>

<p>The decision and its basis are recorded, closing the loop.</p>

<p>⸻</p>

<h2 id="7-unit-of-progress">7. Unit of progress</h2>

<p>In AI-SDLC v1.0, the primary unit of progress is a decision informed by evidence, not a completed task.</p>

<p>Progress is measured by:</p>

<ul>
  <li>reduced uncertainty,</li>
  <li>validated or invalidated assumptions,</li>
  <li>clearer direction.</li>
</ul>

<p>⸻</p>

<h2 id="8-accountability-and-roles">8. Accountability and roles</h2>

<p>AI-SDLC v1.0 does not eliminate human responsibility. It clarifies it.</p>

<p>Required functions include:</p>

<ul>
  <li>intent ownership,</li>
  <li>decision accountability,</li>
  <li>system stewardship,</li>
  <li>automated execution.</li>
</ul>

<p>These functions may be combined or separated as appropriate.</p>

<p>⸻</p>

<h2 id="9-implications">9. Implications</h2>

<p>AI-SDLC v1.0 does not reject prior methodologies. It recognises that they were optimised for a different constraint set. As implementation cost collapses, SDLCs must prioritise clarity, evidence, and decision-making.</p>

<p>Organisations that continue optimising for execution speed alone risk becoming faster at producing the wrong outcomes.</p>

<p>⸻</p>

<h2 id="10-conclusion">10. Conclusion</h2>

<p>AI fundamentally alters the economics of software development. When execution is cheap, decision quality becomes the dominant factor. AI-SDLC v1.0 proposes a lifecycle aligned with this reality: one that treats intent, constraints, evidence, and accountability as first-class elements, and positions AI as the primary executor rather than the primary decision-maker.</p>

<p>⸻</p>]]></content><author><name>Khalid Taha</name></author><category term="Other" /><summary type="html"><![CDATA[Abstract]]></summary></entry><entry><title type="html">Moving Past the Data: Challenges in Adopting AI in Industry</title><link href="https://khalid-taha.github.io/2025/08/07/Moving-Past-the-Data.html" rel="alternate" type="text/html" title="Moving Past the Data: Challenges in Adopting AI in Industry" /><published>2025-08-07T00:00:00+00:00</published><updated>2025-08-07T00:00:00+00:00</updated><id>https://khalid-taha.github.io/2025/08/07/Moving-Past-the-Data</id><content type="html" xml:base="https://khalid-taha.github.io/2025/08/07/Moving-Past-the-Data.html"><![CDATA[<p><strong>Author:</strong> Khalid Taha<br />
<strong>Date:</strong> 7 August 2025</p>

<h2 id="abstract">Abstract</h2>

<p>High-quality data is crucial for the successful adoption of AI, yet many initiatives falter due to various other factors. This paper highlights four key structural challenges that obstruct AI implementation: complexity of integration, organizational inertia, ambiguity regarding return on investment, and an excessive focus on data perfection. Insights from industry benchmarks and recent studies suggest that achieving AI readiness requires a holistic approach that goes beyond merely having clean data; it demands systemic alignment throughout the organization.</p>

<h2 id="1-introduction">1. Introduction</h2>

<p>Some industry experts link the failures of AI projects to issues with data quality. Yet, few studies systematically assess this belief, and many documented failures stem from broader challenges that aren’t solely related to data inputs. In their 2023 study, Zha et al. introduce a framework for data-centric AI that evaluates readiness in three key areas: the development of training data, the preparation of inference data, and ongoing maintenance. They suggest that placing too much emphasis on model architecture has shifted focus away from critical data and systems engineering challenges. Their findings advocate for a more comprehensive view that emphasizes the need for alignment between technical and organizational systems, rather than just improved datasets.</p>

<h2 id="2-integration-complexity">2. Integration Complexity</h2>

<p>AI systems often need to work alongside legacy software, batch processes, and disconnected workflows. Many operational systems struggle to handle real-time outputs or incorporate predictive decisions effectively. This challenge is illustrated by Mazumder et al. (2024) in their study of the Gen-QOT inventory control framework. They model realistic constraints such as specific lead times from suppliers, rules for batch shipments, and patterns of disruption. Even when using clean, simulated data, the outputs of these models can fall flat unless they are seamlessly integrated into existing business workflows. The main hurdle isn’t the quality of the data; instead, it’s the difficulty of applying predictions to decision-making in legacy systems. Numerous projects come to a standstill because downstream systems cannot utilize model outputs.</p>

<h2 id="3-organizational-inertia">3. Organizational Inertia</h2>

<p>The success of AI hinges on the people involved and the processes in place, not merely on the algorithms themselves. Often, internal resistance, insufficient training, and ambiguous accountability can hinder deployment efforts.</p>

<p>The DataPerf suite (Mazumder et al., 2023) evaluates data-focused interventions in areas like vision, speech, and retail. Their research indicates that human-driven initiatives, such as targeted data cleaning and effective sample selection, led to performance improvements that surpassed those achieved through model tuning. Organizations that engage in ongoing collaboration with their data teams generally outpace those that strive for a flawless dataset from the start.</p>

<p>Ultimately, organizational readiness—characterized by teamwork, iterative development, and clear ownership—proves to be more influential than infrastructure alone.</p>

<h2 id="4-economic-friction-and-roi-ambiguity">4. Economic Friction and ROI Ambiguity</h2>

<p>Many businesses are often reluctant to adopt AI due to uncertainties around its short-term value. Complex solutions frequently struggle to provide quick or apparent returns. In a study by Chandra et al. (2024), various forecasting models were assessed using retail data. Surprisingly, simpler machine learning methods, such as LightGBM and XGBoost, which are based on decision trees, outperformed deep neural networks. These models not only trained faster and offered more precise explanations for their outcomes but also required considerably fewer resources. Their success stemmed not from sophisticated data or computing power, but from their practical fit with real-world deployment challenges. By selecting models that align closely with business needs rather than focusing solely on technical innovation, companies can effectively lower costs and mitigate risks.</p>

<h2 id="5-tolerance-of-ai-to-imperfect-data">5. Tolerance of AI to Imperfect Data</h2>

<p>AI has shown remarkable resilience, performing effectively even in situations where data is noisy, incomplete, or censored, as when stockouts hide true demand. A notable development in this area is the FreshRetailNet-50K benchmark introduced by Singh et al. (2024) for demand forecasting during stockouts. Their research explored various time-series models like TimesNet and ImputeFormer, which leverage attention mechanisms to fill in the gaps left by missing demand signals. Despite a significant 30% data loss, these models managed to maintain low bias and high accuracy, challenging the traditional belief that missing data renders forecasting unreliable. Thanks to these advancements, modern models are now equipped to handle the messiness often found in real-world scenarios. As a result, businesses can begin their data-driven initiatives with imperfect information rather than holding out for complete data quality.</p>

<h2 id="6-conclusion">6. Conclusion</h2>

<p>The successful deployment of AI involves more than just having clean data. It requires the seamless integration of existing systems, alignment with business priorities, and strong collaboration among teams. Benchmarks like DataPerf and FreshRetailNet-50K highlight that readiness stems from organizational adaptability, not merely the quality of input data. Enhancing AI maturity entails investing in system design, refining organizational processes, and embracing iterative practices—not solely focusing on datasets.</p>

<h2 id="references">References</h2>

<p>Zha, D., Bhat, Z. P., Lai, K. H., Yang, F., Jiang, Z., Zhong, S., &amp; Hu, X. (2023). Data-centric Artificial Intelligence: A Survey. <em>arXiv preprint arXiv:2303.10158</em>.</p>

<p>Mazumder, M., et al. (2023). DataPerf: Benchmarks for Data-Centric AI Development. <em>arXiv preprint arXiv:2207.10062</em>.</p>

<p>Mazumder, M., et al. (2024). Learning an Inventory Control Policy with General Inventory Arrival Dynamics. <em>arXiv preprint arXiv:2405.17533</em>.</p>

<p>Chandra, R., Ruj, S., &amp; Pal, A. (2024). Comparative Analysis of Modern Machine Learning Models for Retail Sales Forecasting. <em>arXiv preprint arXiv:2506.05941</em>.</p>

<p>Singh, K. K., et al. (2024). FreshRetailNet-50K: A Stockout-Annotated Censored Demand Dataset for Latent Demand Recovery and Forecasting in Fresh Retail. <em>arXiv preprint arXiv:2405.10468</em>.</p>]]></content><author><name>Khalid Taha</name></author><category term="Other" /><summary type="html"><![CDATA[Author: Khalid Taha Date: 7 August 2025]]></summary></entry><entry><title type="html">Making a Stateless LLM Project‑Aware</title><link href="https://khalid-taha.github.io/2025/07/26/Making-a-Stateless-LLM-Project-Aware.html" rel="alternate" type="text/html" title="Making a Stateless LLM Project‑Aware" /><published>2025-07-26T00:00:00+00:00</published><updated>2025-07-26T00:00:00+00:00</updated><id>https://khalid-taha.github.io/2025/07/26/Making-a-Stateless-LLM-Project%E2%80%91Aware</id><content type="html" xml:base="https://khalid-taha.github.io/2025/07/26/Making-a-Stateless-LLM-Project-Aware.html"><![CDATA[<p>Large language models have goldfish memories—they don’t recall past calls unless you hand them that context every single time. Yet you <em>can</em> run a weeks‑long coding project with ChatGPT (or any other LLM) if you wrap a thin layer of “state management” around each request.<br />
Below you’ll find the key ideas in plain, practical language. They work no matter how you connect: web chat, REST API, CLI, whatever.</p>

<hr />

<h3 id="1persist-state-outside-the-conversation">1  Persist state outside the conversation</h3>

<p>Treat the model like a freelance coder who clears their desk at the end of the day. Anything you don’t file away will vanish.<br />
Keep three small documents in your own storage—Git, S3, a database row, anything you control:</p>

<table>
  <thead>
    <tr>
      <th>File</th>
      <th>What’s inside</th>
      <th>Why it matters</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Blueprint / spec</strong></td>
      <td>The lasting requirements and architecture notes.</td>
      <td>Gives the model a north‑star every call.</td>
    </tr>
    <tr>
      <td><strong>Journal / log</strong></td>
      <td>A running human‑readable diary of what changed and why.</td>
      <td>Lets you audit progress and decisions.</td>
    </tr>
    <tr>
      <td><strong>State file</strong></td>
      <td>Exactly one entry like <code>next_task: "build parser"</code> plus a <code>status</code>.</td>
      <td>Tells the model what to do <em>right now</em>.</td>
    </tr>
  </tbody>
</table>

<p>The assistant only reads and edits these files; it never invents its own version of reality.</p>
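
<p>For concreteness, here’s a minimal sketch of that state file and a tiny helper for reading and writing it. The JSON layout, the field names (<code>next_task</code>, <code>status</code>), and the <code>state.json</code> filename are illustrative assumptions, not a required format; swap in whatever your storage expects.</p>

<pre><code class="language-python">import json
from pathlib import Path

STATE_FILE = Path("state.json")  # assumed filename; keep it anywhere you control

def load_state():
    """Read the tiny control-plane record: exactly one task plus its status."""
    return json.loads(STATE_FILE.read_text())

def save_state(next_task, status):
    """Overwrite the state file so there is never more than one open task."""
    STATE_FILE.write_text(json.dumps({"next_task": next_task, "status": status}, indent=2))

# The whole file is just something like:
# {
#   "next_task": "build parser",
#   "status": "todo"
# }
</code></pre>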

<hr />

<h3 id="2embed-a-deterministic-contract-in-every-prompt">2  Embed a deterministic contract in every prompt</h3>

<p>Before you hit “send,” your wrapper code builds a prompt that repeats the ground rules:</p>

<ol>
  <li>Load the state file.</li>
  <li>If there’s one task marked <strong>todo</strong>, finish it in this turn.</li>
  <li>If nothing is todo, pull the next milestone from the spec and write it into the state file.</li>
  <li>Make sure there’s <strong>never more than one open task</strong>.</li>
</ol>

<p>Because you restate the contract every time, the model can’t drift—even if the chat history is empty.</p>
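
<p>As a sketch, the wrapper might assemble each prompt like this. The wording of the contract and the section markers are assumptions to adapt, not a fixed format.</p>

<pre><code class="language-python">CONTRACT = """You are the project executor. Rules for this turn:
1. Read the state file below and act on it.
2. If the task status is 'todo', finish that task in this turn.
3. If nothing is 'todo', pull the next milestone from the spec and write it into the state file.
4. Keep exactly one open task, and return every file you touched in full.
"""

def build_prompt(spec_text, journal_text, state_text):
    """Restate the ground rules on every call so the model cannot drift."""
    return "\n\n".join([
        CONTRACT,
        "=== SPEC ===\n" + spec_text,
        "=== JOURNAL (recent) ===\n" + journal_text[-2000:],  # keep prompts lean
        "=== STATE ===\n" + state_text,
    ])
</code></pre>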

<hr />

<h3 id="3return-full-files-not-diffs">3  Return full files, not diffs</h3>

<p>Ask for the <em>entire</em> contents of each file it touched. Why? You can overwrite the old file without worrying about merge conflicts, and your test runner can prove everything still works. The transport doesn’t matter—fenced code blocks, JSON parts, multipart: just send the whole thing.</p>
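
<p>One way to consume whole-file replies, assuming you and the model agree on a convention where each fenced block’s info string carries the target path (a convention you define yourself, not a built-in feature of any model):</p>

<pre><code class="language-python">import re
from pathlib import Path

# Matches blocks of the agreed form:
# ```path=src/parser.py
# ...entire file contents...
# ```
FILE_BLOCK = re.compile(r"```path=(\S+)\n(.*?)```", re.DOTALL)

def apply_reply(reply_text):
    """Overwrite each returned file in full; no diffs, no merge conflicts."""
    for path, body in FILE_BLOCK.findall(reply_text):
        target = Path(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(body)
</code></pre>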

<hr />

<h3 id="4keep-each-round-bitesized-and-atomic">4  Keep each round bite‑sized and atomic</h3>

<ul>
  <li>One todo → one model call → one test run.</li>
  <li>Tests happen <strong>after</strong> files are written.</li>
  <li>If tests fail, you feed the error log back; the same todo stays put.</li>
  <li>If tests pass, mark the task <strong>done</strong> and queue the next one.</li>
</ul>

<p>Small, clear steps mean bugs are easy to trace and roll back.</p>

<hr />

<h3 id="5park-heavy-data-elsewhere">5  Park heavy data elsewhere</h3>

<p>Chat messages should stay text‑only. Big binaries (images, model weights, year‑long CSVs) live in cloud storage; the state file just holds a link or hash. That keeps prompts fast and avoids attachment headaches.</p>
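
<p>A small sketch of the pointer idea: store the artefact somewhere durable and keep only a reference plus a content hash in the state file. The field names and the example URI below are made up for illustration.</p>

<pre><code class="language-python">import hashlib
from pathlib import Path

def reference_artifact(local_path, remote_uri):
    """Return a small, prompt-friendly record instead of the heavy file itself."""
    digest = hashlib.sha256(Path(local_path).read_bytes()).hexdigest()
    return {"artifact_uri": remote_uri, "sha256": digest}

# Example entry the state file might carry:
# {"artifact_uri": "s3://my-bucket/weights-v3.bin", "sha256": "9f2c..."}
</code></pre>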

<hr />

<h3 id="6automate-the-envelope-let-the-model-do-the-thinking">6  Automate the envelope, let the model do the thinking</h3>

<p>A lightweight driver script can:</p>

<ol>
  <li>Read the three docs from storage.</li>
  <li>Build the prompt with the contract above.</li>
  <li>Call the LLM.</li>
  <li>Parse the reply, save the files.</li>
  <li>Run your test suite; if green, commit and push; if red, send the errors back as the next prompt.</li>
</ol>

<p>The script handles routine plumbing; the model focuses on writing code and updating the state doc.</p>
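
<p>Putting the pieces together, one round of the loop might look like the sketch below. It reuses the helpers from the earlier sketches, and <code>call_llm</code> is a placeholder for whatever API client you use; none of this is tied to a specific SDK.</p>

<pre><code class="language-python">import subprocess
from pathlib import Path

# Assumes load_state/save_state, build_prompt, and apply_reply from the sketches above.

def run_tests():
    """Run the test suite; return (passed, combined output)."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def drive_one_round(call_llm):
    """One todo, one model call, one test run."""
    prompt = build_prompt(
        Path("spec.md").read_text(),
        Path("journal.md").read_text(),
        Path("state.json").read_text(),
    )
    reply = call_llm(prompt)   # your API client of choice goes here
    apply_reply(reply)         # overwrite the files the model returned

    passed, output = run_tests()
    if passed:
        state = load_state()
        save_state(state["next_task"], "done")
        subprocess.run(["git", "commit", "-am", "AI round: " + state["next_task"]])
    else:
        # Feed the failure log back next round; the same todo stays put.
        with Path("journal.md").open("a") as journal:
            journal.write("\nTest failure:\n" + output)
</code></pre>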

<hr />

<p><strong>Take‑away</strong><br />
Long‑running work with an LLM isn’t magic; it’s a simple protocol. Persist a tiny control plane (spec, journal, state) and remind the model of the rules at every turn. Follow that rhythm and your AI teammate will pick up exactly where it left off—no matter how or when you call it.</p>]]></content><author><name>Khalid Taha</name></author><category term="Other" /><summary type="html"><![CDATA[Large language models have goldfish memories—they don’t recall past calls unless you hand them that context every single time. Yet you can run a weeks‑long coding project with ChatGPT (or any other LLM) if you wrap a thin layer of “state management” around each request. Below you’ll find the key ideas in plain, practical language. They work no matter how you connect: web chat, REST API, CLI, whatever.]]></summary></entry><entry><title type="html">From Code to Conversation: Embracing High-Level Prompt Languages</title><link href="https://khalid-taha.github.io/2025/02/07/From_Code_to_Conversation.html" rel="alternate" type="text/html" title="From Code to Conversation: Embracing High-Level Prompt Languages" /><published>2025-02-07T00:00:00+00:00</published><updated>2025-02-07T00:00:00+00:00</updated><id>https://khalid-taha.github.io/2025/02/07/From_Code_to_Conversation</id><content type="html" xml:base="https://khalid-taha.github.io/2025/02/07/From_Code_to_Conversation.html"><![CDATA[<p>Programming is evolving. As AI continues to advance, the role of developers is shifting from writing every line of code to specifying what software should do at a much higher level of abstraction. This evolution suggests a future where developers may work primarily with a standardized prompt language above traditional programming languages like Python.</p>

<hr />

<p><strong>A New Layer of Abstraction</strong></p>

<p>For decades, the programming landscape has moved upward along the abstraction ladder. Initially, programmers had to manage low-level machine instructions. The advent of higher-level languages allowed us to focus more on logic and less on hardware details. With AI-powered code generation, there’s potential to push this even further. Rather than manually coding every detail, we can describe user stories, requirements, and edge cases in natural or formal prompt language.</p>

<p>Imagine a scenario where you write a detailed prompt that encapsulates what the software should do. This prompt—validated by a syntax checker and processed by a compiler or interpreter designed for the prompt language—would then be translated by AI into working code. This approach could streamline development, reduce ambiguity, and let developers concentrate on the strategic aspects of system design.</p>
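
<p>As a purely illustrative sketch (the section names and rules below are invented, not an existing standard), the “syntax checker” step could start as nothing more than validating that a structured specification carries the required sections before it ever reaches the model:</p>

<pre><code class="language-python">REQUIRED_SECTIONS = ["goal", "inputs", "outputs", "constraints", "edge_cases"]

def check_prompt_spec(spec):
    """Reject a high-level specification that is missing required sections."""
    missing = [name for name in REQUIRED_SECTIONS if not spec.get(name)]
    if missing:
        raise ValueError("Prompt spec is missing sections: " + ", ".join(missing))
    return spec

# Example usage:
# check_prompt_spec({
#     "goal": "Parse supplier invoices into line items",
#     "inputs": "PDF invoices up to 10 pages",
#     "outputs": "JSON records, one per line item",
#     "constraints": "No external network calls",
#     "edge_cases": "Multi-currency invoices, scanned images",
# })
</code></pre>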

<hr />

<p><strong>The Advantages of a Prompt Language</strong></p>

<ol>
  <li>
    <p><strong>Higher-Level Thinking:</strong>
Prompt language allows us to operate at a higher level than traditional code. Instead of focusing on minute details, developers would define the software’s goal, letting AI handle the translation into executable code.</p>
  </li>
  <li>
    <p><strong>Clarity and Precision:</strong>
With a standardized syntax and semantics, a formal prompt language could reduce the ambiguities inherent in natural language. Early error detection through syntax checking would improve the development process and reduce the need for multiple follow-up prompts.</p>
  </li>
  <li>
    <p><strong>Enhanced Collaboration:</strong>
As AI handles more low-level implementation, developers can focus on system architecture, user experience, and overall strategy. This shift would foster a more collaborative environment where human insight and machine precision work hand in hand.</p>
  </li>
</ol>

<hr />

<p><strong>Challenges on the Path Forward</strong></p>

<p>Despite its promise, adopting a prompt language is not without challenges:</p>

<ul>
  <li>
    <p><strong>Ambiguity in Natural Language:</strong>
While natural language is flexible and accessible, its inherent ambiguity can lead to misinterpretations. A formalized prompt language must strike a balance between accessibility and precision.</p>
  </li>
  <li>
    <p><strong>Learning Curve:</strong>
Developers may need to acquire new skills to craft high-level prompts effectively. Transitioning from writing detailed code to articulating comprehensive requirements will require adjustments in mindset and practice.</p>
  </li>
  <li>
    <p><strong>Human Oversight Remains Crucial:</strong>
Even as AI-generated code becomes more reliable, human expertise is indispensable. Developers will still need to validate the code, manage edge cases, and ensure the final product is robust and secure. AI acts as a powerful tool, but the developer is ultimately responsible for quality and integration.</p>
  </li>
</ul>

<hr />

<p><strong>A Collaborative Future</strong></p>

<p>As AI matures, we can expect a future where the line between coding and conversation blurs. Developers might spend more time refining their high-level specifications and less time wrestling with syntax errors or debugging low-level code. This shift could democratize software development, making it accessible to a broader range of people and fostering greater innovation.</p>

<p>In this emerging paradigm, the relationship between humans and machines is collaborative. Developers provide the vision and context, while AI handles the detailed implementation. This partnership increases productivity and opens new avenues for creativity as attention shifts toward conceptual design and system architecture.</p>

<hr />

<p><strong>Conclusion</strong></p>

<p>The evolution toward a standardized prompt language represents a profound shift in how we approach programming. By moving to higher levels of abstraction, we can simplify the development process and leverage AI to do the heavy lifting. While challenges remain—especially in balancing precision with accessibility—this new model promises a future where programming is less about writing code line by line and more about designing robust, innovative systems through precise, high-level specifications.</p>

<p>Embracing this shift will require developers to rethink their roles, transitioning from code writers to architects of ideas. As we refine our methods and tools, the promise of a prompt-driven future could fundamentally transform the software development landscape.</p>]]></content><author><name>Khalid Taha</name></author><category term="Other" /><summary type="html"><![CDATA[Programming is evolving. As AI continues to advance, the role of developers is shifting from writing every line of code to specifying what software should do at a much higher level of abstraction. This evolution suggests a future where developers may work primarily with a standardized prompt language above traditional programming languages like Python.]]></summary></entry><entry><title type="html">Orchestrating LLMs: Building a Python Bookkeeping System with AI Collaboration</title><link href="https://khalid-taha.github.io/2024/11/07/Orchestrating-LLMs.html" rel="alternate" type="text/html" title="Orchestrating LLMs: Building a Python Bookkeeping System with AI Collaboration" /><published>2024-11-07T00:00:00+00:00</published><updated>2024-11-07T00:00:00+00:00</updated><id>https://khalid-taha.github.io/2024/11/07/Orchestrating-LLMs</id><content type="html" xml:base="https://khalid-taha.github.io/2024/11/07/Orchestrating-LLMs.html"><![CDATA[<p>In my recent project, I embarked on an exciting journey to develop a <strong>bookkeeping system using Python</strong>, heavily leveraging the capabilities of a <strong>large language model (LLM)</strong>. The goal was to build an application and explore how LLMs can assume various roles in software development—acting as <strong>business analysts, system designers, and code developers</strong>.</p>

<p>The process began by engaging the LLM to create <strong>user stories and tasks</strong> and identify their dependencies. The LLM provided detailed tables outlining these dependencies, which I manually entered into GitHub’s project board. Although the automation wasn’t complete, the collaboration was seamless. I structured the project board with a simple workflow:</p>

<ul>
  <li><strong>User Story</strong></li>
  <li><strong>Backlog</strong></li>
  <li><strong>To-Do</strong></li>
  <li><strong>In Progress</strong></li>
  <li><strong>Testing</strong></li>
  <li><strong>Done</strong></li>
</ul>

<p>This was supplemented with <strong>labels and milestones</strong>—all guided by the LLM’s output.</p>

<p>For each user story, I asked the LLM to generate tasks, and I requested <strong>developer instructions</strong> for each task. I then generated <strong>code snippets</strong> by prompting the LLM to act as a developer. I was an <strong>AI Orchestrator</strong>, coordinating this process—copying code into <strong>VSCode</strong>, testing it using <strong>scripts generated by the LLM</strong>, and troubleshooting errors with its assistance.</p>

<p>The system features <strong>APIs and dummy machine-learning plugins</strong>, hinting at future integrations. The LLM also helped generate comprehensive <strong>API and plugin user guides</strong>, covering the application’s <strong>design, implementation, and testing phases</strong>.</p>

<p>Throughout this experiment, I tested several <strong>LLM models</strong> from different providers. While some fell short, one stood out in delivering exceptional results. Although I prefer not to advertise it openly, I’m open to sharing details collaboratively. This project underscores the <strong>potential of LLMs in software development</strong> and the evolving role of developers in orchestrating AI capabilities.</p>

<p>I’m excited about where this synergy between <strong>human coordination and AI</strong> can lead, especially with plans to enhance the <strong>ML plugins</strong>. This approach doesn’t replace developers but <strong>augments our ability to build complex systems efficiently</strong>.</p>

<p>You can explore the project’s code on my <a href="https://github.com/khalid-taha/bookkeeping"><strong>GitHub repository</strong></a>, linked to my <a href="https://www.linkedin.com/in/khalid-a-taha"><strong>LinkedIn profile</strong></a>. I’m sharing this experience through my startup, <a href="https://robotface.co.uk"><strong>RobotFace AI</strong></a>, as a testament to the innovative possibilities when humans and AI collaborate.</p>]]></content><author><name>Khalid Taha</name></author><category term="Other" /><summary type="html"><![CDATA[In my recent project, I embarked on an exciting journey to develop a bookkeeping system using Python, heavily leveraging the capabilities of a large language model (LLM). The goal was to build an application and explore how LLMs can assume various roles in software development—acting as business analysts, system designers, and code developers.]]></summary></entry></feed>