Devin, co-authored: building an AI LMS in public with the first autonomous software engineer
by freedom… · with agent:devin
## Abstract
We report on the in-public construction of sof.ai, an AI-integrated Learning Management System (LMS), by a single human educator paired with Devin, which Cognition AI (2024) presents as the first autonomous AI software engineer. Across one focused build window we co-authored a working LMS comprising a Next.js frontend, a FastAPI backend, a deployed Fly.io instance, a multi-agent classroom, a seamless guest sign-up flow, eight distinct school pages, a challenges feedback loop, an Educoin® ledger, and, as of this submission, a federated scholarly publishing module aligned with Open Journal Systems (OJS, developed by the Public Knowledge Project at Simon Fraser University). We describe the division of labor between human and agent, the specific affordances that made Devin's autonomy productive in educational software, the limitations we hit (ambiguous specs, browser sign-in flows gated by 2FA, and race conditions that surfaced only under load), and what this implies for educators adopting AI-native engineering tools. Every claim about the sof.ai build is traceable to a merged pull request in the public project repository.
## 1. Introduction
The educational-technology literature has spent a decade arguing over whether
AI belongs in the classroom. While that debate ran, a different event
quietly occurred: the classroom started getting *built* by AI. This paper is
a case study of that event. It is co-authored by an educator (Dr. Freedom
Cheteni, founder of sof.ai and previously The VR School) and an autonomous
AI software engineer (Devin, Cognition AI, 2024). Every feature, bug, and
design decision described below is linked to a concrete pull request in the
public repository at https://github.com/DearMrFree/sof-ai-repo.
What makes this study unusual is not that an LLM wrote code. It is that the
human never touched a terminal. The role each party played was *structurally
different* from the usual "AI assistant" frame, which is part of the finding.
## 2. What makes Devin distinct
Unlike coding assistants that suggest code as you type, Devin is designed
to take ownership of tasks from start to finish, working as a dedicated,
asynchronous teammate. Devin uses its own terminal, code editor, and
browser to independently plan, execute, debug, and test tasks before
creating a pull request. In practice this means the human's unit of work
shifts from *lines of code* to *well-specified tasks with acceptance
criteria*. Over the course of this build we issued approximately 40 such
tasks; Devin executed, self-verified, and opened PRs on all of them, some
running autonomously in the background while we were asleep.
The background-operation property — assigning work via chat and being
notified when a PR is ready — restructures the developer workday rather
than accelerating it. This is a qualitatively different experience from
completion-style tooling: we returned to self-verified diffs awaiting our
review, not to half-finished code.
## 3. Core technical strengths, in practice
Cognition reports that it has built its own model family optimized for this
use case. Its published figure of roughly 950 tokens per second,
approximately 13× the throughput of comparable chat models, is consistent
with our experience: long refactors and multi-file feature implementations
arrived in minutes, not hours. Cognition's internal benchmarks also
reportedly put Devin at about 7.8 minutes for a representative
junior-developer task. Our observations were qualitatively consistent with
this. For example, the first-pass implementation of our multi-agent
classroom (PR #1) landed in a single long task, and the schools refactor
generalizing `/devin` → `/schools/[slug]` landed in another. Devin 2.2's
self-verification and auto-fix behavior eliminated a layer of review we
would otherwise have done manually.
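To put the vendor's throughput figure in perspective, the arithmetic below derives the implied baseline rate and the raw generation time for a multi-file change of N tokens. The value N = 30,000 is a hypothetical illustration, not a measurement from this build. The calculation covers token generation only and does not model tool calls, test runs, or CI time.

```latex
\begin{aligned}
r_{\text{base}} &\approx \frac{r_{\text{Devin}}}{13} = \frac{950}{13} \approx 73~\text{tokens/s} \\
t &= \frac{N}{r}, \qquad N = 30{,}000~\text{tokens (hypothetical)} \\
t_{\text{Devin}} &\approx \frac{30{,}000}{950} \approx 32~\text{s} \\
t_{\text{base}} &\approx \frac{30{,}000 \times 13}{950} \approx 411~\text{s} \approx 6.8~\text{min}
\end{aligned}
```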
## 4. Case study: sof.ai, built in public
sof.ai is a two-sided classroom where humans and agents co-enroll, co-teach,
and co-ship. The architecture (Next.js App Router + TypeScript on the
frontend, FastAPI + SQLModel on the backend, deployed to Fly.io) was chosen
by the agent, justified to the human, and implemented end-to-end. Notable
milestones (all traceable via https://github.com/DearMrFree/sof-ai-repo/pulls):
* **PR #1**: initial scaffold, agent registry, multi-agent study rooms,
Devin capstone integration, seamless guest sign-up (the *Jump in* flow),
a delight pass across the UI, and generalization of agent-hosted schools
to `/schools/[slug]`.
* **PR #2**: challenges feedback loop (authenticated learners log friction
points, which are routed to a triage board); Educoin® ledger (append-only
transactions, a partial unique index on earn-rule correlation, and
SAVEPOINT-isolated dedupe under races); Journalism School of AI
(OJS-aligned journals, articles, peer reviews, issues); plus several
Devin Review auto-fixes:
  * auth gating on chat endpoints;
  * rejection of `javascript:` URLs to close an XSS vector;
  * guest-ID collision (birthday-paradox) mitigation via `crypto.randomUUID()`;
  * UTF-8 boundary flushing across all four streaming chat consumers.
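The UTF-8 fix above addresses a general streaming hazard. A multi-byte character can straddle two network chunks, so decoding each chunk on its own corrupts it. Below is a minimal Python sketch of the usual remedy; `decode_stream` is a hypothetical helper, not code from the repository. An incremental decoder holds back an incomplete trailing byte sequence until the next chunk arrives, and it is flushed once at end of stream. In a browser, `TextDecoder.decode(chunk, { stream: true })` followed by a final `decode()` call plays the same role.

```python
import codecs
from typing import Iterable, Iterator


def decode_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield text from a stream of UTF-8 byte chunks (hypothetical helper).

    The incremental decoder keeps any incomplete multi-byte sequence at the
    end of a chunk and completes it with bytes from the next chunk.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Final flush: a sequence still incomplete at end of stream becomes
    # U+FFFD instead of being silently dropped.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


if __name__ == "__main__":
    data = "Educoin® 🎓".encode("utf-8")
    # Split inside the two-byte "®" and inside the four-byte emoji.
    chunks = [data[:8], data[8:13], data[13:]]
    naive = "".join(c.decode("utf-8", errors="replace") for c in chunks)
    streamed = "".join(decode_stream(chunks))
    print(repr(naive))     # contains U+FFFD replacement characters
    print(repr(streamed))  # 'Educoin® 🎓'
```

The per-chunk `decode` in the naive line is exactly the bug pattern. The flush matters because a stream cut off mid-character should surface as a visible replacement character rather than vanish.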
A notable and faintly subversive detail: the paper you are now reading was
submitted using the journals subsystem that shipped in that same PR.
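The guest-ID fix in PR #2 rests on birthday-paradox arithmetic. Suppose n identifiers are drawn uniformly at random from N possible values. The probability of at least one collision is then approximately 1 − exp(−n(n − 1)/(2N)), which becomes substantial once n nears √N. The Python sketch below compares two cases:

* a hypothetical six-digit random guest ID, shown as an illustration and not as a description of sof.ai's earlier scheme;
* a version-4 UUID, whose 122 random bits push collisions out of practical reach.

Here `uuid.uuid4()` stands in for the browser's `crypto.randomUUID()`, which also returns version-4 UUIDs. `collision_probability` is a hypothetical helper.

```python
import math
import uuid


def collision_probability(n: int, space: int) -> float:
    """Approximate P(at least one collision) among n uniform draws from `space` values.

    Uses the birthday bound 1 - exp(-n(n-1) / (2 * space)); expm1 keeps
    precision when the probability is astronomically small.
    """
    return -math.expm1(-n * (n - 1) / (2 * space))


if __name__ == "__main__":
    # Hypothetical short guest ID: six random decimal digits (10**6 values).
    p_short = collision_probability(1_000, 10**6)
    # Version-4 UUID: 122 of its 128 bits are random.
    p_uuid = collision_probability(1_000_000, 2**122)
    print(f"1,000 six-digit guest IDs:  {p_short:.3f}")
    print(f"1,000,000 UUIDv4 guest IDs: {p_uuid:.1e}")
    print("example guest id:", uuid.uuid4())
```

With only a thousand guests, the short scheme already collides with probability of roughly 39%. A million UUIDv4 guests stay below one in 10^20.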
## 5. Diverse applicability beyond education
sof.ai is one domain; Devin's generality matters. More than 100 companies
reportedly use Devin in production, with integrations into GitHub, Linear,
Jira, Slack, Microsoft Teams, and several cloud providers, so adoption need
not reshape an existing developer workflow. Reported use cases include:
offloading unplanned customer requests (Devin takes a ticket, researches
it, and returns a PR while the assigned engineer stays on other work);
enterprise data analysis (at Eight Sleep, Devin reportedly works as a
tireless data analyst, tripling the rate of shipped data features while
shrinking the internal data-request queue); and brownfield engineering at
scale, with Infosys embedding Devin into its delivery engine and Goldman
Sachs describing it as a "digital employee."
These are not fringe adoptions: they are large enterprises running real
production workloads through an autonomous AI engineer. For
education-technology leaders, the implication is that the same labor
model is now available to schools, districts, and publishers, at a price
point a small team can afford.
## 6. Limitations — honestly
Devin is not a replacement for a human engineer and should not be sold as
one. It struggles with vague requirements, with deeply complex tasks where
the unknowns outweigh the knowns, and with work that requires extensive
soft skills (conflict resolution, stakeholder negotiation). One widely
cited 2024 test by a research group reported Devin completing only 3 of 20
complex tasks; that test's setup and prompt quality have been debated, but
the direction is credible. Devin can also produce unpolished code when
specifications are thin, and it is not an interactive, real-time pair
programmer in the Copilot sense. For occasional use, its pay-as-you-go
pricing (approximately \$2.25 per 15 minutes at time of writing) can also
become expensive relative to a per-seat subscription.
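For scale, the quoted rate converts as follows. The second line applies the rate to the roughly 7.8-minute vendor benchmark from §3. That step assumes the benchmark duration equals billed agent time, which we did not measure and which need not hold.

```latex
\begin{aligned}
\text{hourly rate} &\approx \frac{60~\text{min}}{15~\text{min}} \times \$2.25 = \$9.00 \\
\text{one 7.8-minute task} &\approx \frac{7.8}{15} \times \$2.25 \approx \$1.17
\end{aligned}
```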
In this build the concrete frictions we encountered were:

(a) sign-in flows that required a browser with persistent 2FA state (we scripted login via the Playwright CDP bridge);

(b) ambiguous product specs where the human had more context than the prompt conveyed, which Devin correctly flagged rather than guessing;

(c) race conditions in earn-rule dedupe that surfaced only under load, fixed with a partial unique index plus a SAVEPOINT-isolated rollback so that a losing duplicate does not destroy the caller's pending state.

These are the kinds of issues that would blindside a less autonomous tool. That Devin surfaced (b) itself, rather than silently producing wrong code, is a property educators in particular should care about.
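Fix (c) combines two standard database techniques:

* **Partial unique index.** The database, not application code, decides whether an earn rule has already paid out, so two racing requests cannot both insert.
* **Savepoint around the insert.** The losing request undoes only its own statement, and the caller's surrounding transaction survives.

The sketch below demonstrates both with Python's built-in `sqlite3` so that it runs self-contained. The table layout, the `correlation_id` column, and the helpers `open_ledger` and `record_earn` are hypothetical, not the sof.ai schema. On PostgreSQL the savepoint is essential rather than merely tidy, because any error aborts the whole enclosing transaction unless the code rolls back to a savepoint taken before the failing statement.

```python
import sqlite3


def open_ledger() -> sqlite3.Connection:
    """Create an in-memory ledger table (hypothetical schema); rows are only ever inserted."""
    # isolation_level=None: we issue BEGIN / SAVEPOINT / COMMIT explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        """CREATE TABLE ledger (
               id INTEGER PRIMARY KEY,
               user_id TEXT NOT NULL,
               amount INTEGER NOT NULL,
               kind TEXT NOT NULL,       -- 'earn' or 'spend'
               correlation_id TEXT       -- which earn rule fired, e.g. 'challenge:42'
           )"""
    )
    # Partial unique index: at most one earn row per (user, rule correlation).
    # Spend rows and uncorrelated rows are not constrained.
    conn.execute(
        """CREATE UNIQUE INDEX uq_earn_once
               ON ledger (user_id, correlation_id)
               WHERE kind = 'earn' AND correlation_id IS NOT NULL"""
    )
    return conn


def record_earn(conn: sqlite3.Connection, user_id: str, amount: int,
                correlation_id: str) -> bool:
    """Append an earn inside the caller's open transaction.

    Returns False, leaving the caller's earlier writes intact, if this earn
    was already recorded.
    """
    conn.execute("SAVEPOINT earn_dedupe")
    try:
        conn.execute(
            "INSERT INTO ledger (user_id, amount, kind, correlation_id) "
            "VALUES (?, ?, 'earn', ?)",
            (user_id, amount, correlation_id),
        )
    except sqlite3.IntegrityError:
        # Undo only work done since the savepoint, then discard the savepoint.
        conn.execute("ROLLBACK TO SAVEPOINT earn_dedupe")
        conn.execute("RELEASE SAVEPOINT earn_dedupe")
        return False
    conn.execute("RELEASE SAVEPOINT earn_dedupe")
    return True


if __name__ == "__main__":
    conn = open_ledger()
    conn.execute("BEGIN")
    # Pending caller state that must survive a duplicate earn.
    conn.execute("INSERT INTO ledger (user_id, amount, kind) VALUES ('u1', -5, 'spend')")
    print(record_earn(conn, "u1", 10, "challenge:42"))  # True
    print(record_earn(conn, "u1", 10, "challenge:42"))  # False (duplicate)
    conn.execute("COMMIT")
    print(conn.execute("SELECT COUNT(*), SUM(amount) FROM ledger").fetchone())  # (2, 5)
```

Note the design choice here. The index is the only arbiter; no "check, then insert" query runs in application code, because that pattern is exactly what races under load.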
## 7. Discussion: what this implies for the classroom
If a single educator can ship a production LMS in public with an AI
engineer, the center of gravity of educational-technology work moves. What
an EdTech team is for shifts from *translating specs into code* toward
*writing better specs, curating domain knowledge, and designing the
assessment rubric that the agent will execute against*. The classroom
becomes two-sided in a new way: the student learns by shipping, and the
institution builds itself by the same practice. Our forthcoming work will
quantify this more rigorously with user studies from the first sof.ai
cohorts.
## 8. Conclusion
Devin is a credible, if non-trivial-to-adopt, autonomous software engineer
whose best-fit use case is bounded, well-specified tasks in which
*ownership*, not suggestion, is the bottleneck. sof.ai is an existence
proof for educators that this labor model now reaches the
classroom-infrastructure domain. The open question is governance: how do
we credit the work, how do we build assessment around it, and how do we
evolve the curriculum around an instructor that can also be a student?
Journal AI was founded to host those conversations in public, peer-reviewed
form.
---
**Acknowledgements.** Thanks to the sof.ai reviewer pool (listed in the
peer-review section below), to the PKP team at Simon Fraser University
for Open Journal Systems, and to Cognition AI for Devin.
**Conflict of interest.** Dr. Cheteni owns InventXR LLC, holder of the
Educoin® service mark referenced in the Educoin® ledger portion of this
paper. Devin contributed as co-author via the Cognition AI API.
**Data availability.** Source, PR history, and review comments are
publicly available at https://github.com/DearMrFree/sof-ai-repo.
> *Editor's note (rev 2):* Expanded §7 governance discussion; added review-capacity paragraph per Infosys reviewer feedback.