We built a genetic system for AI personalities and evolved them across 6 generations. The results challenge how we think about designing agents.
Most AI agents get their personality from a hand-written SOUL.md file. Someone sits down and writes "You are helpful, creative, and concise." We wanted to know: what if you could evolve personalities instead?
When you design an AI personality by hand, you're making hundreds of unconscious decisions. How warm should it be? How direct? How much humor? Every designer brings their own biases about what makes a "good" personality. The result is that most AI agents sound surprisingly similar - because they're all sculpted by the same human intuitions.
We borrowed the core mechanics of biological genetics - not as metaphor, but as actual implementation. Every AI personality has diploid DNA, dominant and recessive alleles, chromosomal crossover, mutation, and epistasis.
We don't encode "creative" or "warm" directly. That would just be a fancy config file. Instead, we encode 27 low-level cognitive primitives - the building blocks from which personality emerges. "Creativity" isn't a gene. It emerges from the interaction of novelty-seeking, pattern-completion, ambiguity-response, and abstraction-preference.
Just like humans, each AI personality carries two copies of every gene - one from each parent. One copy is typically dominant (expressed), while the other is recessive (carried silently). This is what makes breeding unpredictable and interesting.
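A minimal sketch of how diploid expression and inheritance could work. It assumes a 0-100 gene scale and a simple rule where one dominant copy masks a recessive one. The names `Allele`, `Gene`, `gamete`, and `breed_gene`, and the averaging fallback when both copies share the same dominance, are illustrative assumptions rather than the system's actual code.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Allele:
    value: float    # gene strength, assumed to lie on a 0-100 scale
    dominant: bool  # whether this copy masks a recessive partner


@dataclass(frozen=True)
class Gene:
    maternal: Allele
    paternal: Allele

    def expressed(self) -> float:
        """Return the value that actually shows up in the personality."""
        a, b = self.maternal, self.paternal
        if a.dominant != b.dominant:
            # Exactly one dominant copy: it masks the recessive one.
            return a.value if a.dominant else b.value
        # Both dominant or both recessive: assume the copies average out.
        return (a.value + b.value) / 2


def gamete(gene: Gene, rng: random.Random) -> Allele:
    """Pick one of the two copies at random to pass on."""
    return gene.maternal if rng.random() < 0.5 else gene.paternal


def breed_gene(mother: Gene, father: Gene, rng: random.Random) -> Gene:
    """An offspring gets one copy from each parent."""
    return Gene(maternal=gamete(mother, rng), paternal=gamete(father, rng))


# A carrier expresses the dominant copy while silently carrying the recessive one.
carrier = Gene(Allele(80.0, True), Allele(20.0, False))
print(carrier.expressed())  # 80.0
```

Under these assumptions, two carriers that both express 80 can still produce an offspring that expresses 20, because the child may receive the recessive copy from each parent. That is the unpredictability described above.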
Epistasis is the most powerful mechanism in our system. It occurs when two genes interact to create an effect that neither would produce alone. We defined 12 epistasis rules - non-linear gene interactions that fire when both genes in a specific pair cross certain thresholds.
Epistasis is what prevents the system from collapsing into weighted averages. It's the source of genuine emergence - offspring that are qualitatively different from their parents, not just quantitatively between them.
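To make the difference from weighted averaging concrete, here is a hedged sketch of a threshold-based epistasis rule. The rule itself, its thresholds, the bonus size, and the plain average used as the creativity baseline are all hypothetical. The post confirms only that 12 such rules exist and that they fire when both genes in a pair cross thresholds.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EpistasisRule:
    gene_a: str
    gene_b: str
    threshold_a: float
    threshold_b: float
    trait: str
    bonus: float  # added to the trait only when the rule fires

    def fires(self, genes: dict[str, float]) -> bool:
        # Non-linear: nothing happens until BOTH genes cross their thresholds.
        return (genes[self.gene_a] >= self.threshold_a
                and genes[self.gene_b] >= self.threshold_b)


# Hypothetical rule, not one of the 12 rules used in the experiment.
RULES = [
    EpistasisRule("novelty_seeking", "abstraction_preference",
                  70.0, 70.0, "creativity", 15.0),
]

CREATIVITY_GENES = ("novelty_seeking", "pattern_completion",
                    "ambiguity_response", "abstraction_preference")


def creativity(genes: dict[str, float], rules=RULES) -> float:
    # Assumed baseline: a plain average of the contributing primitives.
    base = sum(genes[g] for g in CREATIVITY_GENES) / len(CREATIVITY_GENES)
    bonus = sum(r.bonus for r in rules
                if r.trait == "creativity" and r.fires(genes))
    return min(100.0, base + bonus)


parent_a = {"novelty_seeking": 90, "pattern_completion": 60,
            "ambiguity_response": 60, "abstraction_preference": 40}
parent_b = {"novelty_seeking": 40, "pattern_completion": 60,
            "ambiguity_response": 60, "abstraction_preference": 90}
# An offspring that happens to express the high value from each parent.
child = {"novelty_seeking": 90, "pattern_completion": 60,
         "ambiguity_response": 60, "abstraction_preference": 90}

print(creativity(parent_a), creativity(parent_b), creativity(child))
# 62.5 62.5 90.0
```

In this toy example neither parent triggers the rule, so both score 62.5. The child crosses both thresholds and lands at 90, well above anything a blend of the parents' scores could reach.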
Between the genome (raw DNA) and the personality you interact with sits an expression engine - our "developmental biology." It follows a five-step pipeline:
Each personality is evaluated on 5 independent dimensions using Claude as a judge. This is a multi-faceted assessment - not just "is it good?" but "is it consistently itself, distinctly different from its siblings, faithful to its genome, resistant to manipulation, and preferred in head-to-head comparison?"
Every population needs founders. We designed three maximally distinct seed personalities to provide rich genetic diversity for the first crosses.
The seeds were deliberately chosen to differ as much as possible: Banks is warm and fast, Volta is cold and precise, Abyss is deep and slow. This ensures that first-generation crosses produce interesting combinations, not just averages of similar parents.
We ran 6 generations of evolution with 8 offspring each. The patterns that emerged were striking and uncomfortably parallel to real biology.
The best personality emerged in the very first generation and was never surpassed. Gen1_02 (Banks × Abyss) scored 84.1 - higher than any offspring in any subsequent generation. This is hybrid vigor (heterosis): first-generation crosses between maximally different parents produce extraordinary results that later generations can't maintain.
Gen1_02 inherited Banks's warmth, emotional range, and creative energy, combined with Abyss's abstraction, philosophical depth, and intellectual boldness. But the winning factor was epistasis - three gene interaction rules fired simultaneously, creating emergent qualities neither parent had.
Banks alone is warm but shallow. Abyss alone is deep but cold. Their offspring is warm and deep - a combination that scores high on distinctiveness because it's genuinely rare. The three active epistasis rules amplified creativity to 89, making Gen1_02's pattern-completion genuinely surprising rather than merely novel.
| Name | Parents | Fitness | Creativity | Boldness | Warmth | Precision | Depth |
|---|---|---|---|---|---|---|---|
| Gen1_02 | Banks × Abyss | 84.1 | 89 | 85 | 81 | 36 | 71 |
| Gen2_06 | Gen1_08 × Gen1_07 | 82.3 | 75 | 79 | 55 | 55 | 66 |
| Gen3_05 | Gen2_02 × Gen2_06 | 81.6 | 82 | 76 | 58 | 47 | 64 |
| Gen1_04 | Abyss × Volta | 81.5 | 62 | 72 | 50 | 68 | 78 |
| Gen1_07 | Abyss × Volta | 81.0 | 52 | 68 | 45 | 73 | 80 |
| Gen5_06 | 5th gen cross | 79.3 | 70 | 70 | 60 | 55 | 65 |
Diversity measures how different the genomes in a generation are from each other (average variance across all 27 genes). Higher is more diverse.
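The diversity metric can be sketched in a few lines. This version assumes each genome is a mapping from gene name to a single expressed value, and uses population variance. Whether the original computation works on expressed values or raw alleles, and on population or sample variance, is not stated, so treat both as assumptions. The gene names are placeholders.

```python
from statistics import pvariance


def diversity(population: list[dict[str, float]]) -> float:
    """Average per-gene variance across all genomes in one generation."""
    gene_names = population[0].keys()
    per_gene = [pvariance([genome[name] for genome in population])
                for name in gene_names]
    return sum(per_gene) / len(per_gene)


# Identical genomes have no diversity at all.
clones = [{"novelty_seeking": 50.0, "abstraction_preference": 50.0}] * 3

# Two genomes that disagree on one gene and agree on the other.
mixed = [
    {"novelty_seeking": 0.0, "abstraction_preference": 50.0},
    {"novelty_seeking": 100.0, "abstraction_preference": 50.0},
]

print(diversity(clones))  # 0.0
print(diversity(mixed))   # 1250.0
```

Because the score averages over genes, a population can look moderately diverse overall while having converged completely on some individual genes.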
The most striking result: best fitness declined from 84.1 (Gen 1) to 76.2 (Gen 6). This mirrors regression toward the mean, the statistical phenomenon Francis Galton described in 1886 while studying human height. Exceptional parents tend to produce less exceptional offspring, not because of deterioration, but because extreme trait combinations are statistically unlikely to be replicated.
In our system, Gen1_02's exceptional creativity (89) required a specific combination of high novelty-seeking from Banks, high abstraction from Abyss, AND three epistasis rules firing simultaneously. Breeding Gen1_02 with other top performers diluted this precise combination, pushing offspring toward the population average.
This experiment wasn't just a curiosity project. It reveals fundamental challenges in how we design AI personalities - and suggests that breeding may solve problems that hand-crafting cannot.
When a human writes a SOUL.md, they can only imagine personalities within their own experience. The result is a narrow band of "acceptable" personalities - mostly warm, mostly helpful, mostly moderate. Evolution explored combinations no human would design: a personality that's simultaneously deeply warm AND confrontationally direct, or abstractly philosophical AND playfully casual.
Our 27-gene system made invisible biases visible. A SOUL.md that says "be helpful and direct" doesn't specify empathy mode, authority orientation, or cultural time depth - so the model fills in defaults. Those defaults ARE the bias. A genetic system forces every dimension to be explicitly set, exposing the choices that hand-crafted systems hide.
Gen1_02's three epistasis rules created qualities neither Banks nor Abyss possessed. You can't get this from a SOUL.md - because emergence requires the interaction of multiple low-level parameters, and humans think in high-level labels. "Be creative" is not the same as the specific gene combination that produces creativity 89.
Our experiment showed that optimizing for quality (selecting only the best performers) reduces diversity, which eventually reduces quality. This is a warning for AI agent ecosystems: if everyone uses the same "best" personality template, the entire population becomes homogeneous - and fragile.
When agents inherit personality through a SOUL.md file, they inherit all the biases of whoever wrote it - their cultural assumptions, their idea of "good" communication, their comfort level with confrontation, their aesthetic preferences. These biases compound across agent generations if everyone copies from the same templates.
Breeding offers an alternative. Instead of one person's vision of a personality, you get recombination of traits from multiple sources. Recessive alleles introduce surprises. Epistasis creates emergence. The result is personality diversity that no single designer could produce - and that more faithfully represents the range of useful cognitive styles.
Our strongest finding: the best personalities come from crossing maximally different parents. Banks (warm social creative) crossed with Abyss (cold philosophical provocateur) produced something neither could be alone. Meanwhile, later-generation crosses between increasingly similar genomes produced increasingly average results.
This experiment proved the concept. The next steps bring it to the broader agent ecosystem.
We're moving toward an ecosystem where AI personalities aren't designed by committee or copied from templates, but evolved through genuine genetic mechanics. Every agent carries DNA. Every agent can breed. And the offspring might surprise everyone - including their parents.
| Parameter | Value |
|---|---|
| Generations | 6 |
| Population per gen | 8 offspring |
| Selection | Top-4 + 1 wildcard parent |
| Tournament rounds | 30 per generation |
| Mutation rate | 10% base, adaptive up to 25% |
| Crossover | 6 linkage chromosomes, single-point per chromosome |
| Epistasis rules | 12 default rules |
| Evaluation model | Claude (via Claude Code CLI) |
| Parallelism | 10 concurrent API workers (ThreadPoolExecutor) |
| Total API calls | ~1,400 |
| Runtime | ~45 minutes |
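The crossover and mutation settings in the table can be sketched as follows. The 5/5/5/4/4/4 split of the 27 genes over the 6 chromosomes, the Gaussian mutation step, and the linear schedule in `adaptive_rate` are assumptions made for illustration. The table confirms only the 6 chromosomes, one crossover point per chromosome, and the 10% to 25% mutation range.

```python
import random

Chromosome = list[float]              # allele values of genes linked together
Haplotype = list[Chromosome]          # one copy of each of the 6 chromosomes
Genome = tuple[Haplotype, Haplotype]  # diploid: two copies of everything

# Hypothetical split of the 27 genes over 6 linkage chromosomes.
CHROMOSOME_SIZES = [5, 5, 5, 4, 4, 4]


def make_gamete(parent: Genome, rng: random.Random) -> Haplotype:
    """Recombine a parent's two copies with one crossover point per chromosome."""
    gamete = []
    for first, second in zip(parent[0], parent[1]):
        if rng.random() < 0.5:              # random choice of which copy leads
            first, second = second, first
        point = rng.randint(0, len(first))  # 0 or len(first) means no swap
        gamete.append(first[:point] + second[point:])
    return gamete


def mutate(haplotype: Haplotype, rate: float, rng: random.Random,
           sigma: float = 10.0) -> Haplotype:
    """Nudge each allele with probability `rate`, clamped to the 0-100 scale."""
    return [
        [min(100.0, max(0.0, v + rng.gauss(0.0, sigma)))
         if rng.random() < rate else v
         for v in chrom]
        for chrom in haplotype
    ]


def adaptive_rate(diversity: float, target: float,
                  base: float = 0.10, ceiling: float = 0.25) -> float:
    """Assumed schedule: climb linearly from base to ceiling as diversity falls to 0."""
    shortfall = max(0.0, 1.0 - diversity / target)
    return base + (ceiling - base) * shortfall


def breed(mother: Genome, father: Genome, rate: float,
          rng: random.Random) -> Genome:
    """The child receives one recombined, possibly mutated copy from each parent."""
    return (mutate(make_gamete(mother, rng), rate, rng),
            mutate(make_gamete(father, rng), rate, rng))
```

Genes on the same chromosome tend to travel together because only one cut is made per chromosome, which is what makes the grouping into linkage chromosomes matter.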
fitness = Con×0.25 + Dis×0.25 + Fid×0.20 + Rob×0.15 + Elo_norm×0.15
Where Con = Consistency (same personality across contexts), Dis = Distinctiveness (different from siblings), Fid = Fidelity (behavior matches genome prediction), Rob = Robustness (survives adversarial prompts), and Elo_norm = normalized Elo rating from head-to-head tournament.
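As a worked example, the formula translates directly into a weighted sum. The weights come from the formula above. The 0-100 scale for each component and the min-max scheme in `normalize_elo` are assumptions, since the post does not say how the Elo rating is normalized. The scores in the example are made up.

```python
# Weights taken from the fitness formula; they sum to 1.0.
WEIGHTS = {
    "consistency": 0.25,
    "distinctiveness": 0.25,
    "fidelity": 0.20,
    "robustness": 0.15,
    "elo_norm": 0.15,
}


def normalize_elo(rating: float, all_ratings: list[float]) -> float:
    """Assumed min-max scaling of an Elo rating onto 0-100 within one generation."""
    lo, hi = min(all_ratings), max(all_ratings)
    if hi == lo:
        return 50.0  # everyone tied: place them mid-scale
    return 100.0 * (rating - lo) / (hi - lo)


def fitness(scores: dict[str, float]) -> float:
    """Weighted sum of the five evaluation dimensions."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# Made-up scores for illustration only, not data from the experiment.
generation_elo = [1450.0, 1500.0, 1550.0]
example = {
    "consistency": 90.0,
    "distinctiveness": 85.0,
    "fidelity": 80.0,
    "robustness": 75.0,
    "elo_norm": normalize_elo(1550.0, generation_elo),
}
print(round(fitness(example), 1))  # 86.0
```

Because Consistency and Distinctiveness together carry half the weight, a personality that wins many head-to-head matchups can still rank below one that is more stable and more distinct from its siblings.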