14 months of conversations.
One build.
1,871 conversations across Claude and ChatGPT. 14 months. One pipeline definition. One build.
The input
Nearly two thousand conversations across two platforms. No structure, no organization. Technical discussions, personal reflections, project planning, research threads: everything mixed together in raw chat export format.
What the raw exports look like
The two platforms use completely different structures. ChatGPT exports each conversation as a message tree, with parent and children pointers on every node to support branching and edits. Claude exports a flat message array. Synix auto-detects the format at Layer 0 and parses both into a uniform transcript representation.
// ChatGPT export — 1,063 conversations
// Tree structure: each message has parent + children UUIDs
[
  {
    "title": "Event Details Extraction",
    "conversation_id": "6984012a-9078-...",
    "create_time": 1770258736.638,
    "mapping": {
      "<message-uuid>": {
        "message": {
          "author": { "role": "user" },
          "content": {
            "content_type": "user_editable_context",
            "parts": ["..."]
          }
        },
        "parent": "1bf4ad69-...",
        "children": ["85e5fd69-..."]
      }
    }
  }
]
// Claude export — 808 conversations
// Flat array: messages in sequence
[
  {
    "uuid": "3df46a88-dd8f-...",
    "name": "Restart Mouse Service on macOS",
    "created_at": "2025-05-27T08:25:06Z",
    "chat_messages": [
      {
        "text": "is there some terminal command to...",
        "sender": "human",
        "created_at": "2025-05-28T01:51:15Z"
      }
    ]
  }
]
Two formats, two platforms, 1,871 total conversations. After Layer 0 parsing, every conversation is a uniform transcript artifact with the same schema — ready for the episode summarizer, regardless of where it came from.
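To make the uniform transcript concrete, here is a minimal sketch of what Layer 0 detection and normalization could look like. The class and function names (Transcript, detect_format, and so on) are illustrative assumptions rather than Synix's actual API; the point is that both export shapes collapse into one schema.
from dataclasses import dataclass

# Hypothetical sketch of Layer 0 normalization. Class and function names
# are illustrative, not Synix's real API: both export shapes collapse
# into one transcript schema.

@dataclass
class Turn:
    role: str   # "user", "assistant", "system", ...
    text: str

@dataclass
class Transcript:
    source: str             # "chatgpt" or "claude"
    conversation_id: str
    title: str
    turns: list

def detect_format(conv: dict) -> str:
    # ChatGPT exports carry a "mapping" tree; Claude exports a flat
    # "chat_messages" array.
    if "mapping" in conv:
        return "chatgpt"
    if "chat_messages" in conv:
        return "claude"
    raise ValueError("unknown export format")

def parse_claude(conv: dict) -> Transcript:
    turns = [
        Turn(role="user" if m["sender"] == "human" else "assistant",
             text=m["text"])
        for m in conv["chat_messages"]
    ]
    return Transcript("claude", conv["uuid"], conv.get("name", ""), turns)

def parse_chatgpt(conv: dict) -> Transcript:
    # Walk the message tree from the root, following the first child at
    # each branch point.
    mapping = conv["mapping"]
    roots = [n for n in mapping.values() if n.get("parent") not in mapping]
    node, turns = (roots[0] if roots else None), []
    while node:
        msg = node.get("message")
        if msg and (msg.get("content") or {}).get("parts"):
            parts = [p for p in msg["content"]["parts"] if isinstance(p, str)]
            turns.append(Turn(role=(msg.get("author") or {}).get("role", ""),
                              text="".join(parts)))
        children = node.get("children") or []
        node = mapping.get(children[0]) if children else None
    return Transcript("chatgpt", conv["conversation_id"],
                      conv.get("title", ""), turns)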
The pipeline
A standard Synix pipeline definition. Four layers, two projections, validators enabled. This is the architecture you get from template 01-chatbot-export-synthesis.
from synix import Pipeline, Layer, Projection, ValidatorDecl

pipeline = Pipeline("personal-memory")
pipeline.source_dir = "./exports"

# Layer 0: parse raw exports from both platforms
pipeline.add_layer(Layer(name="transcripts", level=0, transform="parse"))

# Layer 1: one episode summary per conversation
pipeline.add_layer(Layer(
    name="episodes", level=1, depends_on=["transcripts"],
    transform="episode_summary", grouping="by_conversation",
))

# Layer 2: group episodes by month
pipeline.add_layer(Layer(
    name="monthly", level=2, depends_on=["episodes"],
    transform="monthly_rollup", grouping="by_month",
))

# Layer 3: synthesize everything into core memory
pipeline.add_layer(Layer(
    name="core", level=3, depends_on=["monthly"],
    transform="core_synthesis", grouping="single",
    context_budget=10000,
))

# Projections
pipeline.add_projection(Projection(
    name="search", projection_type="search_index",
    sources=[
        {"layer": "episodes", "search": ["fulltext"]},
        {"layer": "monthly", "search": ["fulltext"]},
        {"layer": "core", "search": ["fulltext"]},
    ],
))
pipeline.add_projection(Projection(
    name="context-doc", projection_type="flat_file",
    sources=[{"layer": "core"}],
    config={"output_path": "./build/context.md"},
))
What each layer produces
Every layer in the pipeline produces typed, tracked artifacts. Here’s what the real output looks like at each altitude — from a single conversation up through the final synthesis. These are actual artifacts from this build, with personal details redacted.
Layer 1 — Episode summary
One conversation in, one episode artifact out. The summary captures what was discussed, what was decided, and the emotional register: not a transcript, but a structured distillation. Under the hood, each episode artifact stores its own content hash, the hash of the source transcript it was built from, the versioned prompt ID, and the full model config. Change any of those components and the episode rebuilds.
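As a rough picture of that bookkeeping, the sketch below shows the kind of metadata an episode artifact might carry. Field names and example values are assumptions for illustration, not Synix's actual artifact schema.
import hashlib

# Illustrative sketch only: field names are assumptions, not the real
# artifact schema. The idea is that the fingerprint covers every input
# to the transform, not just the summary text itself.

def content_hash(payload: str) -> str:
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()

episode_artifact = {
    "id": "ep-example",
    "layer": "episodes",
    "content_hash": content_hash("...episode summary text..."),
    "fingerprint": {
        "input_ids": [content_hash("...source transcript...")],
        "prompt_id": "episode_summary@v3",                # versioned prompt
        "model_config": {"model": "<model-name>", "temperature": 0.2},
        "transform": "episode_summary",
    },
}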
Layer 2 — Monthly rollup
Episodes are grouped by month and synthesized into a single rollup artifact. This one consumed 50 episode summaries from June 2025: fifty conversations distilled into one structured overview of themes, their evolution, and the interplay between technical work and personal context that a simple keyword search would never surface. Under the hood, the rollup's input_ids array contains the content hashes of all 50 episode artifacts. If any episode changes, because you edited a transcript or changed the summarization prompt, the rollup rebuilds. If none changed, it's a cache hit.
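The rebuild decision itself can be pictured as a fingerprint comparison. This is a sketch of the general technique, not Synix's implementation; the function names and record layout are assumptions.
import hashlib
import json

# Sketch of fingerprint-based cache checking (names are assumptions).
# An artifact is a cache hit only when the fingerprint recomputed from
# its current inputs, prompt, model config, and transform source matches
# the fingerprint recorded when it was last built.

def fingerprint(input_hashes, prompt_id, model_config, transform_src):
    blob = json.dumps(
        {
            "inputs": sorted(input_hashes),
            "prompt": prompt_id,
            "model": model_config,
            "transform": transform_src,
        },
        sort_keys=True,
    )
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()

def needs_rebuild(artifact, episodes, prompt_id, model_config, transform_src):
    current = fingerprint(
        [ep["content_hash"] for ep in episodes],
        prompt_id, model_config, transform_src,
    )
    return artifact["fingerprint_hash"] != current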
Layer 3 — Core memory
All monthly rollups are synthesized into a single core memory document. This is what an agent would consume — a structured understanding of the user derived from the full conversation history across both platforms.
Five sections. Identity, current focus, preferences, temporal history, active threads. All derived — nothing manually written. You can trace any claim back through the layers: the June 2025 monthly rollup identifies “Reflection on Place and Identity” as a major theme because dozens of episode summaries that month discussed neighborhood fit, geographic tension, and belonging. Those episodes were built from individual conversations. The core memory distills all of that into one line: “Geographically mobile, currently focused on urban living in the Bay Area.” One sentence, backed by a dependency chain you can walk from top to bottom.
Provenance
Every claim in the core memory document traces back through the pipeline to the source conversations that produced it. This is not git history — it’s a content-addressed dependency chain through every transform.
# Trace any artifact back to its sources
$ uvx synix lineage core-memory
# core-memory (sha256:a8f3c912...)
#   ← monthly-2025-06 (sha256:f2c5823e...) [50 episodes]
#     ← ep-67f30e91... "Neighborhood architecture and fit"
#       ← transcript-67f30e91... (source, 2025-04-06)
#     ← ep-681a22f0... "Agent memory versioning approaches"
#       ← transcript-681a22f0... (source, 2025-06-12)
#     ← ep-683bc1a4... "Preceptor architecture design"
#       ← transcript-683bc1a4... (source, 2025-06-28)
#   ← monthly-2025-09 (sha256:b7d41e03...) [38 episodes]
#     ← ep-6912a4c1... "First build system prototype"
#       ← transcript-6912a4c1... (source, 2025-09-15)
#   ← monthly-2025-12 (sha256:e1a9f720...) [45 episodes]
#     ← ...
This is what separates a build system from a memory store. The output isn’t a black box. Every artifact has a full dependency chain with content hashes at every level. Change the pipeline, run synix plan --explain-cache, and see exactly what rebuilds and why.
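The walk the CLI prints can be pictured as a short recursion over content-addressed records. The store layout and field names below are assumptions for illustration, not Synix's internals.
# Illustrative sketch: `store` maps content hashes to artifact records,
# and each record lists the hashes it was built from in input_ids.
# Field names are assumptions, not Synix's actual storage format.

def print_lineage(store: dict, artifact_hash: str, depth: int = 0) -> None:
    artifact = store[artifact_hash]
    prefix = "  " * depth + ("← " if depth else "")
    print(f"{prefix}{artifact['id']} ({artifact_hash[:15]}...)")
    for parent_hash in artifact.get("input_ids", []):
        print_lineage(store, parent_hash, depth + 1)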
What this demonstrates
Cross-platform synthesis
Two different AI platforms, one coherent memory. Source format is an input detail, not an architectural constraint.
Declarative architecture
The pipeline definition is 30 lines of Python. The output is a structured document with full provenance. The developer declares; the system builds.
Artifacts at every altitude
Episode summaries, monthly rollups, core memory — each layer is inspectable, searchable, and independently cacheable.
Full provenance
Every claim in the output traces back to source conversations through every intermediate transform. Content-addressed dependency chains, not git commits.
Fingerprint-based caching
Each artifact stores a build fingerprint — inputs, prompt version, model config, transform source. Change any component and only affected artifacts rebuild.
Architecture evolution
Swap monthly rollups for topic-based clustering. Transcripts and episodes stay cached. Only downstream layers rebuild. No migration, no starting over.
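As a concrete example of that evolution, the revised Layer 2 and Layer 3 declarations in the pipeline definition above might look like the sketch below. The topic_cluster transform and by_topic grouping are assumed names for illustration, not shipped transforms.
# Hypothetical revision of the pipeline defined earlier: Layer 2 swaps
# month-based rollups for topic clustering, and Layer 3 now depends on it.
# "topic_cluster" and "by_topic" are assumed names, not shipped transforms.
pipeline.add_layer(Layer(
    name="topics", level=2, depends_on=["episodes"],
    transform="topic_cluster", grouping="by_topic",
))
pipeline.add_layer(Layer(
    name="core", level=3, depends_on=["topics"],
    transform="core_synthesis", grouping="single",
    context_budget=10000,
))
# Transcripts (Layer 0) and episodes (Layer 1) keep their fingerprints,
# so only the new topics layer and the downstream core layer rebuild.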
Try it yourself
This case study used the 01-chatbot-export-synthesis pipeline template. You can run the same build on your own conversation exports.
# Install and scaffold
$ uvx synix init -t 01-chatbot-export-synthesis my-memory
$ cd my-memory
# Drop in your exports
# ChatGPT: Settings → Data Controls → Export Data
# Claude: Settings → Account → Export Data
$ cp ~/Downloads/conversations.json ./exports/
$ cp ~/Downloads/claude-export.json ./exports/
# Build
$ uvx synix build
# Browse what was built
$ uvx synix list
$ uvx synix show core-memory
$ uvx synix lineage core-memory
# Search across all altitudes
$ uvx synix search "agent memory"
# See what would rebuild if you changed the pipeline
$ uvx synix plan --explain-cache
Synix auto-detects ChatGPT and Claude export formats at Layer 0. Drop the files in, build, and you’ll have the same layered output — episodes, rollups, core memory, search index — with full provenance and caching from the first run.
Start building
Declare your memory architecture. Build it. Change it.
uvx synix init -t 01-chatbot-export-synthesis
View on GitHub