Git in 2026: an architecture note on objects, refs, packfiles, and why reachability is the real durability contract

Git feels familiar at the UI layer, but its lasting strength comes from a much lower-level design: content-addressed objects, simple movable refs, and conservative garbage collection around reachability.

Most Git mistakes are not merge-conflict mistakes. They are storage-model mistakes.

Teams use branches, pull requests, and force-pushes every day, then still talk about Git as if it were primarily a hosted collaboration product with a nicer command line attached. The official documentation describes something more concrete: Git stores snapshots, names them through refs, and protects integrity through content-addressed objects.[1][2][3] Once that clicks, a lot of everyday confusion falls away.

The practical point is simple. Git is safest when you understand what the repository is actually promising to keep, what it is free to garbage-collect later, and why the difference between those two states is reachability.[2][3][6]

1. Objects are the repository; the working tree is just one view

Git's core data model is built from a small set of object types: blobs, trees, commits, and annotated tags.[2] A blob stores file contents. A tree stores directory structure as a list of mode/name/object-ID entries. A commit stores metadata, a pointer to a top-level tree, and pointers to its parent commits: one for an ordinary commit, two or more for a merge, none for a root commit.[2]
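A minimal sketch of what "content-addressed" means for a blob, assuming a repository that uses the SHA-1 object format (SHA-256 repositories hash the same bytes with a different function). The function names here are illustrative, not part of Git:

```python
import hashlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    """Hash '<type> <size>\\0<payload>', which is how Git names an object.

    Assumption: SHA-1 object format.
    """
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


def blob_id(content: bytes) -> str:
    """Object ID of a blob holding exactly these file contents."""
    return git_object_id("blob", content)


if __name__ == "__main__":
    # Same bytes always give the same ID, regardless of file name or path.
    print(blob_id(b"test content\n"))
```

The ID depends only on the type and the bytes. The file name is not part of the blob; it lives in the tree that points at the blob.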

That means a commit is not fundamentally "a diff" in Git's storage model. The official "What is Git?" chapter frames Git as a snapshot-oriented system, and the internals chapter shows why: a commit anchors a whole project state by pointing at a root tree, not by re-running a patch script every time history is read.[1][2]

This matters operationally because it explains why Git can recover so much context from object identity alone. If the object graph is intact, Git can reconstruct history, tree state, and file content relationships even when a user's mental model is vague. The repository is not a bag of branch tips with patches hanging off them. It is an object database with named entry points.[2][3]
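The snapshot structure can be sketched in a few lines. This is a simplified model under stated assumptions: SHA-1 object format, a flat tree that contains only regular files (real Git also handles subtrees, whose sort order has an extra rule), and a made-up identity string. The helper names are hypothetical:

```python
import hashlib


def object_id(obj_type: str, payload: bytes) -> str:
    """Name an object by hashing '<type> <size>\\0<payload>' (SHA-1 assumed)."""
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


def tree_payload(entries) -> bytes:
    """Serialize (mode, name, hex_oid) entries for a flat, files-only tree."""
    out = b""
    for mode, name, oid in sorted(entries, key=lambda e: e[1].encode("utf-8")):
        # Each entry is '<mode> <name>\0' followed by the raw 20-byte object ID.
        out += f"{mode} {name}".encode("utf-8") + b"\x00" + bytes.fromhex(oid)
    return out


def commit_payload(tree_oid, parent_oids, ident, message) -> bytes:
    """Serialize a commit: one root tree, zero or more parents, metadata."""
    lines = [f"tree {tree_oid}"]
    lines += [f"parent {p}" for p in parent_oids]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return "\n".join(lines).encode("utf-8")


if __name__ == "__main__":
    ident = "A U Thor <author@example.com> 0 +0000"  # illustrative identity
    blob = object_id("blob", b"hello\n")
    tree = object_id("tree", tree_payload([("100644", "hello.txt", blob)]))
    commit = object_id("commit", commit_payload(tree, [], ident, "init\n"))
    print(blob, tree, commit)
```

Changing one byte of a file changes its blob ID, which changes the tree ID, which changes the commit ID. That chain is why an intact object graph is enough to reconstruct a full project state from a single commit ID.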

2. Branches are refs, not containers

The next useful correction is that a branch is not a folder that "contains commits." A branch is a ref: a name whose value is usually the object ID of the commit at the tip of that line of development.[3] Move the ref, and the visible branch tip moves with it. The commits themselves do not relocate.

The repository layout documentation makes this more concrete. Refs often live as small files under .git/refs/, while refs that have not changed recently may be consolidated into a single packed-refs file for efficiency.[4] HEAD is commonly a symbolic ref that names the current branch (for example, "ref: refs/heads/main") rather than a special separate history mechanism; in a detached-HEAD state it holds a commit ID directly.[3][4]
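To make the "a ref is a name with a value" point tangible, here is a rough sketch of ref resolution against the classic files-based layout. It is an assumption-laden simplification: it ignores worktrees, the reftable backend, and ref name validation, and the function names are hypothetical. In a real repository, git rev-parse is the right tool:

```python
from pathlib import Path
from typing import Dict, Optional


def read_packed_refs(git_dir: Path) -> Dict[str, str]:
    """Parse packed-refs into {ref name: object ID}."""
    refs: Dict[str, str] = {}
    packed = git_dir / "packed-refs"
    if not packed.is_file():
        return refs
    for line in packed.read_text().splitlines():
        # Skip the header comment and '^' lines (peeled tag targets).
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        oid, name = line.split(" ", 1)
        refs[name] = oid
    return refs


def resolve_ref(git_dir: Path, name: str = "HEAD", max_depth: int = 5) -> Optional[str]:
    """Follow symbolic refs until an object ID is found, or return None."""
    packed = read_packed_refs(git_dir)
    for _ in range(max_depth):
        loose = git_dir / name
        if loose.is_file():
            value = loose.read_text().strip()
            if value.startswith("ref: "):
                name = value[len("ref: "):]  # symbolic ref: follow the name
                continue
            return value  # a loose ref takes precedence over packed-refs
        return packed.get(name)
    return None
```

Nothing in this lookup touches a commit. Moving a branch means writing a different object ID under the same name.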

This is why rebasing and force-pushing are often misunderstood. Rewriting history does not mutate some secret "inside" of a branch. It creates or reuses commits, then repoints refs to different commit IDs.[3] The dangerous part is not mystical branch corruption. The dangerous part is that once refs stop pointing at older commits, those older commits lose one of their main paths to continued reachability.[6]

3. Loose objects are how work begins; packfiles are how repositories stay economical

Git's object model would be too expensive if every object remained loose forever. The packfiles chapter explains the answer: Git can combine many objects into packfiles and compress them, often using delta relationships to reduce storage and transfer size.[5]

This changes repository economics in two ways.

First, it keeps ordinary development fast. New objects can be created incrementally as loose objects while work is active, without paying the cost of constant full repacking.[2][5] Second, it keeps long-lived repositories transportable. Packfiles are one reason cloning and fetching large histories remain feasible even when the raw object count is high.[5]
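For contrast with packfiles, the loose-object side is simple enough to sketch: one zlib-compressed file per object, stored under a directory named after the first two hex characters of its ID. This assumes the SHA-1 object format and uses hypothetical helper names. The packfile delta format itself is deliberately not reproduced here:

```python
import hashlib
import zlib
from pathlib import Path
from typing import Tuple


def write_loose_object(objects_dir: Path, obj_type: str, payload: bytes) -> str:
    """Store one object as its own compressed file and return its ID."""
    store = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00" + payload
    oid = hashlib.sha1(store).hexdigest()
    path = objects_dir / oid[:2] / oid[2:]
    if not path.exists():  # identical content is only ever stored once
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(store))
    return oid


def read_loose_object(objects_dir: Path, oid: str) -> Tuple[str, bytes]:
    """Load a loose object and return (type, payload)."""
    raw = zlib.decompress((objects_dir / oid[:2] / oid[2:]).read_bytes())
    header, _, payload = raw.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(payload):
        raise ValueError(f"size mismatch in object {oid}")
    return obj_type, payload
```

Writing is cheap and needs no coordination with other objects, which suits active work. The cost is one file per object and no sharing of content between similar versions, which is the gap packing closes.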

The mistake to avoid is treating packfiles as an implementation detail with no behavioral consequence. They do not change Git's identity model: an object has the same ID whether it is loose or packed. They do explain why repository maintenance operations such as git gc and git repack matter on large or long-running repos.[5][6] History shape determines which objects exist; storage shape determines what it costs to keep and move them.

4. Reachability is the real durability contract

Git's strongest safety property is not "your branch exists" but "the object is still reachable from something Git protects."[6]

The git gc documentation says the collector tries hard to preserve objects referenced from branches, tags, the index, remote-tracking branches, reflogs, and other object references in the repository.[6] That sentence is the real operating contract. Objects are durable when they remain connected to protected roots. They become prune candidates when those connections disappear and the grace period has passed (gc.pruneExpire, two weeks by default).[6]

This is the cleanest way to understand common recovery stories. A commit that vanished from a branch after a reset or rebase may still be recoverable because another ref or the reflog still points to it. A commit that nobody can name anymore is living on borrowed time. Git is conservative, but it is not a promise to keep unreachable objects forever.[6]

In practice, that means refs are governance, not just convenience. A lightweight tag before a risky rewrite, a temporary branch before a history edit, or a published remote ref before local cleanup all extend reachability in explicit ways.[3][6] Teams that understand this make fewer panicked recovery moves because they know exactly what safety boundary they are trying to preserve.
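The contract in this section can be modeled as a graph walk. The sketch below is a toy model, not Git's implementation: commits are plain strings, the parents mapping and function names are hypothetical, and the age-based grace period that real pruning applies is left out:

```python
from typing import Dict, Iterable, List, Set


def reachable(parents: Dict[str, List[str]], roots: Iterable[str]) -> Set[str]:
    """Return every commit that can be reached by walking back from the roots."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        commit = stack.pop()
        if commit in seen or commit not in parents:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


def prune_candidates(
    parents: Dict[str, List[str]],
    refs: Dict[str, str],
    reflog: Iterable[str],
) -> Set[str]:
    """Commits that no ref and no reflog entry can reach any more."""
    roots = list(refs.values()) + list(reflog)
    return set(parents) - reachable(parents, roots)


if __name__ == "__main__":
    history = {"A": [], "B": ["A"], "C": ["B"]}  # A <- B <- C

    # After a reset from C back to B, the reflog still names C.
    print(prune_candidates(history, {"refs/heads/main": "B"}, ["C"]))

    # Once that reflog entry expires, C has no protected root left.
    print(prune_candidates(history, {"refs/heads/main": "B"}, []))

    # A backup tag created before the reset keeps C reachable.
    refs = {"refs/heads/main": "B", "refs/tags/backup": "C"}
    print(prune_candidates(history, refs, []))
```

The three calls cover the recovery stories above: protected by the reflog, protected by nothing, and protected by an explicit name.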

5. What this changes for day-to-day engineering

Once you adopt the object-and-reachability view, several habits become easier to justify:

Create a branch or tag before destructive history surgery, because names are how you preserve roots.[3][6]
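Treat reflog recovery as a local safety net, not as a collaboration contract, because reflogs are repository-local and time-bounded: by default, entries expire after 90 days, or after 30 days if they are no longer reachable from the ref's current tip.[6]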
Read force-pushes as ref movement plus social coordination cost, not as magical deletion across every clone.[3][6]
Separate hosting workflow from repository mechanics; pull requests live above Git's storage model, not inside it.[1][2][3]
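The force-push habit in the list above can be illustrated with a toy model of three repositories. Everything here is an assumption for illustration: the Repo class, the clone and push helpers, and the single-letter commit names are hypothetical, and real pushes involve negotiation, hooks, and server-side policy that this ignores:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Repo:
    parents: Dict[str, List[str]] = field(default_factory=dict)  # commit -> parents
    refs: Dict[str, str] = field(default_factory=dict)           # ref name -> commit


def history(parents: Dict[str, List[str]], tip: str) -> Set[str]:
    """All commits reachable from tip, including tip itself."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def clone(src: Repo) -> Repo:
    """Copy objects and refs into an independent repository."""
    return Repo({c: list(p) for c, p in src.parents.items()}, dict(src.refs))


def push(src: Repo, dst: Repo, ref: str, force: bool = False) -> bool:
    """Update dst's ref to src's tip; refuse non-fast-forwards unless forced."""
    new_tip = src.refs[ref]
    old_tip = dst.refs.get(ref)
    fast_forward = old_tip is None or old_tip in history(src.parents, new_tip)
    if not fast_forward and not force:
        return False
    for commit in history(src.parents, new_tip):
        dst.parents.setdefault(commit, list(src.parents[commit]))
    dst.refs[ref] = new_tip  # the only thing a force-push rewrites is this name
    return True


if __name__ == "__main__":
    server = Repo({"A": [], "B": ["A"]}, {"refs/heads/main": "B"})
    alice, bob = clone(server), clone(server)
    alice.parents["B2"] = ["A"]            # a rewritten replacement for B
    alice.refs["refs/heads/main"] = "B2"
    print(push(alice, server, "refs/heads/main"))              # refused
    print(push(alice, server, "refs/heads/main", force=True))  # ref moves
    print(bob.refs["refs/heads/main"], "B" in bob.parents)     # bob is unchanged
```

After the forced push, the server's name points at the rewritten commit, but the old commit is still present on the server until it is pruned, and it is still fully named in every other clone. The remaining cost is coordination: other contributors have to reconcile their refs with the new tip.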

That last point is especially useful in platform teams. Many workflow arguments are really arguments about policy on top of refs and commits. Git itself is much smaller and stricter than the surrounding collaboration layer makes it appear.

Bottom line

Git remains powerful because its core is narrow: content-addressed objects, simple named refs, efficient packing, and a garbage collector organized around reachability.[2][3][5][6] If you understand those four pieces, many of the repository actions that feel risky at the UI layer become legible again.

The day-to-day payoff is not philosophical clarity. It is fewer accidental history losses, calmer rewrite workflows, and better judgment about which names in a repository are merely convenient and which ones are actually keeping data alive.

cronfeed.work