Design
hallmark follows the
Unix philosophy:
Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.
—Douglas McIlroy
The “one thing” for hallmark is maintaining a reproducible data
index.
With a well-designed indexing mechanism, it becomes natural to expose
a small set of core functions:
add/remove: find data from any source and bring them into the index;
index: compute checksums of data objects and index their relationships;
version control: append immutable records; and
view: emit manifests of subsets for other tools to consume.
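The index and view functions above can be illustrated with a minimal sketch. It is not hallmark's implementation: the function names (`sha256_of`, `build_index`, `emit_manifest`), the choice of SHA-256, and the `digest  path` manifest layout are all assumptions made for illustration. The point is only that checksummed entries can be emitted as a plain text stream for other tools to consume.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root: Path) -> dict[str, str]:
    """Map each file under root (relative POSIX path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def emit_manifest(index: dict[str, str], prefix: str = "") -> str:
    """Render the entries whose path starts with prefix as 'digest  path' lines."""
    lines = [
        f"{digest}  {path}"
        for path, digest in sorted(index.items())
        if path.startswith(prefix)
    ]
    return "\n".join(lines) + ("\n" if lines else "")
```

Because the manifest is plain text, a subset such as `emit_manifest(index, prefix="raw/")` can be written to standard output and piped into another program.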
Architecture
A hallmark repository is the entry point for a version-controlled
dataset index.
It has three architectural components with different responsibilities:
State: the canonical in-memory data container, where all index mutations happen (add/remove/index);
Dothm: an on-disk version-controlled repository, persisting State and providing immutable history (using git); and
Worktree: an on-disk working tree/directory, where data files are discovered and consumed.
The data flow can be summarized as::

                Repo
     ____________/\____________
    /                          \
    State ---persist-------+
      ^                    |
      |                    |
      |                    v
      +----instantiate-- Dothm (".hm" git repository)
      |                    ^
      |                    |link
      |                    v
      +-----discover---- Worktree
                           |
                           |access
                           v
                      Other tools
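
The division of responsibilities in this data flow can be sketched with three small classes. This is a toy model, not hallmark's code: the method names (`add`, `remove`, `discover`, `persist`, `instantiate`) mirror the labels in the diagram but their signatures are assumptions, and the `Dothm` stand-in writes a single hypothetical `index.json` file rather than committing to a git repository.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class State:
    """In-memory container; every index mutation happens here."""
    entries: dict = field(default_factory=dict)  # relative path -> metadata

    def add(self, path: str, meta: dict) -> None:
        self.entries[path] = meta

    def remove(self, path: str) -> None:
        self.entries.pop(path, None)


@dataclass
class Worktree:
    """On-disk directory where data files are discovered."""
    root: Path

    def discover(self):
        """Yield relative paths of data files, skipping the '.hm' directory."""
        for p in sorted(self.root.rglob("*")):
            rel = p.relative_to(self.root)
            if p.is_file() and ".hm" not in rel.parts:
                yield rel.as_posix()


@dataclass
class Dothm:
    """On-disk store. Stand-in only: a JSON file instead of git history."""
    path: Path

    def persist(self, state: State) -> None:
        self.path.mkdir(parents=True, exist_ok=True)
        (self.path / "index.json").write_text(
            json.dumps(state.entries, sort_keys=True)
        )

    def instantiate(self) -> State:
        index_file = self.path / "index.json"
        if index_file.exists():
            return State(json.loads(index_file.read_text()))
        return State()
```

In this model, only `State` is ever mutated; `Dothm` and `Worktree` are read from or written to as whole units, which is what lets an in-memory workflow omit both.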
API
Currently, hallmark has two built-in APIs:
Python API: the native API that all hallmark features are implemented in. State is the active object during the process lifetime. Dothm and Worktree are optional depending on workflow (for example, in-memory workflows may omit both).
CLI: Python features wrapped by click. Each hallmark ... command loads State from a discovered Dothm repository, executes the requested operation, then writes staged state updates back to Dothm before exit. In this mode, State is short-lived and Dothm is required.
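The load, execute, write-back cycle of a CLI command can be sketched with a context manager. The sketch uses only the standard library and does not reproduce hallmark's click commands: `short_lived_state`, `run_command`, and the `index.json` storage file are hypothetical names, and the state is reduced to a plain dictionary.

```python
import json
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def short_lived_state(dothm: Path):
    """Load state from dothm, yield it for one operation, then write it back."""
    if not dothm.is_dir():
        # In CLI mode the on-disk repository is required.
        raise FileNotFoundError(f"no repository at {dothm}")
    index_file = dothm / "index.json"  # hypothetical storage file
    state = json.loads(index_file.read_text()) if index_file.exists() else {}
    yield state
    # Reached only when the operation finished without raising.
    index_file.write_text(json.dumps(state, sort_keys=True))


def run_command(dothm: Path, operation, *args):
    """Run one operation against a freshly loaded state, as one CLI call would."""
    with short_lived_state(dothm) as state:
        return operation(state, *args)
```

One consequence of this shape is that a failing operation leaves the stored index untouched, because the write happens only after the operation returns.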
Worktree follows the same idea as
git worktree:
multiple repo/.hm directories can be attached to one underlying
repository to support parallel data transformations and
branch-isolated workflows.
Repository Repo
hallmark supports three repository forms with the same internal
Dothm data model:
standard repository::

      +--- Worktree
      v
    "repo/.hm/" <--- Dothm, a standard git repo

bare repository::

    "repo.hm/" <--- Dothm, a standard or bare git repo

shared repository (multiple worktrees)::

      +--- Worktree                       git repo
      v                                       ^
    "repo1/.hm/" <--- Dothm, a git worktree --+
                                              |
      +--- Worktree                           |
      v                                       |
    "repo2/.hm/" <--- Dothm, a git worktree --+
                                              |
      +--- Worktree                           |
      v                                       |
    "repo3/.hm/" <--- Dothm, a git worktree --+
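
Since all three forms share one data model, a caller mainly needs to locate the Dothm directory. The following sketch shows one way that discovery could work. It is an assumption, not hallmark's documented behavior: `find_dothm` is a hypothetical helper, and the rules it applies (a directory named like `repo.hm` is a bare repository, a `.hm` directory inside a working directory is a standard or shared one) are inferred from the path names in the diagrams above.

```python
from pathlib import Path
from typing import Optional


def find_dothm(start: Path) -> Optional[Path]:
    """Return the nearest Dothm directory at or above start, or None."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        name = candidate.name
        # Bare form: the directory itself is named like "repo.hm".
        if name.endswith(".hm") and name != ".hm" and candidate.is_dir():
            return candidate
        # Standard or shared form: a ".hm" directory inside the worktree.
        if (candidate / ".hm").is_dir():
            return candidate / ".hm"
    return None
```

Searching upward from the current directory means a command can be run from any subdirectory of a worktree, in the same way git locates its own repository.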