System Overview
Logical layers
Molfun's packages map onto three logical layers. The directory tree is not physically split into domain/, application/, and infrastructure/ folders, because the existing package layout already makes the boundaries clear.
| Layer | Responsibility | Packages |
|---|---|---|
| Domain | Core types, constants, abstract interfaces | core/ (TrunkOutput, Batch), constants.py, ABCs in adapters/base, training/base, tracking/base, losses/base, modules/*/base, modules/registry |
| Application | Orchestration, business logic, facades | models/structure.py (MolfunStructureModel), predict.py, training/ strategies, pipelines/, benchmarks/, analysis/ |
| Infrastructure | External integrations, I/O, CLI | backends/openfold/, data/, tracking/ implementations, storage/, hub/, export/, cli/, kernels/, cache/ |
Why no physical DDD restructure?
The current package layout (adapters/, modules/, training/, data/, etc.) already communicates boundaries clearly. A physical move into domain/application/infrastructure folders would break every import in the ecosystem for minimal gain.
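The layer assignment in the table can also be written down as data, for example to tag modules in generated docs or to lint imports. The sketch below is a hypothetical helper, not part of Molfun: `LAYER_BY_PREFIX` and `layer_of` are illustrative names, paths are relative to the package root, and the `modules/*/base` wildcard row is left out for brevity. It resolves a path by longest matching prefix, so training/base lands in Domain while the rest of training/ lands in Application. It assumes Python 3.9+ for `str.removesuffix`.

```python
# Hypothetical helper, not part of Molfun: resolve a module path to its logical layer.
# Prefixes are copied from the table above (the modules/*/base wildcard is omitted).
LAYER_BY_PREFIX = {
    # Domain: core types, constants, abstract interfaces
    "core": "Domain",
    "constants": "Domain",
    "adapters/base": "Domain",
    "training/base": "Domain",
    "tracking/base": "Domain",
    "losses/base": "Domain",
    "modules/registry": "Domain",
    # Application: orchestration and facades
    "models/structure": "Application",
    "predict": "Application",
    "training": "Application",
    "pipelines": "Application",
    "benchmarks": "Application",
    "analysis": "Application",
    # Infrastructure: external integrations, I/O, CLI
    "backends/openfold": "Infrastructure",
    "data": "Infrastructure",
    "tracking": "Infrastructure",
    "storage": "Infrastructure",
    "hub": "Infrastructure",
    "export": "Infrastructure",
    "cli": "Infrastructure",
    "kernels": "Infrastructure",
    "cache": "Infrastructure",
}


def layer_of(path: str) -> str:
    """Return the layer for a package-relative path such as 'training/base.py'."""
    parts = path.strip("/").removesuffix(".py").split("/")
    # Try the longest prefix first so 'training/base' wins over 'training'.
    for end in range(len(parts), 0, -1):
        prefix = "/".join(parts[:end])
        if prefix in LAYER_BY_PREFIX:
            return LAYER_BY_PREFIX[prefix]
    raise KeyError(f"no layer recorded for {path!r}")
```

Longest-prefix matching is what lets one package straddle two layers: the abstract base module and the concrete implementations next to it resolve differently without any files being moved.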
System architecture
graph TB
    subgraph Facade
        MSM["MolfunStructureModel"]
    end
    subgraph Application
        predict["predict.py"]
        strategies["Training Strategies"]
        pipelines["Pipelines"]
        benchmarks["Benchmarks"]
        analysis["Analysis"]
    end
    subgraph Domain
        core["core/ (TrunkOutput, Batch)"]
        abcs["ABCs (BaseAdapter, FinetuneStrategy, ...)"]
        registries["ModuleRegistry"]
        losses["LossFunction + LossRegistry"]
    end
    subgraph Infrastructure
        openfold["backends/openfold/"]
        data["data/ (sources, parsers, datasets)"]
        trackers["tracking/ (wandb, comet, mlflow, langfuse)"]
        storage["storage/ (local, S3, MinIO)"]
        cli["cli/ (Typer)"]
        hub["hub/ + export/"]
        kernels["kernels/ (CUDA)"]
        cache["cache/"]
    end
    MSM --> predict
    MSM --> strategies
    MSM --> openfold
    MSM --> data
    MSM --> hub
    predict --> core
    strategies --> abcs
    strategies --> losses
    strategies --> trackers
    openfold --> abcs
    openfold --> registries
    data --> storage
    pipelines --> data
    pipelines --> strategies
    benchmarks --> predict
    analysis --> core
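The arrows in the diagram above can be encoded as an adjacency map and checked mechanically. The sketch below is illustrative only: the node names mirror the diagram identifiers, while `DEPENDS_ON` and `dependency_order` are hypothetical names, not Molfun APIs. It uses the standard-library `graphlib` module (Python 3.9+) to confirm that the edges form an acyclic graph and to list components with their dependencies first.

```python
from graphlib import CycleError, TopologicalSorter

# Each key depends on the nodes in its value set (edges copied from the diagram).
DEPENDS_ON = {
    "MSM": {"predict", "strategies", "openfold", "data", "hub"},
    "predict": {"core"},
    "strategies": {"abcs", "losses", "trackers"},
    "openfold": {"abcs", "registries"},
    "data": {"storage"},
    "pipelines": {"data", "strategies"},
    "benchmarks": {"predict"},
    "analysis": {"core"},
}


def dependency_order(graph):
    """Return nodes with dependencies first; raises CycleError if a cycle exists."""
    return list(TopologicalSorter(graph).static_order())


order = dependency_order(DEPENDS_ON)
```

If someone later adds an edge that closes a loop (say, `core` depending on `predict`), `dependency_order` raises `CycleError` instead of returning a list.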
Request flow: predict call
A typical prediction flows through three stages (embed, process, fold), followed by an optional task head:
sequenceDiagram
    participant User
    participant MSM as MolfunStructureModel
    participant Adapter as BaseAdapter
    participant Emb as Embedder
    participant Blocks as Trunk Blocks (x N)
    participant SM as Structure Module
    participant Head as Task Head
    User->>MSM: predict(batch)
    MSM->>Adapter: forward(batch)
    Adapter->>Emb: forward(aatype, residue_index, msa)
    Emb-->>Adapter: EmbedderOutput(single, pair)
    loop N blocks
        Adapter->>Blocks: forward(single, pair)
        Blocks-->>Adapter: BlockOutput(single, pair)
    end
    Adapter->>SM: forward(single, pair, aatype)
    SM-->>Adapter: StructureModuleOutput(positions, frames, confidence)
    Adapter-->>MSM: TrunkOutput(single_repr, pair_repr, structure_coords, confidence)
    alt Has task head
        MSM->>Head: forward(trunk_output)
        Head-->>MSM: predictions
    end
    MSM-->>User: result dict
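The control flow in the sequence diagram can be sketched with plain-Python stand-ins. Everything below is a toy under stated assumptions: `ToyAdapter` and `ToyStructureModel` are hypothetical classes, the arithmetic is placeholder math rather than real model layers, and the keys of the result dict are invented for illustration. Only the field names of `TrunkOutput` and the order of the calls come from the diagram.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TrunkOutput:
    # Field names follow the diagram; the types are simplified for the sketch.
    single_repr: list
    pair_repr: list
    structure_coords: list
    confidence: float


class ToyAdapter:
    """Stand-in for a BaseAdapter subclass: embed -> N blocks -> structure module."""

    def __init__(self, num_blocks: int = 2):
        self.num_blocks = num_blocks

    def embed(self, aatype):
        single = [float(a) for a in aatype]
        pair = [[s + t for t in single] for s in single]
        return single, pair

    def block(self, single, pair):
        # Placeholder update; a real trunk block would mix single and pair features.
        return [s + 1.0 for s in single], pair

    def structure_module(self, single, pair):
        coords = [(s, 0.0, 0.0) for s in single]
        return coords, 1.0  # constant placeholder confidence

    def forward(self, batch: dict) -> TrunkOutput:
        single, pair = self.embed(batch["aatype"])
        for _ in range(self.num_blocks):
            single, pair = self.block(single, pair)
        coords, confidence = self.structure_module(single, pair)
        return TrunkOutput(single, pair, coords, confidence)


class ToyStructureModel:
    """Stand-in for the facade: delegates to the adapter, then to an optional head."""

    def __init__(self, adapter: ToyAdapter, head: Optional[Callable] = None):
        self.adapter = adapter
        self.head = head

    def predict(self, batch: dict) -> dict:
        trunk = self.adapter.forward(batch)
        result = {"trunk": trunk}  # hypothetical key names
        if self.head is not None:
            result["predictions"] = self.head(trunk)
        return result
```

A call such as `ToyStructureModel(ToyAdapter(), head=lambda t: sum(t.single_repr)).predict({"aatype": [0, 1, 2]})` exercises the branch with a task head. Omitting `head` returns only the trunk output.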
Key design decisions
Lazy imports for heavy backends
OpenFold, ESM, and tracking libraries (wandb, comet) are imported only when actually used. This keeps the top-level molfun import fast and avoids forcing users to install backends they do not need.
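# molfun/models/structure.py
def _register_adapters():
    """Lazy registration to avoid import errors for uninstalled backends."""
    # Nothing to do if the registry has already been populated.
    if ADAPTER_REGISTRY:
        return
    # Imported inside the function so that importing molfun does not load OpenFold.
    from molfun.backends.openfold.adapter import OpenFoldAdapter
    ADAPTER_REGISTRY["openfold"] = OpenFoldAdapter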
Facade pattern at the top
MolfunStructureModel is the single entry point. Users do not need to touch adapters, registries, or strategies directly, but all three remain available to anyone who wants finer control. This keeps the common (80%) use case simple while leaving full power accessible.
Adapters normalize backends
Every model backend (OpenFold, future ESMFold, future Protenix) is wrapped in a BaseAdapter subclass. Training strategies, heads, and losses program against the adapter interface and work with any backend without modification.
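A minimal sketch of why this helps, assuming a simplified interface: downstream code written against an abstract base class works with any concrete backend. The names `BaseAdapterSketch`, `BackendA`, `BackendB`, and `mean_pool_head` are hypothetical, and the real BaseAdapter contract is likely richer than the single `forward` method shown here.

```python
from abc import ABC, abstractmethod


class BaseAdapterSketch(ABC):
    """Simplified stand-in for an adapter interface: one normalized forward()."""

    @abstractmethod
    def forward(self, batch: dict) -> dict:
        """Return a dict containing at least a 'single_repr' list."""


class BackendA(BaseAdapterSketch):
    def forward(self, batch: dict) -> dict:
        return {"single_repr": [float(x) for x in batch["aatype"]]}


class BackendB(BaseAdapterSketch):
    def forward(self, batch: dict) -> dict:
        # A different toy computation that returns the same output shape.
        return {"single_repr": [2.0 * x for x in batch["aatype"]]}


def mean_pool_head(adapter: BaseAdapterSketch, batch: dict) -> float:
    """Backend-agnostic head: depends only on the normalized output."""
    reps = adapter.forward(batch)["single_repr"]
    return sum(reps) / len(reps)
```

`mean_pool_head` never names a concrete backend, so adding a third adapter requires no change to it.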