Experiment Tracking

Beginner ~15 min

Track metrics, hyperparameters, and artifacts across training runs with Molfun's built-in integrations for Weights & Biases, Comet ML, and MLflow. Use CompositeTracker to log to multiple backends simultaneously.

What You Will Learn

Set up WandB, Comet, and MLflow trackers
Pass a tracker to model.fit()
Use CompositeTracker for multi-backend logging
Log custom metrics, configs, and artifacts
Best practices for experiment organization

Quick Start

Adding tracking to any Molfun training run takes a single extra argument:

from molfun import MolfunStructureModel
from molfun.tracking import WandbTracker

model = MolfunStructureModel.from_pretrained("openfold_v1", device="cuda")

tracker = WandbTracker(project="my-protein-project")

model.fit(
    train_loader=train_loader,
    val_loader=val_loader,
    strategy=strategy,
    epochs=20,
    tracker=tracker,         # Just add this line
)

Creating the tracker does not start a run. The run starts automatically when fit() is called and ends when training completes.

Tracker Setup

Weights & Biases

Install and authenticate:

pip install wandb
wandb login

from molfun.tracking import WandbTracker

tracker = WandbTracker(
    project="protein-stability",   # WandB project name
)

WandB features

WandB automatically logs:

Training and validation loss curves
Learning rate schedule
System metrics (GPU utilization, memory)
Model architecture summary

Comet ML

Install and set your API key:

pip install comet-ml
export COMET_API_KEY="your-api-key"

from molfun.tracking import CometTracker

tracker = CometTracker(
    project="protein-stability",   # Comet project name
)

MLflow

Install and optionally set a tracking URI:

pip install mlflow

from molfun.tracking import MLflowTracker

tracker = MLflowTracker(
    experiment="protein-stability",  # MLflow experiment name
)

MLflow tracking server

By default, MLflow logs to a local ./mlruns directory. To use a remote tracking server:

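import mlflow
from molfun.tracking import MLflowTracker

# Point MLflow at the remote server before creating the tracker
mlflow.set_tracking_uri("http://your-mlflow-server:5000")

tracker = MLflowTracker(experiment="protein-stability")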
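MLflow itself also reads the MLFLOW_TRACKING_URI environment variable, which avoids hard-coding the server address in scripts. As a minimal sketch, the helper below resolves the URI from the environment and falls back to the local default. The name resolve_tracking_uri is hypothetical and not part of Molfun or MLflow.

```python
import os


def resolve_tracking_uri(default: str = "./mlruns") -> str:
    """Return the tracking URI from the environment, or a local default.

    Hypothetical helper for illustration only.
    """
    return os.environ.get("MLFLOW_TRACKING_URI", default)


uri = resolve_tracking_uri()
# Pass `uri` to mlflow.set_tracking_uri(uri) before creating the tracker.
```

Resolving the URI explicitly makes it easy to print or log where a run is being recorded.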

CompositeTracker

Log to multiple backends simultaneously. Useful for teams where different members prefer different tools.

from molfun.tracking import WandbTracker, CometTracker, CompositeTracker

tracker = CompositeTracker([
    WandbTracker(project="protein-stability"),
    CometTracker(project="protein-stability"),
])

# Use exactly like a single tracker
model.fit(
    train_loader=train_loader,
    val_loader=val_loader,
    strategy=strategy,
    epochs=20,
    tracker=tracker,
)

All methods (log_metrics, log_config, log_artifact) are forwarded to every backend.
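The forwarding behavior can be pictured with a minimal sketch of the composite pattern. The classes below (RecordingTracker, FanOutTracker) are hypothetical stand-ins written for illustration and are not Molfun's implementation. Only the method names log_metrics, log_config, and log_artifact come from the text above, and the argument lists are assumptions.

```python
from typing import Any, Dict, List, Optional


class RecordingTracker:
    """Hypothetical stand-in backend that records calls in memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.calls: List[tuple] = []

    def log_metrics(self, metrics: Dict[str, float], step: Optional[int] = None) -> None:
        self.calls.append(("log_metrics", metrics, step))

    def log_config(self, config: Dict[str, Any]) -> None:
        self.calls.append(("log_config", config))

    def log_artifact(self, path: str, name: Optional[str] = None) -> None:
        self.calls.append(("log_artifact", path, name))


class FanOutTracker:
    """Hypothetical composite: forwards each call to every child tracker."""

    def __init__(self, trackers: List[RecordingTracker]) -> None:
        self.trackers = list(trackers)

    def log_metrics(self, metrics, step=None):
        for t in self.trackers:
            t.log_metrics(metrics, step=step)

    def log_config(self, config):
        for t in self.trackers:
            t.log_config(config)

    def log_artifact(self, path, name=None):
        for t in self.trackers:
            t.log_artifact(path, name=name)


a, b = RecordingTracker("a"), RecordingTracker("b")
composite = FanOutTracker([a, b])
composite.log_metrics({"val_loss": 0.5}, step=1)  # illustrative value
# Both a.calls and b.calls now hold the same single log_metrics record.
```

Because the composite exposes the same methods as a single tracker, calling code does not need to know how many backends are attached.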

Manual Tracking API

Beyond automatic logging during fit(), you can use the tracker API directly for custom metrics and artifacts.

Log Metrics

tracker.start_run(run_name="lora-rank8-experiment")

# Log scalar metrics
tracker.log_metrics({
    "pearson_r": 0.85,
    "rmse": 1.23,
    "best_epoch": 12,
}, step=0)

# Log metrics at specific steps (compute_metric() is a placeholder for your own function)
for epoch in range(20):
    tracker.log_metrics({"custom/my_metric": compute_metric()}, step=epoch)

Log Configuration

tracker.log_config({
    "model": "openfold_v1",
    "strategy": "lora",
    "rank": 8,
    "alpha": 16.0,
    "lr_lora": 1e-4,
    "lr_head": 1e-3,
    "batch_size": 8,
    "max_seq_length": 512,
    "dataset_size": len(train_ds),
})
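The example above passes a flat dictionary. Whether log_config accepts nested dictionaries is not stated here, so if your configuration is nested, one option is to flatten it into dotted keys first. The helper below is a hypothetical sketch (flatten_config is not a Molfun function).

```python
from typing import Any, Dict


def flatten_config(config: Dict[str, Any], prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into a single level with dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        full_key = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, carrying the key path along.
            flat.update(flatten_config(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


nested = {
    "model": "openfold_v1",
    "strategy": {"name": "lora", "rank": 8, "alpha": 16.0},
}
flat = flatten_config(nested)
# flat has the keys: model, strategy.name, strategy.rank, strategy.alpha
```

The flattened dictionary can then be passed to tracker.log_config(flat).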

Log Artifacts

# Log a saved model
tracker.log_artifact("models/affinity_lora", name="trained-model")

# Log a plot
tracker.log_artifact("results/scatter_plot.png", name="evaluation-plot")

# Log text (predictions, notes)
tracker.log_text("Best run achieved r=0.85 with rank=8, alpha=16")

End a Run

tracker.end_run()
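With the manual API, an exception between start_run() and end_run() can leave a run open. One way to guard against that is a small context manager built on try/finally. This is a sketch under assumptions: StubTracker is a hypothetical stand-in that exposes only start_run() and end_run(), and tracked_run is a hypothetical helper, not part of Molfun.

```python
from contextlib import contextmanager


class StubTracker:
    """Hypothetical stand-in exposing only start_run() and end_run()."""

    def __init__(self):
        self.active = False

    def start_run(self, run_name=None):
        self.active = True

    def end_run(self):
        self.active = False


@contextmanager
def tracked_run(tracker, run_name):
    """Start a run and guarantee end_run() is called, even on error."""
    tracker.start_run(run_name=run_name)
    try:
        yield tracker
    finally:
        tracker.end_run()


tracker = StubTracker()
try:
    with tracked_run(tracker, "lora-rank8-experiment"):
        raise RuntimeError("simulated training failure")
except RuntimeError:
    pass
# tracker.active is False here: the run was closed despite the exception.
```

The same wrapper should work with any tracker object that provides start_run(run_name=...) and end_run().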

Comparing Runs

A common workflow is to sweep over hyperparameters and compare results.

from molfun import MolfunStructureModel
from molfun.training import LoRAFinetune
from molfun.tracking import WandbTracker

ranks = [4, 8, 16]
alphas = [8.0, 16.0, 32.0]

for rank in ranks:
    for alpha in alphas:
        tracker = WandbTracker(project="lora-sweep")
        tracker.start_run(run_name=f"rank{rank}-alpha{alpha}")

        tracker.log_config({
            "rank": rank,
            "alpha": alpha,
        })

        model = MolfunStructureModel.from_pretrained(
            "openfold_v1", device="cuda",
            head="affinity",
            head_config={"hidden_dim": 256, "num_layers": 2},
        )

        strategy = LoRAFinetune(
            rank=rank, alpha=alpha,
            lr_lora=1e-4, lr_head=1e-3,
        )

        model.fit(
            train_loader=train_loader,
            val_loader=val_loader,
            strategy=strategy,
            epochs=15,
            tracker=tracker,
        )

        # Log final test metrics (evaluate() and test_loader are user-supplied placeholders)
        final_metrics = evaluate(model, test_loader)
        tracker.log_metrics(final_metrics)
        tracker.end_run()
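The nested loops above can also be written as a flat iteration over a grid, which keeps the loop body at one indentation level as more hyperparameters are added. The sketch below uses only the standard library and builds the same nine (rank, alpha) combinations and run names.

```python
from itertools import product

ranks = [4, 8, 16]
alphas = [8.0, 16.0, 32.0]

# One (rank, alpha) pair per run; product() replaces the nested loops.
grid = list(product(ranks, alphas))
run_names = [f"rank{rank}-alpha{alpha}" for rank, alpha in grid]

for (rank, alpha), run_name in zip(grid, run_names):
    # Create the tracker, model, and strategy here as in the loop above.
    pass
```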

Best Practices

Naming conventions

Use consistent, descriptive run names:

{strategy}-{key_param}-{dataset}-{date}

Examples: lora-r8-pdbbind-20260319, or a shorter form such as headonly-stability-v2 when a field does not apply.
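A small helper keeps run names consistent with this pattern. The function below is a hypothetical sketch (make_run_name is not part of Molfun) that fills the template and formats the date as YYYYMMDD.

```python
from datetime import date
from typing import Optional


def make_run_name(strategy: str, key_param: str, dataset: str,
                  day: Optional[date] = None) -> str:
    """Build a run name of the form {strategy}-{key_param}-{dataset}-{date}."""
    day = day or date.today()
    return f"{strategy}-{key_param}-{dataset}-{day:%Y%m%d}"


name = make_run_name("lora", "r8", "pdbbind", date(2026, 3, 19))
# name == "lora-r8-pdbbind-20260319"
```

The result can be passed as run_name to tracker.start_run().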

Log everything needed for reproducibility

Always log:

All hyperparameters (strategy params, LR, batch size)
Dataset metadata (size, split ratios, random seed)
Model configuration (head type, hidden dims)
Environment info (GPU type, PyTorch version)
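Environment details are easy to forget because they are not hyperparameters. As a rough sketch, the function below gathers a few of them with the standard library and adds PyTorch and GPU details only when PyTorch is installed. The name collect_environment_info and the dictionary keys are illustrative choices, not a Molfun convention.

```python
import importlib.util
import platform
from typing import Any, Dict


def collect_environment_info() -> Dict[str, Any]:
    """Gather basic environment details suitable for logging as config."""
    info: Dict[str, Any] = {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    }
    # PyTorch is optional here; only query it when it is installed.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch_version"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
        if torch.cuda.is_available():
            info["gpu_name"] = torch.cuda.get_device_name(0)
    return info


env_info = collect_environment_info()
# Pass env_info to tracker.log_config(env_info) alongside your hyperparameters.
```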

Project organization

One project per task: stability-prediction, binding-affinity, etc.
Tags for grouping: Tag runs by strategy, dataset version, or experiment phase
Artifact versioning: Log model checkpoints as artifacts for reproducibility

Avoid

Logging too frequently (every batch) on large datasets; log every N steps instead
Forgetting to call end_run() when using the manual API
Mixing unrelated experiments in the same project
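The first point can be handled with a simple step check in a custom loop. In the sketch below, ListTracker is a hypothetical stand-in that stores records in memory, maybe_log is a hypothetical helper, and the loss is a placeholder rather than a real training value.

```python
class ListTracker:
    """Hypothetical stand-in that stores logged metrics in a list."""

    def __init__(self):
        self.records = []

    def log_metrics(self, metrics, step=None):
        self.records.append((step, metrics))


def maybe_log(tracker, metrics, step, log_every=50):
    """Log only when `step` is a multiple of `log_every`."""
    if step % log_every == 0:
        tracker.log_metrics(metrics, step=step)


tracker = ListTracker()
for step in range(200):
    loss = 1.0 / (step + 1)  # placeholder standing in for a real batch loss
    maybe_log(tracker, {"train/loss": loss}, step, log_every=50)
# Four records are stored: steps 0, 50, 100, and 150.
```

A larger log_every reduces network and storage overhead at the cost of coarser curves.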

Next Steps

Run your first tracked experiment: Try Stability Prediction with a tracker enabled.
Automate tracking in pipelines: See YAML Pipelines where tracking is configured declaratively.
Sweep hyperparameters: Combine tracking with the comparison pattern above for systematic experiments.