Predict Functions

These high-level prediction functions handle model loading, caching, and inference in a single call. They are the simplest entry points for running predictions without managing model instances by hand.

Quick Start

from molfun import predict_structure, predict_properties, predict_affinity

# Structure prediction
result = predict_structure("MKFLILLFNILCLFPVLAADNH...")

# Property prediction
props = predict_properties("MKFLILLFNILCLFPVLAADNH...", properties=["stability", "solubility"])

# Binding affinity
affinity = predict_affinity(
    sequence="MKFLILLFNILCLFPVLAADNH...",
    ligand_smiles="CC(=O)O",
)

Functions

predict_structure

predict_structure(sequence: str, backend: str = 'openfold', device: str = 'cpu') -> dict

Predict 3D structure from an amino acid sequence.

Parameters:

Name	Type	Description	Default
`sequence`	`str`	Amino acid sequence (e.g. "MKWVTFISLLLLFSSAYS").	required
`backend`	`str`	Model backend ("openfold").	`'openfold'`
`device`	`str`	"cpu" or "cuda".	`'cpu'`

Returns:

Type	Description
`dict`	Dict with keys:
	`"coordinates"`: list of [x, y, z] per residue (CA atoms)
	`"plddt"`: list of per-residue confidence scores (0–1)
	`"pdb_string"`: PDB-format string for visualization
	`"sequence"`: input sequence
	`"length"`: sequence length

Usage:

result = predict_structure("MKWVTFISLLLLFSSAYS", device="cuda")
print(result["plddt"])
with open("pred.pdb", "w") as f:
    f.write(result["pdb_string"])
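
The returned dict can be post-processed with plain Python. As a minimal sketch, assuming the key layout documented above, the snippet below lists residues whose pLDDT falls below a cutoff. The `mock_result` dict is a hand-made stand-in rather than real model output, and the helper `low_confidence_residues` and its 0.7 cutoff are illustrative choices, not part of molfun.

```python
def low_confidence_residues(
    result: dict, cutoff: float = 0.7
) -> list[tuple[int, str, float]]:
    """Return (1-based index, residue letter, pLDDT) for residues below `cutoff`."""
    return [
        (index + 1, residue, score)
        for index, (residue, score) in enumerate(
            zip(result["sequence"], result["plddt"])
        )
        if score < cutoff
    ]


# Hand-made stand-in using the documented keys (not real model output).
mock_result = {
    "sequence": "MKWV",
    "plddt": [0.91, 0.55, 0.78, 0.42],
    "length": 4,
}

flagged = low_confidence_residues(mock_result)
```

In practice `mock_result` would be replaced by the dict returned from `predict_structure`; a cutoff on the 0–1 scale is a judgment call that depends on the downstream use.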

Predict the 3D structure of a protein from its amino acid sequence.

from molfun import predict_structure

result = predict_structure(
    sequence="MKFLILLFNILCLFPVLAADNH...",
    model="openfold_v2",
    num_recycles=3,
    device="cuda",
)

# Access outputs
coords = result.positions   # (N_residues, 37, 3) atom coordinates
plddt  = result.plddt       # (N_residues,) per-residue confidence
pae    = result.pae         # (N_residues, N_residues) predicted aligned error

Parameter	Type	Default	Description
`sequence`	`str`	required	Amino acid sequence (one-letter codes)
`model`	`str`	`"openfold_v2"`	Pretrained model name
`num_recycles`	`int`	`3`	Number of recycling iterations
`msa`	`str | None`	`None`	Path to A3M MSA file
`device`	`str`	`"cpu"`	Compute device
`dtype`	`torch.dtype`	`torch.float32`	Model precision

Returns: TrunkOutput with .positions, .plddt, .pae.

predict_properties

predict_properties(sequence: str, properties: list[str] | None = None, backend: str = 'openfold', device: str = 'cpu') -> dict

Predict biophysical properties from an amino acid sequence.

Two categories of properties:

Sequence-based (no GPU needed): molecular_weight, isoelectric_point, hydrophobicity, aromaticity, charge, instability_index. Computed analytically from the amino acid composition.
Embedding-based (uses structure model): stability, solubility, expression, immunogenicity, aggregation, thermostability. Derived from backbone embeddings via learned projections.
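
To illustrate what "computed analytically from the amino acid composition" can look like, here is a minimal sketch of two sequence-based quantities. The source does not state which formulas molfun uses, so the definitions below are assumptions: aromaticity as the fraction of F, W, and Y residues, and hydrophobicity as the mean Kyte-Doolittle hydropathy (often called GRAVY). The function names are hypothetical and are not molfun's API.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aromaticity(sequence: str) -> float:
    """Fraction of residues that are aromatic (F, W, or Y)."""
    return sum(sequence.count(aa) for aa in "FWY") / len(sequence)


def gravy(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


seq = "MKWVTFISLLLLFSSAYS"
print(f"aromaticity = {aromaticity(seq):.3f}")
print(f"GRAVY       = {gravy(seq):.3f}")
```

Both functions need only the sequence string, which is why properties of this kind run without a GPU or a loaded model.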

Parameters:

Name	Type	Description	Default
`sequence`	`str`	Amino acid sequence.	required
`properties`	`list[str] | None`	Which properties to predict. If None, predicts all sequence-based properties.	`None`
`backend`	`str`	Model backend for embedding-based properties.	`'openfold'`
`device`	`str`	"cpu" or "cuda".	`'cpu'`

Returns:

Type	Description
`dict`	Dict mapping property names to float scores.

Usage:

props = predict_properties("MKWVTFISLLLLFSSAYS")
print(props["molecular_weight"])

props = predict_properties("MKWVTFISLLLLFSSAYS", ["stability", "solubility"])

Predict per-residue or global protein properties.

from molfun import predict_properties

props = predict_properties(
    sequence="MKFLILLFNILCLFPVLAADNH...",
    properties=["plddt", "disorder", "secondary_structure"],
    model="openfold_v2",
)

plddt = props["plddt"]          # (N_residues,)
ss    = props["secondary_structure"]  # (N_residues,) H/E/C labels

Parameter	Type	Default	Description
`sequence`	`str`	required	Amino acid sequence
`properties`	`list[str]`	`["plddt"]`	Properties to predict
`model`	`str`	`"openfold_v2"`	Pretrained model name
`device`	`str`	`"cpu"`	Compute device

Returns: dict[str, Tensor] mapping property names to tensors.

predict_affinity

predict_affinity(sequence: str, ligand_smiles: str, backend: str = 'openfold', device: str = 'cpu', single_dim: int = 384) -> dict

Predict binding affinity between a protein and a small molecule.

Uses the structure model's single representation, pooled over residues and passed through a trained AffinityHead. The ligand SMILES is stored in the result for reference but does not yet influence the prediction: the model is currently protein-only, and ligand-aware scoring is planned.
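
The pooling step can be sketched in plain Python. This is an illustration under stated assumptions, not molfun's implementation: the source says only that the single representation is "pooled over residues", so mean pooling is assumed here, and a single linear layer stands in for the AffinityHead, whose real architecture is not documented. The names `mean_pool` and `linear_head` are hypothetical, and the toy input has 3 feature dimensions instead of the default `single_dim` of 384.

```python
def mean_pool(single: list[list[float]]) -> list[float]:
    """Average per-residue feature vectors into one fixed-size vector."""
    n_residues = len(single)
    # zip(*single) iterates over feature columns across all residues.
    return [sum(column) / n_residues for column in zip(*single)]


def linear_head(pooled: list[float], weights: list[float], bias: float) -> float:
    """Stand-in for a trained head: one dot product plus a bias (assumption)."""
    return sum(x * w for x, w in zip(pooled, weights)) + bias


# Toy single representation: 2 residues x 3 features.
single = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
]

pooled = mean_pool(single)  # [2.0, 3.0, 4.0]
score = linear_head(pooled, weights=[0.5, -1.0, 0.25], bias=0.1)
```

Because pooling collapses the residue axis, the head sees a vector of the same size for any protein length, which is why `single_dim` must match the pretrained model.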

Parameters:

Name	Type	Description	Default
`sequence`	`str`	Protein amino acid sequence.	required
`ligand_smiles`	`str`	SMILES string of the ligand.	required
`backend`	`str`	Model backend.	`'openfold'`
`device`	`str`	"cpu" or "cuda".	`'cpu'`
`single_dim`	`int`	Dimension of the single representation (must match the pretrained model).	`384`

Returns:

Type	Description
`dict`	Dict with keys:
	`"binding_affinity_kcal"`: predicted ΔG in kcal/mol
	`"confidence"`: confidence score (0–1) based on pLDDT
	`"ligand_smiles"`: input SMILES (echo)
	`"sequence_length"`: protein length

Usage:

result = predict_affinity(
    "MKWVTFISLLLLFSSAYS",
    ligand_smiles="CC(=O)O",
    device="cuda",
)
print(f"ΔG = {result['binding_affinity_kcal']:.1f} kcal/mol")

Predict binding affinity between a protein and a ligand.

from molfun import predict_affinity

result = predict_affinity(
    protein_seq="MKFLILLFNILCLFPVLAADNH...",
    ligand_sdf="ligand.sdf",
    model="openfold_v2",
)

print(f"Predicted pKd: {result.pkd:.2f}")
print(f"Confidence: {result.confidence:.2f}")

Parameter	Type	Default	Description
`protein_seq`	`str`	required	Protein amino acid sequence
`ligand_sdf`	`str`	required	Path to ligand SDF file
`model`	`str`	`"openfold_v2"`	Pretrained model name
`device`	`str`	`"cpu"`	Compute device

Returns: Result object with .pkd (predicted pKd) and .confidence.

clear_cache

clear_cache() -> None

Free all cached models (releases GPU memory).

Clear the internal model cache used by the predict functions.

from molfun.predict import clear_cache

# Free memory by releasing cached models
clear_cache()

This is useful when switching between different models or when GPU memory is limited. The next call to any predict function will reload the model from disk.