Predict Functions

High-level prediction functions that handle model loading, caching, and inference in a single call. These are the simplest entry points for running predictions without manually managing model instances.

Quick Start

from molfun import predict_structure, predict_properties, predict_affinity

# Structure prediction
result = predict_structure("MKFLILLFNILCLFPVLAADNH...")

# Property prediction
props = predict_properties("MKFLILLFNILCLFPVLAADNH...", properties=["plddt", "disorder"])

# Binding affinity
affinity = predict_affinity(
    protein_seq="MKFLILLFNILCLFPVLAADNH...",
    ligand_sdf="ligand.sdf",
)

Functions

predict_structure

predict_structure(sequence: str, backend: str = 'openfold', device: str = 'cpu') -> dict

Predict 3D structure from an amino acid sequence.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| sequence | str | Amino acid sequence (e.g. "MKWVTFISLLLLFSSAYS"). | required |
| backend | str | Model backend ("openfold"). | 'openfold' |
| device | str | "cpu" or "cuda". | 'cpu' |

Returns: dict with keys:

  • "coordinates": list of [x, y, z] per residue (CA atoms)
  • "plddt": list of per-residue confidence scores (0–1)
  • "pdb_string": PDB-format string for visualization
  • "sequence": input sequence
  • "length": sequence length

Usage:

result = predict_structure("MKWVTFISLLLLFSSAYS", device="cuda")
print(result["plddt"])
with open("pred.pdb", "w") as f:
    f.write(result["pdb_string"])
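The returned dict can be post-processed with plain Python. The sketch below summarizes the per-residue confidence scores; the helper name `summarize_confidence`, the 0.7 threshold, and the example values are illustrative assumptions, not part of molfun. A hand-written dict stands in for a real `predict_structure` result so the snippet runs without a model.

```python
from statistics import mean

def summarize_confidence(result, threshold=0.7):
    """Return the mean pLDDT and the 1-based indices of residues below `threshold`."""
    plddt = result["plddt"]
    low = [i + 1 for i, score in enumerate(plddt) if score < threshold]
    return {"mean_plddt": mean(plddt), "low_confidence_residues": low}

# Hypothetical stand-in for a predict_structure() result (values are made up).
example_result = {
    "sequence": "MKWV",
    "length": 4,
    "plddt": [0.91, 0.85, 0.62, 0.40],
}

summary = summarize_confidence(example_result)
print(summary["low_confidence_residues"])  # [3, 4]
```

The threshold is a judgment call; pick one that suits the downstream analysis.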

Predict the 3D structure of a protein from its amino acid sequence.

from molfun import predict_structure

result = predict_structure(
    sequence="MKFLILLFNILCLFPVLAADNH...",
    model="openfold_v2",
    num_recycles=3,
    device="cuda",
)

# Access outputs
coords = result.positions   # (N_residues, 37, 3) atom coordinates
plddt  = result.plddt       # (N_residues,) per-residue confidence
pae    = result.pae         # (N_residues, N_residues) predicted aligned error

| Parameter | Type | Default | Description |
|---|---|---|---|
| sequence | str | required | Amino acid sequence (one-letter codes) |
| model | str | "openfold_v2" | Pretrained model name |
| num_recycles | int | 3 | Number of recycling iterations |
| msa | str \| None | None | Path to A3M MSA file |
| device | str | "cpu" | Compute device |
| dtype | torch.dtype | torch.float32 | Model precision |

Returns: TrunkOutput with .positions, .plddt, .pae.


predict_properties

predict_properties(sequence: str, properties: list[str] | None = None, backend: str = 'openfold', device: str = 'cpu') -> dict

Predict biophysical properties from an amino acid sequence.

Two categories of properties:

  • Sequence-based (no GPU needed): molecular_weight, isoelectric_point, hydrophobicity, aromaticity, charge, instability_index. Computed analytically from the amino acid composition.

  • Embedding-based (uses structure model): stability, solubility, expression, immunogenicity, aggregation, thermostability. Derived from backbone embeddings via learned projections.
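To illustrate what "computed analytically from the amino acid composition" can look like, here is a minimal sketch of two sequence-based properties. It assumes aromaticity is the fraction of F, W, and Y residues and that hydrophobicity is the mean Kyte-Doolittle hydropathy (GRAVY); the formulas molfun actually uses are not shown on this page, so treat both definitions and the function names as assumptions.

```python
# Kyte-Doolittle hydropathy value for each of the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def aromaticity(sequence):
    """Fraction of residues that are aromatic (F, W, or Y)."""
    return sum(sequence.count(aa) for aa in "FWY") / len(sequence)

def hydrophobicity(sequence):
    """Mean Kyte-Doolittle hydropathy over the sequence (GRAVY)."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)

seq = "MKWVTFISLLLLFSSAYS"
print(f"aromaticity    = {aromaticity(seq):.3f}")
print(f"hydrophobicity = {hydrophobicity(seq):.3f}")
```

Both functions need only the sequence string, which is why this category of property runs without a GPU or a loaded model.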

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| sequence | str | Amino acid sequence. | required |
| properties | list[str] \| None | Which properties to predict. If None, predicts all sequence-based properties. | None |
| backend | str | Model backend for embedding-based properties. | 'openfold' |
| device | str | "cpu" or "cuda". | 'cpu' |

Returns: dict mapping property names to float scores.

Usage:

props = predict_properties("MKWVTFISLLLLFSSAYS")
print(props["molecular_weight"])

props = predict_properties("MKWVTFISLLLLFSSAYS", ["stability", "solubility"])

Predict per-residue or global protein properties.

from molfun import predict_properties

props = predict_properties(
    sequence="MKFLILLFNILCLFPVLAADNH...",
    properties=["plddt", "disorder", "secondary_structure"],
    model="openfold_v2",
)

plddt = props["plddt"]          # (N_residues,)
ss    = props["secondary_structure"]  # (N_residues,) H/E/C labels

| Parameter | Type | Default | Description |
|---|---|---|---|
| sequence | str | required | Amino acid sequence |
| properties | list[str] | ["plddt"] | Properties to predict |
| model | str | "openfold_v2" | Pretrained model name |
| device | str | "cpu" | Compute device |

Returns: dict[str, Tensor] mapping property names to tensors.


predict_affinity

predict_affinity(sequence: str, ligand_smiles: str, backend: str = 'openfold', device: str = 'cpu', single_dim: int = 384) -> dict

Predict binding affinity between a protein and a small molecule.

Uses the structure model's single representation pooled over residues, passed through a trained AffinityHead. The ligand SMILES is stored in the result for reference but does not yet influence the prediction (protein-only model — ligand-aware scoring is planned).
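The pooling-then-head step described above can be sketched in plain Python. This is a rough illustration only: it assumes mean pooling over residues and a single linear layer, and it uses random numbers in place of both the single representation and the trained weights. The real AffinityHead architecture is not documented here, and `mean_pool` and `linear_head` are hypothetical names.

```python
import random

def mean_pool(single):
    """Average per-residue vectors (N_residues x dim) into one vector of length dim."""
    n_residues = len(single)
    dim = len(single[0])
    return [sum(row[j] for row in single) / n_residues for j in range(dim)]

def linear_head(pooled, weights, bias):
    """Project the pooled vector to a single scalar score."""
    return sum(w * x for w, x in zip(weights, pooled)) + bias

single_dim = 384           # must match the model's single representation
rng = random.Random(0)     # fixed seed so the sketch is reproducible

# Stand-ins: an 18-residue single representation and untrained head weights.
single = [[rng.gauss(0, 1) for _ in range(single_dim)] for _ in range(18)]
weights = [rng.gauss(0, 0.01) for _ in range(single_dim)]

pooled = mean_pool(single)
score = linear_head(pooled, weights, bias=0.0)
print(len(pooled), type(score).__name__)
```

Because pooling collapses the residue axis, the head sees a fixed-size vector regardless of protein length, which is why `single_dim` has to match the pretrained model.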

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| sequence | str | Protein amino acid sequence. | required |
| ligand_smiles | str | SMILES string of the ligand. | required |
| backend | str | Model backend. | 'openfold' |
| device | str | "cpu" or "cuda". | 'cpu' |
| single_dim | int | Dimension of the single representation (must match the pretrained model). | 384 |

Returns: dict with keys:

  • "binding_affinity_kcal": predicted ΔG in kcal/mol
  • "confidence": confidence score (0–1) based on pLDDT
  • "ligand_smiles": input SMILES, echoed back
  • "sequence_length": protein length

Usage:

result = predict_affinity(
    "MKWVTFISLLLLFSSAYS",
    ligand_smiles="CC(=O)O",
    device="cuda",
)
print(f"ΔG = {result['binding_affinity_kcal']:.1f} kcal/mol")
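A ΔG in kcal/mol can be related to a pKd through the standard thermodynamic relation ΔG = −RT·ln(10)·pKd, assuming a 1 M standard state. The helpers below are a sketch of that conversion. They are not molfun functions, and the default temperature of 298.15 K is an assumption.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def delta_g_to_pkd(delta_g_kcal, temperature_k=298.15):
    """Convert a binding free energy (kcal/mol) to pKd."""
    return -delta_g_kcal / (R_KCAL * temperature_k * math.log(10))

def pkd_to_delta_g(pkd, temperature_k=298.15):
    """Convert pKd to a binding free energy (kcal/mol)."""
    return -R_KCAL * temperature_k * math.log(10) * pkd

pkd = delta_g_to_pkd(-8.2)
print(f"dG = -8.2 kcal/mol  ->  pKd = {pkd:.2f}")
```

At 298.15 K one pKd unit corresponds to roughly 1.36 kcal/mol, so more negative ΔG values map to higher pKd.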

Predict binding affinity between a protein and a ligand.

from molfun import predict_affinity

result = predict_affinity(
    protein_seq="MKFLILLFNILCLFPVLAADNH...",
    ligand_sdf="ligand.sdf",
    model="openfold_v2",
)

print(f"Predicted pKd: {result.pkd:.2f}")
print(f"Confidence: {result.confidence:.2f}")

| Parameter | Type | Default | Description |
|---|---|---|---|
| protein_seq | str | required | Protein amino acid sequence |
| ligand_sdf | str | required | Path to ligand SDF file |
| model | str | "openfold_v2" | Pretrained model name |
| device | str | "cpu" | Compute device |

Returns: Result object with .pkd (predicted pKd) and .confidence.


clear_cache

clear_cache() -> None

Free all cached models (releases GPU memory).

Clear the internal model cache used by the predict functions.

from molfun.predict import clear_cache

# Free memory by releasing cached models
clear_cache()

This is useful when switching between different models or when GPU memory is limited. The next call to any predict function will reload the model from disk.