Predict Functions
High-level prediction functions that handle model loading, caching, and inference in a single call. These are the simplest entry points for running predictions without manually managing model instances.
Quick Start
from molfun import predict_structure, predict_properties, predict_affinity
# Structure prediction
result = predict_structure("MKFLILLFNILCLFPVLAADNH...")
# Property prediction
props = predict_properties("MKFLILLFNILCLFPVLAADNH...", properties=["plddt", "disorder"])
# Binding affinity
affinity = predict_affinity(
    protein_seq="MKFLILLFNILCLFPVLAADNH...",
    ligand_sdf="ligand.sdf",
)
Functions
predict_structure
Predict 3D structure from an amino acid sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequence | str | Amino acid sequence (e.g. "MKWVTFISLLLLFSSAYS"). | required |
| backend | str | Model backend ("openfold"). | 'openfold' |
| device | str | "cpu" or "cuda". | 'cpu' |
Returns:
| Type | Description |
|---|---|
| dict | Dict of prediction outputs, including plddt and pdb_string. |
Usage:
result = predict_structure("MKWVTFISLLLLFSSAYS", device="cuda")
print(result["plddt"])
with open("pred.pdb", "w") as f:
    f.write(result["pdb_string"])
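Before calling predict_structure, it can help to confirm that the input is a plain one-letter sequence. The sketch below is not part of molfun: `clean_sequence` is a hypothetical helper, and restricting input to the 20 standard residues is an assumption (the backend may accept other symbols, such as X).

```python
STANDARD_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Hypothetical helper: normalise and validate a one-letter sequence."""
    # Drop all whitespace, including line breaks left over from FASTA wrapping.
    seq = "".join(raw.split()).upper()
    if not seq:
        raise ValueError("sequence is empty")
    # Assumption: only the 20 standard amino acids are allowed.
    bad = sorted(set(seq) - STANDARD_RESIDUES)
    if bad:
        raise ValueError(f"unsupported characters in sequence: {bad}")
    return seq


print(clean_sequence("mkwvtf isllllfssays"))
```

A placeholder such as the trailing `...` in the Quick Start strings would be rejected by this check, so substitute a full sequence before running those snippets.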
Predict the 3D structure of a protein from its amino acid sequence.
from molfun import predict_structure
result = predict_structure(
    sequence="MKFLILLFNILCLFPVLAADNH...",
    model="openfold_v2",
    num_recycles=3,
    device="cuda",
)
# Access outputs
coords = result.positions # (N_residues, 37, 3) atom coordinates
plddt = result.plddt # (N_residues,) per-residue confidence
pae = result.pae # (N_residues, N_residues) predicted aligned error
| Parameter | Type | Default | Description |
|---|---|---|---|
| sequence | str | required | Amino acid sequence (one-letter codes) |
| model | str | "openfold_v2" | Pretrained model name |
| num_recycles | int | 3 | Number of recycling iterations |
| msa | str \| None | None | Path to A3M MSA file |
| device | str | "cpu" | Compute device |
| dtype | torch.dtype | torch.float32 | Model precision |
Returns: TrunkOutput with .positions, .plddt, .pae.
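Per-residue confidence is often used to flag stretches of the chain that should not be trusted. The following is a minimal sketch of that post-processing step, assuming the per-residue scores have already been converted to a plain list of floats (for a tensor, for example via `.tolist()`). `low_confidence_regions` is a hypothetical helper, and the threshold is left as a parameter because the score scale (0 to 1 or 0 to 100) is not stated here.

```python
from typing import List, Sequence, Tuple


def low_confidence_regions(
    plddt: Sequence[float], threshold: float
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index ranges where plddt < threshold."""
    regions: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(plddt):
        if score < threshold:
            if start is None:
                start = i  # open a new low-confidence run
        elif start is not None:
            regions.append((start, i - 1))  # close the current run
            start = None
    if start is not None:
        regions.append((start, len(plddt) - 1))  # run extends to the last residue
    return regions


scores = [90.0, 40.0, 30.0, 80.0, 20.0]  # made-up values for illustration
print(low_confidence_regions(scores, threshold=50.0))  # [(1, 2), (4, 4)]
```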
predict_properties
predict_properties(sequence: str, properties: list[str] | None = None, backend: str = 'openfold', device: str = 'cpu') -> dict
Predict biophysical properties from an amino acid sequence.
Two categories of properties:
- Sequence-based (no GPU needed): molecular_weight, isoelectric_point, hydrophobicity, aromaticity, charge, instability_index. Computed analytically from the amino acid composition.
- Embedding-based (uses structure model): stability, solubility, expression, immunogenicity, aggregation, thermostability. Derived from backbone embeddings via learned projections.
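To illustrate what "computed analytically from the amino acid composition" can look like, here is a minimal sketch of three sequence-based properties. It uses textbook definitions (average residue masses, the Kyte-Doolittle hydropathy scale, and the fraction of F/W/Y residues). These are assumptions for illustration, and molfun's own formulas may differ; the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy value for each standard residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def molecular_weight(seq: str) -> float:
    """Sum of residue masses plus one water, in daltons."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


def mean_hydropathy(seq: str) -> float:
    """Average Kyte-Doolittle hydropathy over the sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aromaticity(seq: str) -> float:
    """Fraction of residues that are F, W, or Y."""
    return sum(seq.count(aa) for aa in "FWY") / len(seq)


seq = "MKWVTFISLLLLFSSAYS"
print(molecular_weight(seq), mean_hydropathy(seq), aromaticity(seq))
```

None of these need a model or a GPU, which is why this category is cheap to evaluate.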
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequence | str | Amino acid sequence. | required |
| properties | list[str] \| None | Which properties to predict. If None, predicts all sequence-based properties. | None |
| backend | str | Model backend for embedding-based properties. | 'openfold' |
| device | str | "cpu" or "cuda". | 'cpu' |
Returns:
| Type | Description |
|---|---|
| dict | Dict mapping property names to float scores. |
Usage:
props = predict_properties("MKWVTFISLLLLFSSAYS")
print(props["molecular_weight"])
props = predict_properties("MKWVTFISLLLLFSSAYS", ["stability", "solubility"])
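Because only embedding-based properties use the structure model, it can be useful to know in advance whether a request will load it. The sketch below splits a property list using the two categories named above. `split_properties` is a hypothetical helper, and raising on unknown names is an assumption, since how molfun handles them is not stated here.

```python
from typing import Iterable, List, Tuple

SEQUENCE_BASED = {
    "molecular_weight", "isoelectric_point", "hydrophobicity",
    "aromaticity", "charge", "instability_index",
}
EMBEDDING_BASED = {
    "stability", "solubility", "expression",
    "immunogenicity", "aggregation", "thermostability",
}


def split_properties(properties: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split names into (sequence_based, embedding_based), keeping order."""
    seq_props: List[str] = []
    emb_props: List[str] = []
    for name in properties:
        if name in SEQUENCE_BASED:
            seq_props.append(name)
        elif name in EMBEDDING_BASED:
            emb_props.append(name)
        else:
            # Assumption: unknown names are treated as an error.
            raise ValueError(f"unknown property: {name!r}")
    return seq_props, emb_props


seq_props, emb_props = split_properties(["molecular_weight", "stability"])
needs_model = bool(emb_props)  # True: "stability" is embedding-based
print(seq_props, emb_props, needs_model)
```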
Predict per-residue or global protein properties.
from molfun import predict_properties
props = predict_properties(
    sequence="MKFLILLFNILCLFPVLAADNH...",
    properties=["plddt", "disorder", "secondary_structure"],
    model="openfold_v2",
)
plddt = props["plddt"] # (N_residues,)
ss = props["secondary_structure"] # (N_residues,) H/E/C labels
| Parameter | Type | Default | Description |
|---|---|---|---|
| sequence | str | required | Amino acid sequence |
| properties | list[str] | ["plddt"] | Properties to predict |
| model | str | "openfold_v2" | Pretrained model name |
| device | str | "cpu" | Compute device |
Returns: dict[str, Tensor] mapping property names to tensors.
predict_affinity
predict_affinity(sequence: str, ligand_smiles: str, backend: str = 'openfold', device: str = 'cpu', single_dim: int = 384) -> dict
Predict binding affinity between a protein and a small molecule.
Uses the structure model's single representation pooled over residues, passed through a trained AffinityHead. The ligand SMILES is stored in the result for reference but does not yet influence the prediction (protein-only model — ligand-aware scoring is planned).
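The data flow described above (pool the per-residue single representation, then apply a head) can be sketched in plain Python. This is illustrative only: mean pooling and a single linear layer are assumptions, since the actual pooling operation and AffinityHead architecture are not described here. All names and the `score` key are hypothetical, and the toy vectors have 2 dimensions rather than 384.

```python
from typing import Dict, List, Sequence, Union


def mean_pool(single: Sequence[Sequence[float]]) -> List[float]:
    """Average an (N_residues, dim) representation over residues."""
    n = len(single)
    dim = len(single[0])
    return [sum(row[j] for row in single) / n for j in range(dim)]


def linear_head(pooled: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Stand-in for a trained head: one dot product plus a bias."""
    return sum(p * w for p, w in zip(pooled, weights)) + bias


def sketch_affinity(
    single: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
    ligand_smiles: str,
) -> Dict[str, Union[float, str]]:
    pooled = mean_pool(single)
    score = linear_head(pooled, weights, bias)
    # The ligand is recorded for reference but never enters the computation,
    # mirroring the protein-only behaviour described above.
    return {"score": score, "ligand_smiles": ligand_smiles}


toy_single = [[1.0, 2.0], [3.0, 4.0]]  # 2 residues, dim 2 (toy values)
a = sketch_affinity(toy_single, weights=[1.0, 1.0], bias=0.0, ligand_smiles="CC(=O)O")
b = sketch_affinity(toy_single, weights=[1.0, 1.0], bias=0.0, ligand_smiles="CCO")
print(a["score"], b["score"])  # 5.0 5.0
```

In this sketch, as in the behaviour described above, two different ligands paired with the same protein give the same score.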
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequence | str | Protein amino acid sequence. | required |
| ligand_smiles | str | SMILES string of the ligand. | required |
| backend | str | Model backend. | 'openfold' |
| device | str | "cpu" or "cuda". | 'cpu' |
| single_dim | int | Dimension of the single representation (must match the pretrained model). | 384 |
Returns:
| Type | Description |
|---|---|
| dict | Dict of prediction outputs, including binding_affinity_kcal; the ligand SMILES is also stored in the result. |
Usage:
result = predict_affinity(
    "MKWVTFISLLLLFSSAYS",
    ligand_smiles="CC(=O)O",
    device="cuda",
)
print(f"ΔG = {result['binding_affinity_kcal']:.1f} kcal/mol")
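A binding free energy in kcal/mol can be converted to a dissociation constant with the standard relation ΔG = RT ln(Kd), which gives pKd = -ΔG / (RT ln 10). The sketch below applies that relation at 298.15 K. It assumes the value is a standard binding free energy with a 1 M reference state; whether molfun's outputs follow this convention is not stated here, and the function names are hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol * K)


def dg_to_pkd(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Convert a binding free energy (kcal/mol) to pKd = -log10(Kd)."""
    # ΔG = RT ln(Kd)  =>  pKd = -ΔG / (RT ln 10)
    return -delta_g_kcal / (GAS_CONSTANT_KCAL * temperature_k * math.log(10))


def pkd_to_dg(pkd: float, temperature_k: float = 298.15) -> float:
    """Inverse of dg_to_pkd: convert pKd back to ΔG in kcal/mol."""
    return -pkd * GAS_CONSTANT_KCAL * temperature_k * math.log(10)


pkd = dg_to_pkd(-8.0)
print(f"pKd = {pkd:.2f}")
```

At room temperature, each pKd unit corresponds to roughly 1.36 kcal/mol, so more negative ΔG values map to larger pKd (tighter binding).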
Predict binding affinity between a protein and a ligand.
from molfun import predict_affinity
result = predict_affinity(
    protein_seq="MKFLILLFNILCLFPVLAADNH...",
    ligand_sdf="ligand.sdf",
    model="openfold_v2",
)
print(f"Predicted pKd: {result.pkd:.2f}")
print(f"Confidence: {result.confidence:.2f}")
| Parameter | Type | Default | Description |
|---|---|---|---|
| protein_seq | str | required | Protein amino acid sequence |
| ligand_sdf | str | required | Path to ligand SDF file |
| model | str | "openfold_v2" | Pretrained model name |
| device | str | "cpu" | Compute device |
Returns: Result object with .pkd (predicted pKd) and .confidence.
clear_cache
Clear the internal model cache used by the predict functions.
This is useful when switching between different models or when GPU memory is limited. The next call to any predict function will reload the model from disk.