Docking & Scoring Integration — Design Document¶

Overview¶

Molecular docking predicts how a small molecule (ligand) binds to a protein target. This module integrates open-source tools to provide a complete docking workflow within the ChemLib Platform:

AutoDock Vina — primary docking engine (via Python API)
meeko — ligand preparation (SMILES/RDKit Mol to PDBQT)
PDBFixer — protein preparation (fix missing atoms, add hydrogens)
Open Babel — format conversion (PDB to PDBQT for receptor)
Fpocket — binding site detection (covered in Protein Target Library doc)
PLIP — protein-ligand interaction profiling (H-bonds, hydrophobic, pi-stacking)

Docking Workflow¶

Click diagram to zoom and pan:

The complete docking workflow from raw inputs to analyzed results:

┌──────────────────────────────────────────────────────────────────────────┐
│                        DOCKING WORKFLOW                                    │
│                                                                           │
│  PROTEIN SIDE                           LIGAND SIDE                       │
│  ──────────────                         ────────────                      │
│                                                                           │
│  PDB Structure                          SMILES / SDF                      │
│       │                                      │                            │
│       ▼                                      ▼                            │
│  ┌─────────────┐                       ┌──────────────┐                  │
│  │  PDBFixer   │                       │  RDKit Parse │                  │
│  │  - Fix gaps │                       │  - AddHs     │                  │
│  │  - Add H    │                       │  - 3D embed  │                  │
│  │  - Clean    │                       │  - MMFF94 min│                  │
│  └──────┬──────┘                       └──────┬───────┘                  │
│         │                                     │                           │
│         ▼                                     ▼                           │
│  ┌─────────────┐                       ┌──────────────┐                  │
│  │ PDB → PDBQT │                       │ Mol → PDBQT  │                  │
│  │ (Open Babel │                       │ (meeko)      │                  │
│  │  or ADFR)   │                       │              │                  │
│  └──────┬──────┘                       └──────┬───────┘                  │
│         │                                     │                           │
│         ▼                                     ▼                           │
│  ┌─────────────┐   ┌────────────┐     ┌──────────────┐                  │
│  │  Receptor   │   │  Binding   │     │   Ligand     │                  │
│  │  PDBQT      │──▶│  Site      │◀────│   PDBQT      │                  │
│  └─────────────┘   │  (center,  │     └──────────────┘                  │
│                     │   box_size)│                                        │
│                     └─────┬──────┘                                        │
│                           │                                               │
│                           ▼                                               │
│                     ┌────────────┐                                        │
│                     │ AutoDock   │                                        │
│                     │ Vina       │                                        │
│                     │ - dock()   │                                        │
│                     └─────┬──────┘                                        │
│                           │                                               │
│                           ▼                                               │
│                     ┌────────────┐                                        │
│                     │ Ranked     │                                        │
│                     │ Poses      │                                        │
│                     │ (PDBQT)    │                                        │
│                     └─────┬──────┘                                        │
│                           │                                               │
│                           ▼                                               │
│                     ┌────────────┐                                        │
│                     │ PLIP       │                                        │
│                     │ Interaction│                                        │
│                     │ Analysis   │                                        │
│                     └─────┬──────┘                                        │
│                           │                                               │
│                           ▼                                               │
│                     ┌────────────┐                                        │
│                     │ Results    │                                        │
│                     │ - Scores   │                                        │
│                     │ - Poses    │                                        │
│                     │ - Contacts │                                        │
│                     └────────────┘                                        │
└──────────────────────────────────────────────────────────────────────────┘

Step-by-Step Detail¶

Step 1: Protein Preparation¶

# Input: Raw PDB structure data (from ProteinStructure.structure_data)
# Output: Cleaned PDB string, ready for conversion

# 1. Fix missing atoms and residues
from pdbfixer import PDBFixer
from openmm.app import PDBFile
import io

fixer = PDBFixer(pdbfile=io.StringIO(pdb_data))
fixer.findMissingResidues()
fixer.findMissingAtoms()
fixer.addMissingAtoms()
fixer.removeHeterogens(keepWater=False)
fixer.addHydrogens(pH=7.0)

output = io.StringIO()
PDBFile.writeFile(fixer.topology, fixer.positions, output)
fixed_pdb = output.getvalue()

Step 2: Receptor PDBQT Conversion¶

# Input: Fixed PDB string
# Output: PDBQT string (with partial charges and atom types)

# Option A: Open Babel (simpler)
import subprocess
# Write fixed PDB to temp file, convert with obabel
# obabel -ipdb receptor.pdb -opdbqt -O receptor.pdbqt -xr

# Option B: meeko/ADFR suite
# prepare_receptor -r receptor.pdb -o receptor.pdbqt

Step 3: Binding Site Definition¶

The binding site provides the search space for docking: - center_x, center_y, center_z — center of the search box - box_size_x, box_size_y, box_size_z — dimensions of the search box (Angstroms)

These come from the BindingSite table (detected via Fpocket, defined from ligand, or manual).

Step 4: Ligand Preparation¶

# Input: SMILES string
# Output: PDBQT string

from rdkit import Chem
from rdkit.Chem import AllChem
from meeko import MoleculePreparation, PDBQTWriterLegacy

# 1. Parse SMILES
mol = Chem.MolFromSmiles(smiles)

# 2. Add hydrogens
mol = Chem.AddHs(mol)

# 3. Generate 3D coordinates (ETKDGv3)
params = AllChem.ETKDGv3()
params.randomSeed = 42
AllChem.EmbedMolecule(mol, params)

# 4. Energy minimize (MMFF94)
AllChem.MMFFOptimizeMolecule(mol, maxIters=500)

# 5. Convert to PDBQT using meeko
preparator = MoleculePreparation()
mol_setup = preparator.prepare(mol)
pdbqt_string, is_ok, error_msg = PDBQTWriterLegacy.write_string(mol_setup)

Step 5: Docking with AutoDock Vina¶

# Input: Receptor PDBQT, Ligand PDBQT, Binding Site parameters
# Output: Ranked poses with scores

from vina import Vina

v = Vina(sf_name='vina')

# Set receptor
v.set_receptor(rigid_pdbqt=receptor_pdbqt_path)

# Set ligand
v.set_ligand_from_string(ligand_pdbqt_string)

# Set search space
v.compute_vina_maps(
    center=[center_x, center_y, center_z],
    box_size=[box_size_x, box_size_y, box_size_z]
)

# Dock
v.dock(
    exhaustiveness=exhaustiveness,  # Default: 32
    n_poses=num_poses               # Default: 10
)

# Get results
energies = v.energies()  # Nx4 array: [score, lig_inter, lig_intra, torsions]
poses_pdbqt = v.poses(n_poses=num_poses)  # PDBQT string with all poses

Step 6: Post-Processing and Interaction Analysis¶

# Input: Best docking pose (PDBQT) + protein structure
# Output: Interaction profile

# 1. Convert pose PDBQT back to PDB
# (via Open Babel or manual parsing)

# 2. Create complex PDB (protein + ligand)
complex_pdb = receptor_pdb + "\n" + ligand_pose_pdb

# 3. Run PLIP
from plip.structure.preparation import PDBComplex

complex = PDBComplex()
complex.load_pdb(complex_pdb_path)
complex.analyze()

for bsite_key, bsite in complex.interaction_sets.items():
    interactions = {
        "hydrogen_bonds": [
            {
                "donor": str(hb.d),
                "acceptor": str(hb.a),
                "distance": hb.distance_ad,
                "angle": hb.angle,
                "protein_residue": f"{hb.restype}{hb.resnr}",
            }
            for hb in bsite.hbonds_pdon + bsite.hbonds_ldon
        ],
        "hydrophobic_contacts": [
            {
                "protein_residue": f"{hp.restype}{hp.resnr}",
                "distance": hp.distance,
            }
            for hp in bsite.hydrophobic_contacts
        ],
        "pi_stacking": [
            {
                "protein_residue": f"{ps.restype}{ps.resnr}",
                "type": ps.type,  # "P" (parallel) or "T" (T-shaped)
                "distance": ps.distance,
            }
            for ps in bsite.pistacking
        ],
        "salt_bridges": [
            {
                "protein_residue": f"{sb.restype}{sb.resnr}",
                "distance": sb.distance,
                "protein_is_positive": sb.protispos,
            }
            for sb in bsite.saltbridge_lneg + bsite.saltbridge_pneg
        ],
        "water_bridges": [
            {
                "protein_residue": f"{wb.restype}{wb.resnr}",
                "distance_protein_water": wb.distance_aw,
                "distance_ligand_water": wb.distance_dw,
            }
            for wb in bsite.water_bridges
        ],
    }

Service Layer¶

`chemlib/services/docking_service.py`¶

class DockingService:
    """
    Orchestrates the full docking workflow:
    protein preparation → ligand preparation → docking → results storage.
    """

    async def prepare_receptor(
        self, db: AsyncSession, structure_id: int,
        remove_water: bool = True,
        remove_heterogens: bool = True,
        add_hydrogens: bool = True,
        ph: float = 7.0
    ) -> dict:
        """
        Prepare a protein structure for docking.

        Steps:
        1. Get ProteinStructure from DB
        2. Fix with PDBFixer (missing atoms, hydrogens)
        3. Convert to PDBQT (Open Babel subprocess)
        4. Cache the prepared PDBQT (store in temp or in a new field)

        Returns: {"receptor_pdbqt": "...", "fixed_pdb": "...", "num_atoms": 5000}
        """

    async def prepare_ligand(
        self, smiles: str
    ) -> dict:
        """
        Prepare a small molecule for docking.

        Steps:
        1. Parse SMILES with RDKit
        2. Add hydrogens
        3. 3D embed (ETKDGv3)
        4. MMFF94 minimize
        5. Convert to PDBQT via meeko

        Returns: {"ligand_pdbqt": "...", "num_atoms": 25, "num_rotatable": 4}
        """

    async def dock(
        self, db: AsyncSession,
        structure_id: int,
        binding_site_id: int,
        ligand_smiles: list[str],
        compound_ids: list[int | None] | None = None,
        assembled_molecule_ids: list[int | None] | None = None,
        exhaustiveness: int = 32,
        num_poses: int = 10,
        energy_range: float = 3.0
    ) -> int:  # Returns DockingRun.id
        """
        Run docking for a batch of ligands against a protein binding site.

        Steps:
        1. Prepare receptor (or use cached PDBQT)
        2. Load binding site parameters (center, box_size)
        3. Create DockingRun record with status="running"
        4. For each ligand SMILES:
           a. Prepare ligand PDBQT (meeko)
           b. Run Vina docking
           c. Parse results (scores, poses)
           d. Run PLIP on best pose (optional, can be deferred)
           e. Create DockingResult record
        5. Update DockingRun status to "completed"
        6. Return DockingRun.id
        """

    async def dock_single(
        self, receptor_pdbqt: str,
        ligand_pdbqt: str,
        center: tuple[float, float, float],
        box_size: tuple[float, float, float],
        exhaustiveness: int = 32,
        num_poses: int = 10
    ) -> dict:
        """
        Dock a single ligand. Low-level method used by dock() and pipeline filters.

        Returns: {
            "scores": [-8.5, -7.2, -6.8, ...],  # kcal/mol per pose
            "poses_pdbqt": "MODEL 1\n...",         # All poses in PDBQT format
            "best_score": -8.5,
            "num_poses": 10
        }
        """

    async def get_run(self, db: AsyncSession, run_id: int) -> DockingRunResponse | None:
        """Get docking run status."""

    async def get_results(
        self, db: AsyncSession, run_id: int,
        sort_by: str = "score",
        sort_order: str = "asc",
        limit: int = 50,
        offset: int = 0
    ) -> list[DockingResultResponse]:
        """Get ranked docking results for a run."""

    async def get_result_detail(
        self, db: AsyncSession, result_id: int
    ) -> DockingResultDetailResponse:
        """Get a single docking result with full pose data."""

`chemlib/services/interaction_service.py`¶

class InteractionService:
    """Protein-ligand interaction analysis using PLIP."""

    async def analyze_interactions(
        self, protein_pdb: str, ligand_pdb: str
    ) -> dict:
        """
        Analyze interactions between a protein and a docked ligand.

        Steps:
        1. Combine protein and ligand PDB into a complex
        2. Write to temp file
        3. Run PLIP analysis
        4. Parse results into structured dict

        Returns: {
            "hydrogen_bonds": [
                {"donor": "LIG:O3", "acceptor": "A:ASP855:OD1", "distance": 2.8,
                 "angle": 165.0, "is_strong": True}
            ],
            "hydrophobic_contacts": [
                {"protein_atom": "A:LEU718:CD1", "ligand_atom": "LIG:C12",
                 "distance": 3.6}
            ],
            "pi_stacking": [...],
            "salt_bridges": [...],
            "water_bridges": [...],
            "summary": {
                "num_hbonds": 3,
                "num_hydrophobic": 8,
                "num_pi_stacking": 1,
                "num_salt_bridges": 0,
                "num_water_bridges": 1,
                "total_interactions": 13
            }
        }
        """

    async def analyze_docking_result(
        self, db: AsyncSession, result_id: int
    ) -> dict:
        """
        Analyze interactions for a stored docking result.
        Fetches the protein structure and docking pose, runs PLIP,
        and updates the DockingResult.interactions field.
        """

Docking Utility Layer¶

`chemlib/docking/ligand_prep.py`¶

"""Ligand preparation utilities for molecular docking."""

from rdkit import Chem
from rdkit.Chem import AllChem
from meeko import MoleculePreparation, PDBQTWriterLegacy


def smiles_to_pdbqt(smiles: str, seed: int = 42) -> str:
    """
    Convert SMILES to PDBQT string for Vina docking.

    Steps:
    1. Parse SMILES → Mol
    2. AddHs
    3. Embed 3D (ETKDGv3)
    4. MMFF94 minimize (500 iterations)
    5. meeko MoleculePreparation → PDBQT

    Returns: PDBQT string

    Raises:
    - ValueError if SMILES cannot be parsed
    - RuntimeError if 3D embedding fails
    - RuntimeError if meeko preparation fails
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Cannot parse SMILES: {smiles}")

    mol = Chem.AddHs(mol)

    params = AllChem.ETKDGv3()
    params.randomSeed = seed
    status = AllChem.EmbedMolecule(mol, params)
    if status == -1:
        # Fallback: try without distance geometry
        AllChem.EmbedMolecule(mol, AllChem.ETKDG())

    AllChem.MMFFOptimizeMolecule(mol, maxIters=500)

    preparator = MoleculePreparation()
    mol_setup = preparator.prepare(mol)
    pdbqt_string, is_ok, error_msg = PDBQTWriterLegacy.write_string(mol_setup)

    if not is_ok:
        raise RuntimeError(f"meeko preparation failed: {error_msg}")

    return pdbqt_string


def mol_to_pdbqt(mol: Chem.Mol) -> str:
    """
    Convert RDKit Mol (with 3D coords) to PDBQT string.
    Assumes the mol already has hydrogens and 3D coordinates.
    """
    preparator = MoleculePreparation()
    mol_setup = preparator.prepare(mol)
    pdbqt_string, is_ok, error_msg = PDBQTWriterLegacy.write_string(mol_setup)
    if not is_ok:
        raise RuntimeError(f"meeko preparation failed: {error_msg}")
    return pdbqt_string


def pdbqt_to_pdb(pdbqt_string: str) -> str:
    """
    Convert PDBQT to PDB format (strip charge and type columns).
    Simple text manipulation — removes the last two columns of ATOM/HETATM lines.
    """
    lines = []
    for line in pdbqt_string.split('\n'):
        if line.startswith(('ATOM', 'HETATM')):
            lines.append(line[:66])  # PDB format is 66 chars for ATOM records
        elif line.startswith(('MODEL', 'ENDMDL', 'END', 'REMARK', 'TER')):
            lines.append(line)
    return '\n'.join(lines)


def split_pdbqt_poses(pdbqt_string: str) -> list[str]:
    """
    Split a multi-model PDBQT into individual pose strings.
    Returns: list of PDBQT strings, one per pose.
    """
    poses = []
    current_pose = []
    for line in pdbqt_string.split('\n'):
        if line.startswith('MODEL'):
            current_pose = [line]
        elif line.startswith('ENDMDL'):
            current_pose.append(line)
            poses.append('\n'.join(current_pose))
            current_pose = []
        elif current_pose:
            current_pose.append(line)
    return poses

`chemlib/docking/receptor_prep.py`¶

"""Receptor preparation utilities for molecular docking."""

import subprocess
import tempfile
from pathlib import Path


def pdb_to_pdbqt(pdb_data: str, remove_water: bool = True) -> str:
    """
    Convert PDB format to PDBQT format for Vina receptor.

    Uses Open Babel subprocess:
    obabel -ipdb input.pdb -opdbqt -O output.pdbqt -xr

    The -xr flag tells Open Babel this is a rigid receptor (no flexible residues).

    Returns: PDBQT string
    """
    with tempfile.NamedTemporaryFile(suffix=".pdb", mode='w', delete=False) as f_in:
        f_in.write(pdb_data)
        input_path = Path(f_in.name)

    output_path = input_path.with_suffix('.pdbqt')

    try:
        cmd = [
            "obabel",
            "-ipdb", str(input_path),
            "-opdbqt",
            "-O", str(output_path),
            "-xr"  # rigid receptor
        ]
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)

        if result.returncode != 0:
            raise RuntimeError(f"Open Babel failed: {result.stderr}")

        return output_path.read_text()
    finally:
        input_path.unlink(missing_ok=True)
        output_path.unlink(missing_ok=True)


def prepare_receptor_full(
    pdb_data: str,
    remove_water: bool = True,
    remove_heterogens: bool = True,
    add_hydrogens: bool = True,
    ph: float = 7.0
) -> tuple[str, str]:
    """
    Full receptor preparation pipeline:
    1. PDBFixer: fix missing atoms, remove water/heterogens, add Hs
    2. Convert to PDBQT

    Returns: (fixed_pdb, receptor_pdbqt)
    """
    from chemlib.bioinformatics.protein_prep import fix_protein

    fixed_pdb = fix_protein(pdb_data)
    receptor_pdbqt = pdb_to_pdbqt(fixed_pdb, remove_water=remove_water)
    return fixed_pdb, receptor_pdbqt

`chemlib/docking/vina_runner.py`¶

"""AutoDock Vina docking engine wrapper."""

import tempfile
from pathlib import Path
from typing import Optional

from vina import Vina


def dock_ligand(
    receptor_pdbqt: str,
    ligand_pdbqt: str,
    center: tuple[float, float, float],
    box_size: tuple[float, float, float],
    exhaustiveness: int = 32,
    num_poses: int = 10,
    energy_range: float = 3.0,
    seed: int = 42
) -> dict:
    """
    Dock a single ligand against a receptor using AutoDock Vina.

    Args:
        receptor_pdbqt: Receptor in PDBQT format
        ligand_pdbqt: Ligand in PDBQT format
        center: (x, y, z) center of search box
        box_size: (sx, sy, sz) dimensions of search box in Angstroms
        exhaustiveness: Search thoroughness (8-128, higher = more exhaustive)
        num_poses: Maximum number of poses to return
        energy_range: Maximum energy difference from best pose (kcal/mol)
        seed: Random seed for reproducibility

    Returns: {
        "scores": [-8.5, -7.2, -6.8, ...],     # kcal/mol, sorted best to worst
        "poses_pdbqt": "MODEL 1\\n...",          # All poses in multi-model PDBQT
        "best_score": -8.5,
        "energies": [                             # Detailed energy breakdown
            {"total": -8.5, "inter": -9.1, "intra": 0.6, "torsion": 0.0},
            ...
        ],
        "num_poses_found": 10
    }
    """
    # Write receptor to temp file (Vina requires file path)
    with tempfile.NamedTemporaryFile(suffix=".pdbqt", mode='w', delete=False) as f:
        f.write(receptor_pdbqt)
        receptor_path = f.name

    try:
        v = Vina(sf_name='vina', seed=seed)
        v.set_receptor(rigid_pdbqt=receptor_path)
        v.set_ligand_from_string(ligand_pdbqt)
        v.compute_vina_maps(
            center=list(center),
            box_size=list(box_size)
        )
        v.dock(exhaustiveness=exhaustiveness, n_poses=num_poses)

        energies_array = v.energies()  # Nx4 numpy array
        poses_pdbqt = v.poses(n_poses=num_poses)

        scores = energies_array[:, 0].tolist()
        energies = [
            {
                "total": float(row[0]),
                "inter": float(row[1]),
                "intra": float(row[2]),
                "torsion": float(row[3]) if len(row) > 3 else 0.0,
            }
            for row in energies_array
        ]

        return {
            "scores": scores,
            "poses_pdbqt": poses_pdbqt,
            "best_score": scores[0] if scores else None,
            "energies": energies,
            "num_poses_found": len(scores),
        }
    finally:
        Path(receptor_path).unlink(missing_ok=True)


def score_ligand(
    receptor_pdbqt: str,
    ligand_pdbqt: str,
    center: tuple[float, float, float],
    box_size: tuple[float, float, float]
) -> float:
    """
    Score a ligand pose without docking (single-point scoring).
    Useful for rescoring or evaluating a specific pose.

    Returns: Vina score in kcal/mol
    """
    with tempfile.NamedTemporaryFile(suffix=".pdbqt", mode='w', delete=False) as f:
        f.write(receptor_pdbqt)
        receptor_path = f.name

    try:
        v = Vina(sf_name='vina')
        v.set_receptor(rigid_pdbqt=receptor_path)
        v.set_ligand_from_string(ligand_pdbqt)
        v.compute_vina_maps(center=list(center), box_size=list(box_size))
        energy = v.score()
        return float(energy[0])
    finally:
        Path(receptor_path).unlink(missing_ok=True)

`chemlib/docking/interaction_analysis.py`¶

"""Protein-ligand interaction analysis using PLIP."""

import tempfile
from pathlib import Path
from plip.structure.preparation import PDBComplex


def analyze_complex(complex_pdb: str) -> dict:
    """
    Analyze protein-ligand interactions using PLIP.

    Args:
        complex_pdb: PDB format string containing both protein and ligand.
                     The ligand should be a HETATM residue.

    Returns: {
        "hydrogen_bonds": [{
            "donor_type": "protein" | "ligand",
            "donor": "A:ASP855:OD1",
            "acceptor": "LIG:N1",
            "distance": 2.82,
            "angle": 162.5,
        }, ...],
        "hydrophobic_contacts": [{
            "protein_residue": "A:LEU718",
            "protein_atom": "CD1",
            "ligand_atom": "C12",
            "distance": 3.58,
        }, ...],
        "pi_stacking": [{
            "protein_residue": "A:PHE723",
            "type": "parallel",  # or "t-shaped"
            "distance": 4.12,
            "angle": 15.3,
            "offset": 1.2,
        }, ...],
        "salt_bridges": [{
            "protein_residue": "A:LYS745",
            "distance": 3.94,
            "protein_is_positive": True,
        }, ...],
        "water_bridges": [{
            "protein_residue": "A:THR790",
            "water_id": "HOH:401",
            "distance_protein_water": 2.75,
            "distance_ligand_water": 2.88,
        }, ...],
        "halogen_bonds": [{
            "protein_residue": "A:MET793",
            "distance": 3.21,
            "halogen": "Cl",
        }, ...],
        "metal_complexes": [...],
        "summary": {
            "num_hbonds": 3,
            "num_hydrophobic": 8,
            "num_pi_stacking": 1,
            "num_salt_bridges": 0,
            "num_water_bridges": 1,
            "num_halogen_bonds": 0,
            "total_interactions": 13,
        }
    }
    """
    with tempfile.NamedTemporaryFile(suffix=".pdb", mode='w', delete=False) as f:
        f.write(complex_pdb)
        complex_path = f.name

    try:
        mol = PDBComplex()
        mol.load_pdb(complex_path)
        mol.analyze()

        results = {
            "hydrogen_bonds": [],
            "hydrophobic_contacts": [],
            "pi_stacking": [],
            "salt_bridges": [],
            "water_bridges": [],
            "halogen_bonds": [],
            "metal_complexes": [],
            "summary": {},
        }

        for bsite_key, bsite in mol.interaction_sets.items():
            # Parse H-bonds (protein-donor and ligand-donor)
            for hb in bsite.hbonds_pdon:
                results["hydrogen_bonds"].append({
                    "donor_type": "protein",
                    "donor": f"{hb.reschain}:{hb.restype}{hb.resnr}:{hb.d.type}",
                    "acceptor": f"LIG:{hb.a.type}",
                    "distance": round(hb.distance_ad, 2),
                    "angle": round(hb.angle, 1),
                })
            for hb in bsite.hbonds_ldon:
                results["hydrogen_bonds"].append({
                    "donor_type": "ligand",
                    "donor": f"LIG:{hb.d.type}",
                    "acceptor": f"{hb.reschain}:{hb.restype}{hb.resnr}:{hb.a.type}",
                    "distance": round(hb.distance_ad, 2),
                    "angle": round(hb.angle, 1),
                })

            # Parse hydrophobic contacts
            for hp in bsite.hydrophobic_contacts:
                results["hydrophobic_contacts"].append({
                    "protein_residue": f"{hp.reschain}:{hp.restype}{hp.resnr}",
                    "distance": round(hp.distance, 2),
                })

            # Parse pi-stacking
            for ps in bsite.pistacking:
                results["pi_stacking"].append({
                    "protein_residue": f"{ps.reschain}:{ps.restype}{ps.resnr}",
                    "type": "parallel" if ps.type == "P" else "t-shaped",
                    "distance": round(ps.distance, 2),
                    "angle": round(ps.angle, 1),
                    "offset": round(ps.offset, 2),
                })

            # Parse salt bridges
            for sb in bsite.saltbridge_lneg + bsite.saltbridge_pneg:
                results["salt_bridges"].append({
                    "protein_residue": f"{sb.reschain}:{sb.restype}{sb.resnr}",
                    "distance": round(sb.distance, 2),
                    "protein_is_positive": sb.protispos,
                })

            # Parse water bridges
            for wb in bsite.water_bridges:
                results["water_bridges"].append({
                    "protein_residue": f"{wb.reschain}:{wb.restype}{wb.resnr}",
                    "distance_protein_water": round(wb.distance_aw, 2),
                    "distance_ligand_water": round(wb.distance_dw, 2),
                })

        # Summary
        results["summary"] = {
            "num_hbonds": len(results["hydrogen_bonds"]),
            "num_hydrophobic": len(results["hydrophobic_contacts"]),
            "num_pi_stacking": len(results["pi_stacking"]),
            "num_salt_bridges": len(results["salt_bridges"]),
            "num_water_bridges": len(results["water_bridges"]),
            "num_halogen_bonds": len(results["halogen_bonds"]),
            "total_interactions": sum(
                len(results[k]) for k in [
                    "hydrogen_bonds", "hydrophobic_contacts", "pi_stacking",
                    "salt_bridges", "water_bridges", "halogen_bonds"
                ]
            ),
        }

        return results
    finally:
        Path(complex_path).unlink(missing_ok=True)

Database Schema¶

`chemlib/models/docking.py`¶

from __future__ import annotations
from datetime import datetime
from typing import Optional

from sqlalchemy import String, Text, Integer, Float, DateTime, ForeignKey, JSON
from sqlalchemy.orm import Mapped, mapped_column, relationship
from sqlalchemy.sql import func

from chemlib.models.base import Base


class DockingRun(Base):
    __tablename__ = "docking_runs"

    id: Mapped[int] = mapped_column(primary_key=True)
    protein_structure_id: Mapped[int] = mapped_column(
        ForeignKey("protein_structures.id", ondelete="CASCADE"), nullable=False, index=True
    )
    binding_site_id: Mapped[int] = mapped_column(
        ForeignKey("binding_sites.id", ondelete="CASCADE"), nullable=False, index=True
    )
    status: Mapped[str] = mapped_column(
        String(20), nullable=False, default="pending"
        # "pending", "preparing", "running", "completed", "failed"
    )
    num_ligands: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    completed_ligands: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    exhaustiveness: Mapped[int] = mapped_column(Integer, nullable=False, default=32)
    num_poses: Mapped[int] = mapped_column(Integer, nullable=False, default=10)
    energy_range: Mapped[float] = mapped_column(Float, nullable=False, default=3.0)
    receptor_pdbqt: Mapped[Optional[str]] = mapped_column(Text, nullable=True)  # Cached
    started_at: Mapped[Optional[datetime]] = mapped_column(DateTime(timezone=True), nullable=True)
    completed_at: Mapped[Optional[datetime]] = mapped_column(DateTime(timezone=True), nullable=True)
    error_message: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), server_default=func.now()
    )

    # Relationships
    results: Mapped[list[DockingResult]] = relationship(
        back_populates="run", cascade="all, delete-orphan"
    )

    def __repr__(self) -> str:
        return f"<DockingRun(id={self.id}, status='{self.status}', num_ligands={self.num_ligands})>"


class DockingResult(Base):
    __tablename__ = "docking_results"

    id: Mapped[int] = mapped_column(primary_key=True)
    run_id: Mapped[int] = mapped_column(
        ForeignKey("docking_runs.id", ondelete="CASCADE"), nullable=False, index=True
    )
    compound_id: Mapped[Optional[int]] = mapped_column(
        ForeignKey("compounds.id", ondelete="SET NULL"), nullable=True, index=True
    )
    assembled_molecule_id: Mapped[Optional[int]] = mapped_column(
        ForeignKey("assembled_molecules.id", ondelete="SET NULL"), nullable=True, index=True
    )
    smiles: Mapped[str] = mapped_column(String(2000), nullable=False)
    score: Mapped[float] = mapped_column(Float, nullable=False)  # kcal/mol (best pose)
    pose_rank: Mapped[int] = mapped_column(Integer, nullable=False, default=1)
    pose_pdbqt: Mapped[Optional[str]] = mapped_column(Text, nullable=True)  # Best pose PDBQT
    all_poses_pdbqt: Mapped[Optional[str]] = mapped_column(Text, nullable=True)  # All poses
    energies: Mapped[Optional[list]] = mapped_column(JSON, nullable=True)  # Energy breakdown
    interactions: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)  # PLIP results
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), server_default=func.now()
    )

    # Relationships
    run: Mapped[DockingRun] = relationship(back_populates="results")

    def __repr__(self) -> str:
        return f"<DockingResult(id={self.id}, smiles='{self.smiles[:30]}...', score={self.score})>"

Pydantic Schemas¶

`chemlib/schemas/docking.py`¶

from __future__ import annotations
from datetime import datetime
from typing import Optional, Any
from pydantic import BaseModel, ConfigDict, Field


class DockingRunCreate(BaseModel):
    """Request to start a docking run."""
    structure_id: int
    binding_site_id: int
    compound_ids: list[int] | None = None
    assembled_molecule_ids: list[int] | None = None
    smiles_list: list[str] | None = None           # Direct SMILES input
    exhaustiveness: int = Field(default=32, ge=8, le=128)
    num_poses: int = Field(default=10, ge=1, le=50)
    energy_range: float = Field(default=3.0, ge=1.0, le=10.0)
    analyze_interactions: bool = True               # Run PLIP on best pose


class DockingRunResponse(BaseModel):
    id: int
    protein_structure_id: int
    binding_site_id: int
    status: str
    num_ligands: int
    completed_ligands: int
    exhaustiveness: int
    num_poses: int
    started_at: datetime | None
    completed_at: datetime | None
    error_message: str | None

    model_config = ConfigDict(from_attributes=True)


class DockingResultResponse(BaseModel):
    id: int
    run_id: int
    compound_id: int | None
    assembled_molecule_id: int | None
    smiles: str
    score: float
    pose_rank: int
    interactions_summary: dict | None = None  # Just the summary counts
    created_at: datetime

    model_config = ConfigDict(from_attributes=True)


class DockingResultDetailResponse(DockingResultResponse):
    """Full detail including pose data and interactions."""
    pose_pdbqt: str | None
    all_poses_pdbqt: str | None
    energies: list[dict] | None
    interactions: dict | None


class InteractionAnalysisResponse(BaseModel):
    hydrogen_bonds: list[dict]
    hydrophobic_contacts: list[dict]
    pi_stacking: list[dict]
    salt_bridges: list[dict]
    water_bridges: list[dict]
    halogen_bonds: list[dict]
    summary: dict

API Endpoints¶

Docking — `chemlib/api/docking.py`¶

Method	Endpoint	Description	Request Body	Response
`POST`	`/api/docking/prepare-receptor/{structure_id}`	Prepare protein for docking	Query: remove_water, add_hydrogens	{"status": "prepared", "num_atoms": 5000}
`POST`	`/api/docking/run`	Submit a docking job	DockingRunCreate	DockingRunResponse
`GET`	`/api/docking/runs/{run_id}`	Get docking run status	—	DockingRunResponse
`GET`	`/api/docking/runs/{run_id}/results`	Get ranked results	Query: sort_by, limit, offset	list[DockingResultResponse]
`GET`	`/api/docking/results/{result_id}`	Get single result detail	—	DockingResultDetailResponse
`GET`	`/api/docking/results/{result_id}/interactions`	Get interaction analysis	—	InteractionAnalysisResponse
`GET`	`/api/docking/results/{result_id}/pose`	Get pose PDB for viewer	Query: pose_rank (default 1)	{"pdb_data": "...", "score": -8.5}
`POST`	`/api/docking/results/{result_id}/analyze`	Re-run PLIP analysis	—	InteractionAnalysisResponse

Example API Flow¶

# 1. Prepare receptor
POST /api/docking/prepare-receptor/42
→ {"status": "prepared", "num_atoms": 5000}

# 2. Submit docking job
POST /api/docking/run
{
    "structure_id": 42,
    "binding_site_id": 7,
    "compound_ids": [1, 2, 3, 4, 5],
    "exhaustiveness": 32,
    "num_poses": 10,
    "analyze_interactions": true
}
→ {"id": 1, "status": "running", "num_ligands": 5, ...}

# 3. Poll for completion
GET /api/docking/runs/1
→ {"id": 1, "status": "running", "completed_ligands": 3, "num_ligands": 5, ...}

# 4. Get results
GET /api/docking/runs/1/results?sort_by=score&sort_order=asc
→ [
    {"id": 1, "smiles": "CCO", "score": -8.5, "compound_id": 3, ...},
    {"id": 2, "smiles": "CC(=O)O", "score": -7.2, "compound_id": 1, ...},
    ...
  ]

# 5. View pose in 3D
GET /api/docking/results/1/pose
→ {"pdb_data": "ATOM  1  N  ...", "score": -8.5}

# 6. View interactions
GET /api/docking/results/1/interactions
→ {"hydrogen_bonds": [...], "hydrophobic_contacts": [...], "summary": {"num_hbonds": 3, ...}}

Visualization¶

Docking Results Viewer (`docking_viewer.html`)¶

┌─────────────────────────────────────────────────────────────────────────┐
│  Docking Results: Run #1 — EGFR (1M17) vs Pocket 1     Status: Done   │
│  5 ligands docked | Exhaustiveness: 32 | Poses: 10                     │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │                                                                  │  │
│  │              3D VIEWER (3Dmol.js)                                │  │
│  │                                                                  │  │
│  │    [Protein in cartoon (gray) +                                 │  │
│  │     Docked ligand in sticks (green) +                           │  │
│  │     Binding site surface (transparent blue) +                   │  │
│  │     H-bond dashes (yellow) +                                    │  │
│  │     Hydrophobic contacts (orange dots)]                         │  │
│  │                                                                  │  │
│  │  Pose: [1 ▾] of 10  Score: -8.5 kcal/mol                      │  │
│  │  Style: [Cartoon ▾]  Surface: [☑ Binding Site]  Labels: [☑]    │  │
│  └──────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │  INTERACTION DIAGRAM (2D)                                        │  │
│  │                                                                  │  │
│  │     ASP855 ──── H ────── O (ligand)      2.8 A                 │  │
│  │     MET793 ──── H ────── N (ligand)      3.1 A                 │  │
│  │     LEU718 ···· hydrophobic ···· C12      3.6 A                 │  │
│  │     PHE723 ═══ pi-stack ═══ ring          4.1 A                 │  │
│  │                                                                  │  │
│  │  Summary: 3 H-bonds | 8 hydrophobic | 1 pi-stack | 0 salt      │  │
│  └──────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  RESULTS TABLE                                                          │
│  ┌────┬───────────────────┬──────────┬──────────┬──────────┬─────────┐ │
│  │ #  │ Compound          │ Score    │ H-bonds  │ Contacts │ Select  │ │
│  ├────┼───────────────────┼──────────┼──────────┼──────────┼─────────┤ │
│  │ 1  │ CPD-003 (CCO...)  │ -8.5     │ 3        │ 12       │ [View]  │ │
│  │ 2  │ CPD-001 (CC...)   │ -7.2     │ 2        │ 9        │ [View]  │ │
│  │ 3  │ CPD-005 (c1c...)  │ -6.8     │ 2        │ 7        │ [View]  │ │
│  │ 4  │ CPD-002 (OC...)   │ -5.4     │ 1        │ 5        │ [View]  │ │
│  │ 5  │ CPD-004 (N#C...)  │ -4.1     │ 0        │ 3        │ [View]  │ │
│  └────┴───────────────────┴──────────┴──────────┴──────────┴─────────┘ │
└─────────────────────────────────────────────────────────────────────────┘

3Dmol.js Docking Viewer Implementation¶

// chemlib/static/js/protein_viewer.js — docking mode additions

class ProteinViewer {
    // ... (base class from Protein Target Library doc)

    async showDockingResult(resultId) {
        this.clear();

        // Fetch protein structure
        const run = await this.fetchRunInfo(resultId);
        const structure = await fetch(`/api/structures/${run.protein_structure_id}`).then(r => r.json());

        // Load protein (cartoon, gray)
        this.viewer.addModel(structure.structure_data, "pdb");
        this.viewer.setStyle({model: 0}, {
            cartoon: {color: '#cccccc', opacity: 0.8}
        });

        // Fetch docking pose
        const pose = await fetch(`/api/docking/results/${resultId}/pose`).then(r => r.json());

        // Load ligand pose (sticks, green)
        this.viewer.addModel(pose.pdb_data, "pdb");
        this.viewer.setStyle({model: 1}, {
            stick: {colorscheme: 'greenCarbon', radius: 0.15},
            sphere: {scale: 0.25, colorscheme: 'greenCarbon'}
        });

        // Show binding site surface
        const site = await fetch(`/api/structures/${run.protein_structure_id}/binding-sites`).then(r => r.json());
        if (site.length > 0) {
            const bs = site.find(s => s.id === run.binding_site_id);
            if (bs) {
                // Highlight binding site residues
                const residueSelector = bs.residues.map(r => {
                    const [chain, res] = r.split(':');
                    return {chain: chain, resi: parseInt(res.replace(/\D/g, ''))};
                });
                this.viewer.addSurface(
                    this.viewer.SurfaceType.VDW,
                    {opacity: 0.15, color: '#3498db'},
                    {or: residueSelector},
                    {model: 0}
                );
            }
        }

        // Show interactions
        const interactions = await fetch(`/api/docking/results/${resultId}/interactions`).then(r => r.json());
        this.showInteractionLines(interactions);

        this.viewer.zoomTo({model: 1});
        this.viewer.render();
    }

    showInteractionLines(interactions) {
        // H-bonds as yellow dashed lines
        for (const hb of interactions.hydrogen_bonds) {
            // Would need atom coordinates — simplification here
            // In practice, extract coords from PDB data
        }
        // Hydrophobic contacts as orange dotted lines
        // Pi-stacking as green parallel lines
    }

    switchPose(resultId, poseRank) {
        // Fetch specific pose: GET /api/docking/results/{id}/pose?pose_rank={rank}
        // Replace model 1 with new pose
    }
}

Score Heatmap (Multiple Ligands)¶

When viewing results for a docking run with many ligands, show a heatmap:

// Using simple HTML table with color-coded cells
function renderScoreHeatmap(results) {
    // results: [{smiles, score, num_hbonds, num_contacts}, ...]
    // Color scale: dark green (best score) → yellow → red (worst score)
    // Score range: typically -12 to 0 kcal/mol
}

Alembic Migration¶

# alembic/versions/xxxx_add_docking_tables.py

def upgrade():
    op.create_table(
        "docking_runs",
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column("protein_structure_id", sa.Integer, sa.ForeignKey("protein_structures.id", ondelete="CASCADE"), nullable=False, index=True),
        sa.Column("binding_site_id", sa.Integer, sa.ForeignKey("binding_sites.id", ondelete="CASCADE"), nullable=False, index=True),
        sa.Column("status", sa.String(20), nullable=False, default="pending"),
        sa.Column("num_ligands", sa.Integer, nullable=False, default=0),
        sa.Column("completed_ligands", sa.Integer, nullable=False, default=0),
        sa.Column("exhaustiveness", sa.Integer, nullable=False, default=32),
        sa.Column("num_poses", sa.Integer, nullable=False, default=10),
        sa.Column("energy_range", sa.Float, nullable=False, default=3.0),
        sa.Column("receptor_pdbqt", sa.Text, nullable=True),
        sa.Column("started_at", sa.DateTime(timezone=True), nullable=True),
        sa.Column("completed_at", sa.DateTime(timezone=True), nullable=True),
        sa.Column("error_message", sa.Text, nullable=True),
        sa.Column("created_at", sa.DateTime(timezone=True), server_default=sa.func.now()),
    )

    op.create_table(
        "docking_results",
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column("run_id", sa.Integer, sa.ForeignKey("docking_runs.id", ondelete="CASCADE"), nullable=False, index=True),
        sa.Column("compound_id", sa.Integer, sa.ForeignKey("compounds.id", ondelete="SET NULL"), nullable=True, index=True),
        sa.Column("assembled_molecule_id", sa.Integer, sa.ForeignKey("assembled_molecules.id", ondelete="SET NULL"), nullable=True, index=True),
        sa.Column("smiles", sa.String(2000), nullable=False),
        sa.Column("score", sa.Float, nullable=False),
        sa.Column("pose_rank", sa.Integer, nullable=False, default=1),
        sa.Column("pose_pdbqt", sa.Text, nullable=True),
        sa.Column("all_poses_pdbqt", sa.Text, nullable=True),
        sa.Column("energies", sa.JSON, nullable=True),
        sa.Column("interactions", sa.JSON, nullable=True),
        sa.Column("created_at", sa.DateTime(timezone=True), server_default=sa.func.now()),
    )


def downgrade():
    op.drop_table("docking_results")
    op.drop_table("docking_runs")

Testing Strategy¶

Component	Test Type	Setup
`ligand_prep.py`	Unit	Known SMILES, verify PDBQT output format
`receptor_prep.py`	Unit	Small test PDB, requires Open Babel binary
`vina_runner.py`	Integration	Requires vina package, small receptor + ligand
`interaction_analysis.py`	Integration	Known complex PDB, verify interaction counts
`DockingService`	Integration	Mocked Vina (fast), real DB
API endpoints	E2E	Full stack, mocked docking (fast)

Test Fixtures¶

# Small test protein (just a few residues) for fast tests
TEST_RECEPTOR_PDB = """HEADER    TEST
ATOM      1  N   ALA A   1       1.000   2.000   3.000  1.00  0.00           N
ATOM      2  CA  ALA A   1       2.000   3.000   4.000  1.00  0.00           C
...
END
"""

# Known docking result for regression testing
TEST_LIGAND_SMILES = "c1ccc(cc1)C(=O)O"  # Benzoic acid
EXPECTED_SCORE_RANGE = (-8.0, -3.0)  # kcal/mol

Mocking Vina for Fast Tests¶

class MockVinaRunner:
    """Mock Vina for unit tests that don't need real docking."""

    def dock_ligand(self, receptor_pdbqt, ligand_pdbqt, center, box_size, **kwargs):
        return {
            "scores": [-7.5, -6.2, -5.8],
            "poses_pdbqt": "MODEL 1\nATOM...\nENDMDL\n",
            "best_score": -7.5,
            "energies": [{"total": -7.5, "inter": -8.0, "intra": 0.5, "torsion": 0.0}],
            "num_poses_found": 3,
        }

Performance Considerations¶

Operation	Time per Compound	Notes
Ligand preparation (meeko)	50-200 ms	3D embedding is the bottleneck
Receptor preparation	5-30 s (one-time)	Cache the PDBQT
Vina docking (exhaust=8)	5-30 s	Fast mode for screening
Vina docking (exhaust=32)	30 s - 5 min	Default, good balance
Vina docking (exhaust=128)	5-30 min	High accuracy, final validation
PLIP analysis	1-5 s	Per complex

Optimization strategies: - Cache prepared receptor PDBQT in DockingRun.receptor_pdbqt - Use lower exhaustiveness (8-16) for initial screening, higher (32-128) for final candidates - Parallelize ligand preparation (CPU-bound but fast) - Batch docking can run multiple Vina instances concurrently (IO-bound per instance) - Defer PLIP analysis to on-demand (only when user views a result)