Skip to content

Platform Overview — ChemLib Drug Discovery Platform

Vision

ChemLib began as a fragment-based drug design tool for curating chemical libraries, assembling molecules from fragments, and evaluating drug-likeness. The next evolution transforms ChemLib into a full Computer-Aided Drug Discovery (CADD) platform that covers the entire early-stage drug discovery workflow:

Target Identification → Virtual Screening → Lead Optimization

The platform retains ChemLib's existing strengths — compound management, BRICS assembly, scoring — and extends them with protein target management, structural biology tools, molecular docking, configurable screening pipelines, and an extensible plugin architecture. All modules share a single database, a unified API layer, and a cohesive web UI.

Click diagram to zoom and pan:

Platform Architecture Diagram


Platform Architecture (High-Level)

┌─────────────────────────────────────────────────────────────────────────────┐
│                              WEB UI (Browser)                                │
│  HTML/JS + 3Dmol.js + BioJS MSA Viewer + DAG Pipeline Editor                │
└──────────────────────────────────┬──────────────────────────────────────────┘
                                   │ HTTP (REST JSON)
┌─────────────────────────────────────────────────────────────────────────────┐
│                         API LAYER (FastAPI)                                   │
│  /api/compounds  /api/fragments  /api/assembly  /api/scoring                 │
│  /api/targets    /api/structures /api/alignments                             │
│  /api/docking    /api/pipelines  /api/plugins                                │
└──────────────────────────────────┬──────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                         SERVICE LAYER                                        │
│                                                                              │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐       │
│  │  Chemical     │ │  Protein     │ │  Structural  │ │  Virtual     │       │
│  │  Library      │ │  Target      │ │  Biology     │ │  Screening   │       │
│  │  Services     │ │  Services    │ │  Services    │ │  Engine      │       │
│  └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘       │
│  ┌──────────────┐ ┌──────────────┐                                          │
│  │  Docking &   │ │  Plugin      │                                          │
│  │  Scoring     │ │  Marketplace │                                          │
│  │  Services    │ │  Services    │                                          │
│  └──────────────┘ └──────────────┘                                          │
└───────────┬──────────────────┬──────────────────────────────────────────────┘
            │                  │
            ▼                  ▼
┌─────────────────────┐  ┌─────────────────────────────────────────────────┐
│  Chemistry Layer     │  │  DB Service Layer (CRUD)                        │
│  RDKit, Biopython,   │  │  SQLAlchemy 2.0 async                          │
│  Vina, Fpocket,      │  │                                                │
│  PLIP, meeko         │  │  CompoundDBService, FragmentDBService,          │
│                      │  │  ProteinTargetDBService, StructureDBService,    │
│                      │  │  PipelineDBService, DockingDBService,           │
│                      │  │  PluginDBService                                │
└─────────────────────┘  └──────────────────────┬──────────────────────────┘
                          ┌─────────────────────────────────────────────────┐
                          │          ORM MODELS (SQLAlchemy 2.0)            │
                          │  Compound, Fragment, AssembledMolecule,         │
                          │  ProteinTarget, ProteinStructure, BindingSite,  │
                          │  SequenceAlignment, StructuralAlignment,        │
                          │  Pipeline, PipelineRun, PipelineRunResult,      │
                          │  DockingRun, DockingResult,                     │
                          │  FilterPluginRegistry                           │
                          └──────────────────────┬─────────────────────────┘
                          ┌─────────────────────────────────────────────────┐
                          │  DATABASE — PostgreSQL (prod) / SQLite (dev)    │
                          └─────────────────────────────────────────────────┘

The Six Modules

1. Chemical Library (Existing — Phases 1-6)

The foundation. Manages compounds, fragments, BRICS decomposition, fragment assembly, conformer generation, energy minimization, drug-likeness scoring, and 3D visualization. Fully operational.

Key components: CompoundService, FragmentService, AssemblyService, ConformerService, ScoringService, VizService.

2. Protein Target Library (Phase 7)

Store, browse, and manage protein targets. Import from UniProt (sequence + metadata), RCSB PDB (structures), and AlphaFold DB (predicted structures). Each target can have multiple associated structures. Structures store the full PDB/mmCIF content for offline use.

Key components: ProteinTargetService, ProteinStructureService.

3. Structural Biology Tools (Phase 8)

Sequence alignment (pairwise via Biopython, multiple via MAFFT/Clustal Omega), structural alignment (TM-align via tmtools, Superimposer via Biopython), and homology search (BLAST, Foldseek). Visualize alignments interactively in the browser.

Key components: AlignmentService, HomologyService.

4. Virtual Screening Engine (Phase 11)

A configurable, DAG-based filtering pipeline that progressively narrows a compound library against a protein target. Users visually build pipelines by dragging filter nodes and connecting them. The executor processes compounds in batches, tracking pass/fail at each stage.

Key components: PipelineService, PipelineExecutor, FilterPlugin protocol.

5. Docking & Scoring (Phases 9-10)

Binding site detection (Fpocket), protein preparation (PDBFixer), ligand preparation (meeko), molecular docking (AutoDock Vina), interaction analysis (PLIP), and results visualization. Integrates into the screening pipeline as a docking filter node.

Key components: BindingSiteService, DockingService, InteractionService, ProteinPrepService.

6. Plugin Marketplace (Phase 12)

An extensible architecture for third-party tools. Plugins implement Protocol classes (FilterPlugin, DockingPlugin, ADMEPlugin, etc.), register in the database, and become available in the pipeline builder. Discovery via Python entry points.

Key components: PluginRegistryService, Protocol classes.


Data Flow — End-to-End Drug Discovery Workflow

Click diagram to zoom and pan:

Data Flow Workflow Diagram

This shows how a user would use the platform to go from a protein target to docked lead compounds:

Step 1: SELECT TARGET
  User imports a protein target from UniProt (e.g., EGFR, P00533)
  → ProteinTargetService.import_from_uniprot("P00533")
  → Stores: ProteinTarget record with sequence, metadata
Step 2: FETCH STRUCTURE
  User fetches a crystal structure from RCSB PDB (e.g., 1M17)
  → ProteinStructureService.fetch_from_rcsb("1M17")
  → Stores: ProteinStructure record with full PDB data
Step 3: DETECT BINDING SITE
  User runs Fpocket on the structure to find druggable pockets
  → BindingSiteService.detect_pockets(structure_id)
  → Stores: BindingSite records with center, box_size, druggability_score
  User selects the most druggable pocket or defines one manually
Step 4: PREPARE COMPOUND LIBRARY
  User has already imported/assembled compounds (Chemical Library module)
  OR: User assembles new candidates from the fragment library
  → CompoundService / AssemblyService
  → Available: hundreds to thousands of candidate compounds with SMILES
Step 5: BUILD SCREENING PIPELINE
  User opens the Pipeline Builder and creates a DAG:
    [Library Input] → [Lipinski Filter] → [PAINS Filter] → [QED > 0.4]
        → [Tanimoto > 0.3 vs known EGFR inhibitor] → [Docking vs pocket]
        → [Interaction Filter (>= 2 H-bonds)] → [Results]
  → PipelineService.create_pipeline(definition)
Step 6: RUN PIPELINE
  User clicks "Run Pipeline" — selects the compound library as input
  → PipelineExecutor runs compounds through the DAG in batches
  → Each node filters: Lipinski (1000→800) → PAINS (800→750) → QED (750→400)
        → Similarity (400→150) → Docking (150→150, scored) → Interactions (150→50)
  → PipelineRun tracks progress, PipelineRunResult stores per-compound results
Step 7: ANALYZE RESULTS
  User views the funnel: 1000 → 800 → 750 → 400 → 150 → 50
  User sorts the 50 surviving compounds by docking score
  User clicks a compound to see:
    - 3D pose overlaid on protein (3Dmol.js)
    - Interaction diagram (H-bonds, hydrophobic contacts)
    - Drug-likeness dashboard
    - Assembly history (if assembled from fragments)
Step 8: REFINE
  User goes back to the fragment library, modifies promising leads
  → Swap a fragment, re-assemble, re-dock
  → Iterate until satisfied

Technology Stack

Existing (Chemical Library)

Component Technology
Language Python 3.11+
Database PostgreSQL (prod) / SQLite (dev)
ORM SQLAlchemy 2.0+ (async)
Migrations Alembic
API FastAPI + Pydantic v2
Chemistry RDKit
3D Visualization 3Dmol.js / py3Dmol
Frontend HTML/JS + Jinja2 templates
Testing pytest + httpx + pytest-asyncio

New Additions

Component Technology Purpose
Protein parsing Biopython (Bio.PDB, Bio.SeqIO, Bio.Align) Parse PDB/mmCIF, sequence I/O, pairwise alignment
Structural alignment tmtools TM-align algorithm, TM-score computation
MSA (subprocess) MAFFT (system binary) Multiple sequence alignment, fast and accurate
MSA (subprocess, alt) Clustal Omega (system binary) Alternative MSA tool
Docking engine vina (Python package) AutoDock Vina molecular docking
Ligand preparation meeko SMILES/SDF → PDBQT for Vina
Receptor preparation PDBFixer Fix missing atoms/residues in PDB structures
Format conversion openbabel-wheel PDB ↔ PDBQT and other format conversions
Pocket detection Fpocket (system binary) Detect druggable binding pockets from protein structure
Interaction analysis PLIP Protein-Ligand Interaction Profiler (H-bonds, hydrophobic, pi-stacking)
3D protein viewer 3Dmol.js (already present) Cartoon, surface, ball-and-stick rendering
Sequence alignment viz BioJS MSA Viewer (JS) Interactive, scrollable MSA visualization in browser
Static alignment images pyMSAviz Server-side alignment image generation (PNG/SVG)
ML ADME/Tox (optional) DeepChem ML-based ADME and toxicity predictions
Plugin system Python Protocol classes + entry points Extensible plugin architecture
External API client httpx (already present) Fetch from UniProt, RCSB, AlphaFold REST APIs

System Binary Dependencies

These are not pip-installable and must be present on the system PATH:

Binary Install Method Required For
fpocket brew install fpocket / compile from source Binding site detection
mafft brew install mafft / apt install mafft Multiple sequence alignment
clustalo (optional) brew install clustal-omega Alternative MSA
foldseek (optional) Download binary from Foldseek Structural homology search

Python Package Additions (requirements.txt)

biopython>=1.83
tmtools>=0.1.0
vina>=1.2.5
meeko>=0.5.0
openbabel-wheel>=3.1.1
pdbfixer>=1.9
plip>=2.3.0
pymsa>=0.7.0
pymsaviz>=0.4.0
deepchem>=2.7.0       # optional, for ML-based predictions

Shared Infrastructure

Database

All modules share a single PostgreSQL/SQLite database. New tables are added via Alembic migrations. The existing Base declarative base is extended with new models. Foreign key relationships link modules (e.g., DockingResult.compound_id → Compound.id).

API Layer

All new endpoints follow the same FastAPI patterns established in Phases 1-6. New routers are registered in chemlib/main.py. Each module gets its own router file under chemlib/api/.

Service Layer

New services follow the same stateless pattern. They call DB service methods for persistence and chemistry/bioinformatics utilities for computation. Services never access the database directly.

Configuration

New settings are added to chemlib/config.py:

class Settings(BaseSettings):
    # ... existing settings ...

    # Protein module
    UNIPROT_API_BASE: str = "https://rest.uniprot.org"
    RCSB_API_BASE: str = "https://data.rcsb.org"
    ALPHAFOLD_API_BASE: str = "https://alphafold.ebi.ac.uk/api"

    # Docking
    VINA_EXHAUSTIVENESS: int = 32
    VINA_NUM_POSES: int = 10
    VINA_ENERGY_RANGE: float = 3.0

    # Pipeline
    PIPELINE_BATCH_SIZE: int = 100
    PIPELINE_MAX_COMPOUNDS: int = 100_000

    # Fpocket
    FPOCKET_BINARY: str = "fpocket"
    MAFFT_BINARY: str = "mafft"

Error Handling

New exception classes extending the existing hierarchy:

class ProteinNotFoundError(ChemLibError): ...     # → 404
class StructureNotFoundError(ChemLibError): ...    # → 404
class ExternalAPIError(ChemLibError): ...          # → 502
class DockingError(ChemLibError): ...              # → 500
class PipelineError(ChemLibError): ...             # → 400
class PluginNotFoundError(ChemLibError): ...       # → 404
class PluginConfigError(ChemLibError): ...         # → 422
class BindingSiteError(ChemLibError): ...          # → 400

Project Structure (Extended)

chemlib/
├── chemlib/
│   ├── models/
│   │   ├── compound.py             # (existing)
│   │   ├── structure.py            # (existing — Conformer)
│   │   ├── assembly.py             # (existing)
│   │   ├── reaction.py             # (existing)
│   │   ├── protein.py              # NEW: ProteinTarget, ProteinStructure, BindingSite
│   │   ├── alignment.py            # NEW: SequenceAlignment, StructuralAlignment
│   │   ├── pipeline.py             # NEW: Pipeline, PipelineRun, PipelineRunResult
│   │   ├── docking.py              # NEW: DockingRun, DockingResult
│   │   └── plugin.py               # NEW: FilterPluginRegistry
│   ├── db/
│   │   └── service.py              # Extended with new DB services
│   ├── api/
│   │   ├── compounds.py            # (existing)
│   │   ├── fragments.py            # (existing)
│   │   ├── assembly.py             # (existing)
│   │   ├── visualization.py        # (existing)
│   │   ├── scoring.py              # (existing)
│   │   ├── targets.py              # NEW: /api/targets/
│   │   ├── structures.py           # NEW: /api/structures/
│   │   ├── alignments.py           # NEW: /api/alignments/
│   │   ├── docking.py              # NEW: /api/docking/
│   │   ├── pipelines.py            # NEW: /api/pipelines/
│   │   └── plugins.py              # NEW: /api/plugins/
│   ├── services/
│   │   ├── compound_service.py     # (existing)
│   │   ├── fragment_service.py     # (existing)
│   │   ├── assembly_service.py     # (existing)
│   │   ├── conformer_service.py    # (existing)
│   │   ├── scoring_service.py      # (existing)
│   │   ├── viz_service.py          # (existing)
│   │   ├── protein_target_service.py    # NEW
│   │   ├── protein_structure_service.py # NEW
│   │   ├── binding_site_service.py      # NEW
│   │   ├── protein_prep_service.py      # NEW
│   │   ├── alignment_service.py         # NEW
│   │   ├── docking_service.py           # NEW
│   │   ├── interaction_service.py       # NEW
│   │   ├── pipeline_service.py          # NEW
│   │   ├── pipeline_executor.py         # NEW
│   │   └── plugin_registry_service.py   # NEW
│   ├── schemas/
│   │   ├── compound.py             # (existing)
│   │   ├── fragment.py             # (existing)
│   │   ├── assembly.py             # (existing)
│   │   ├── scoring.py              # (existing)
│   │   ├── protein.py              # NEW
│   │   ├── alignment.py            # NEW
│   │   ├── docking.py              # NEW
│   │   ├── pipeline.py             # NEW
│   │   └── plugin.py               # NEW
│   ├── chemistry/                  # (existing, unchanged)
│   ├── bioinformatics/             # NEW: Pure bioinformatics utilities
│   │   ├── __init__.py
│   │   ├── pdb_parser.py           # PDB/mmCIF parsing via Biopython
│   │   ├── sequence_tools.py       # Sequence alignment, format conversion
│   │   ├── structural_tools.py     # TM-align, superimposition
│   │   ├── pocket_detection.py     # Fpocket subprocess wrapper
│   │   ├── protein_prep.py         # PDBFixer wrapper
│   │   └── external_apis.py        # UniProt, RCSB, AlphaFold API clients
│   ├── docking/                    # NEW: Docking utilities (no DB)
│   │   ├── __init__.py
│   │   ├── ligand_prep.py          # SMILES → PDBQT via meeko
│   │   ├── receptor_prep.py        # PDB → PDBQT
│   │   ├── vina_runner.py          # AutoDock Vina Python API wrapper
│   │   └── interaction_analysis.py # PLIP wrapper
│   ├── plugins/                    # NEW: Plugin system
│   │   ├── __init__.py
│   │   ├── protocols.py            # Protocol classes (FilterPlugin, etc.)
│   │   ├── registry.py             # Plugin discovery + registration
│   │   └── builtin/               # Built-in plugin implementations
│   │       ├── __init__.py
│   │       ├── property_filters.py
│   │       ├── similarity_filters.py
│   │       ├── adme_filters.py
│   │       ├── docking_filter.py
│   │       └── external_filters.py
│   ├── templates/
│   │   ├── base.html               # (existing)
│   │   ├── index.html              # Updated: platform dashboard
│   │   ├── protein_browser.html    # NEW
│   │   ├── protein_detail.html     # NEW
│   │   ├── alignment_viewer.html   # NEW
│   │   ├── docking_viewer.html     # NEW
│   │   ├── pipeline_builder.html   # NEW
│   │   ├── pipeline_results.html   # NEW
│   │   └── plugin_marketplace.html # NEW
│   └── static/
│       ├── js/
│       │   ├── viewer.js           # (existing, extended)
│       │   ├── protein_viewer.js   # NEW: protein-specific 3Dmol.js code
│       │   ├── pipeline_editor.js  # NEW: DAG editor
│       │   └── msa_viewer.js       # NEW: BioJS MSA integration
│       └── css/
│           └── platform.css        # NEW: platform-wide styles
├── tests/
│   ├── test_protein/               # NEW
│   ├── test_alignment/             # NEW
│   ├── test_docking/               # NEW
│   ├── test_pipeline/              # NEW
│   └── test_plugins/               # NEW
└── scripts/
    ├── seed_fragments.py           # (existing)
    └── seed_plugins.py             # NEW: Register built-in plugins

Cross-Module Integration Points

From Module To Module Integration
Chemical Library Screening Pipeline Compounds/assembled molecules are pipeline input
Protein Target Library Docking Protein structures + binding sites are docking targets
Docking Screening Pipeline Docking is a filter node in the pipeline
Structural Biology Protein Target Library Alignments reference protein targets
Plugin Marketplace Screening Pipeline Plugins become pipeline filter nodes
All Modules Visualization 3Dmol.js for 3D, BioJS for sequences, charts for dashboards

Naming Conventions

Entity Pattern Example
ORM Models PascalCase, singular ProteinTarget, DockingResult
DB Services {Model}DBService ProteinTargetDBService
Business Services {Domain}Service DockingService, AlignmentService
API Routers chemlib/api/{plural_noun}.py targets.py, pipelines.py
Pydantic Schemas {Model}Create, {Model}Response, {Model}Filter ProteinTargetCreate, DockingResultResponse
Bioinformatics utils chemlib/bioinformatics/{function}.py pdb_parser.py, sequence_tools.py
Docking utils chemlib/docking/{function}.py vina_runner.py, ligand_prep.py
Plugin implementations chemlib/plugins/builtin/{category}.py property_filters.py