
Platform Implementation Plan — ChemLib Drug Discovery Platform

Overview

Platform Architecture Diagram

This document defines the phased implementation plan for extending ChemLib from a chemical library tool into a full Computer-Aided Drug Discovery (CADD) platform. The existing system (Phases 1-6) provides compound management, fragment decomposition, BRICS assembly, drug-likeness scoring, and 3D visualization. Phases 7-12 add protein target management, structural biology tools, binding site detection and protein preparation, molecular docking, screening pipelines, and a plugin marketplace.


Prerequisites

Before starting Phase 7, the following must be complete and working:

  • Phase 1: Database + ORM models + Alembic migrations
  • Phase 2: Chemistry engine (RDKit utilities)
  • Phase 3: API layer (FastAPI routes, Pydantic schemas)
  • Phase 4: Assembly engine (BRICS fragment joining)
  • Phase 5: Scoring and filtering (Lipinski, PAINS, QED, SA Score)
  • Phase 6: 3D visualization (3Dmol.js, conformer generation)

All existing tests pass. The application runs with uvicorn chemlib.main:app --reload.


Data Flow Workflow Diagram

Dependency Graph

Phase 1-6 (existing ChemLib) ─────────────────────────────────────────────┐
       │                                                                    │
       ├──────────────┬─────────────────┐                                  │
       ▼              ▼                 │                                  │
   Phase 7        Phase 8              │                                  │
   Protein        Structural           │                                  │
   Targets        Biology              │                                  │
       │              │                 │                                  │
       ▼              │                 │                                  │
   Phase 9 ◀──────────┘                │                                  │
   Binding Site                        │                                  │
   Detection                           │                                  │
       │                                │                                  │
       ▼                                │                                  │
   Phase 10                             │                                  │
   Docking                              │                                  │
   Engine                               │                                  │
       │                                │                                  │
       ▼                                ▼                                  │
   Phase 11 ◀───────────────────────────┘                                  │
   Screening                                                               │
   Pipeline                                                                │
       │                                                                    │
       ▼                                                                    │
   Phase 12                                                                │
   Plugin                                                                  │
   Marketplace                                                             │

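Parallelizable: Phases 7 and 8 can be built simultaneously. The one ordering constraint is the database: Phase 8's StructuralAlignment table has a foreign key to ProteinStructure, so the Phase 8 migration must be generated and applied after Phase 7's.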


Phase 7: Protein Target Library

Goal: Enable users to import, browse, and manage protein targets and their 3D structures from UniProt, RCSB PDB, and AlphaFold DB.

Design Document: docs/PROTEIN_TARGET_LIBRARY.md

New Dependencies

biopython>=1.83
tmtools>=0.1.0
httpx (already present)

Tasks

7.1 Database Models

File Contents
chemlib/models/protein.py ProteinTarget, ProteinStructure, BindingSite ORM models
chemlib/models/__init__.py Register new models with Base
  • Create all three models with full column definitions per PROTEIN_TARGET_LIBRARY.md
  • Add relationships: ProteinTarget → ProteinStructure (1:N), ProteinStructure → BindingSite (1:N)
  • Alembic migration: alembic revision --autogenerate -m "add protein target tables"
  • Test: model creation, relationships, cascade delete

7.2 External API Clients

File Contents
chemlib/bioinformatics/__init__.py Package init
chemlib/bioinformatics/external_apis.py UniProtClient, RCSBClient, AlphaFoldClient
  • Implement HTTP clients using httpx.AsyncClient
  • UniProt: fetch entry, search, parse entry
  • RCSB: fetch entry info, download PDB, download mmCIF, get UniProt mapping
  • AlphaFold: fetch prediction, download PDB
  • Test with mocked responses (record real responses as fixtures)
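The parsing half of UniProtClient can be sketched with the standard library alone. This is an illustration rather than the ChemLib implementation: the UniProtEntry dataclass and both function names are hypothetical, and the JSON keys read below (primaryAccession, proteinDescription.recommendedName.fullName.value, genes[].geneName.value, organism.scientificName, sequence.value) reflect the UniProtKB REST JSON format as we understand it, so they should be checked against the recorded fixtures.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Mirrors the default Settings.UNIPROT_API_BASE value in chemlib/config.py
UNIPROT_API_BASE = "https://rest.uniprot.org"


def uniprot_entry_url(accession: str, base: str = UNIPROT_API_BASE) -> str:
    """URL of the UniProtKB REST endpoint returning one entry as JSON."""
    return f"{base}/uniprotkb/{accession}.json"


@dataclass
class UniProtEntry:  # hypothetical container, not a ChemLib model
    accession: str
    name: Optional[str]
    gene: Optional[str]
    organism: Optional[str]
    sequence: str


def parse_uniprot_entry(data: dict[str, Any]) -> UniProtEntry:
    """Extract the fields a ProteinTarget needs from a UniProtKB JSON entry.

    Optional sections are read defensively: many entries have no
    recommended name or no gene name.
    """
    full_name = (
        data.get("proteinDescription", {})
        .get("recommendedName", {})
        .get("fullName", {})
        .get("value")
    )
    genes = data.get("genes") or []
    gene = genes[0].get("geneName", {}).get("value") if genes else None
    return UniProtEntry(
        accession=data["primaryAccession"],
        name=full_name,
        gene=gene,
        organism=data.get("organism", {}).get("scientificName"),
        sequence=data.get("sequence", {}).get("value", ""),
    )
```

Inside UniProtClient, fetching an entry would then be an httpx.AsyncClient(timeout=settings.EXTERNAL_API_TIMEOUT) GET of uniprot_entry_url(accession), followed by response.raise_for_status() and parse_uniprot_entry(response.json()). Keeping the parser separate from the HTTP call lets the fixture-based tests exercise it without any mocking.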

7.3 PDB Parser Utilities

File Contents
chemlib/bioinformatics/pdb_parser.py PDB/mmCIF parsing functions
  • parse_pdb_string(), parse_mmcif_string() using Biopython
  • extract_chains(), extract_ligands(), extract_sequence_from_chain()
  • get_resolution(), get_method()
  • Test with sample PDB files

7.4 Pydantic Schemas

File Contents
chemlib/schemas/protein.py All Pydantic models for protein targets, structures, binding sites
  • ProteinTargetCreate, ProteinTargetResponse, ProteinTargetFilter, ProteinTargetListResponse
  • ProteinStructureCreate, ProteinStructureResponse, ProteinStructureDetailResponse
  • BindingSiteCreate, BindingSiteFromLigand, BindingSiteResponse

7.5 DB Service Layer

File Contents
chemlib/db/service.py Add ProteinTargetDBService, ProteinStructureDBService, BindingSiteDBService
  • CRUD operations for all three models
  • Specialized queries: list structures for target, list binding sites for structure

7.6 Business Services

File Contents
chemlib/services/protein_target_service.py ProteinTargetService
chemlib/services/protein_structure_service.py ProteinStructureService
  • import_from_uniprot(accession): call UniProtClient, parse, store
  • import_from_pdb(pdb_id): call RCSBClient, resolve UniProt, store
  • search_uniprot(query): proxy search to UniProt API
  • fetch_from_rcsb(pdb_id, target_id): download PDB, parse metadata, store
  • fetch_from_alphafold(uniprot_id, target_id): download AlphaFold PDB, store
  • upload_structure(target_id, file_data): validate, parse, store

7.7 API Routes

File Contents
chemlib/api/targets.py /api/targets/ CRUD + fetch endpoints
chemlib/api/structures.py /api/structures/ CRUD + chain/sequence endpoints
  • Register routes in chemlib/main.py
  • All endpoints per PROTEIN_TARGET_LIBRARY.md API section

7.8 UI

File Contents
chemlib/templates/protein_browser.html Protein target list page
chemlib/templates/protein_detail.html Protein detail with 3D viewer
chemlib/static/js/protein_viewer.js 3Dmol.js protein viewer component
  • Protein browser: table with search, filter, pagination
  • Protein detail: metadata display, structure list, 3D viewer (cartoon mode)
  • Viewer: load PDB data, cartoon/surface/stick styles, chain selection

7.9 Tests

Directory Contents
tests/test_protein/test_models.py ORM model tests
tests/test_protein/test_services.py Service tests with mocked APIs
tests/test_protein/test_api.py API endpoint tests
tests/test_bioinformatics/test_pdb_parser.py PDB parsing tests
tests/test_bioinformatics/test_external_apis.py API client tests (mocked)

Deliverable

  • Protein target browser with import from UniProt
  • 3D structure viewer (cartoon mode) for PDB structures fetched from RCSB/AlphaFold
  • Full CRUD API for targets and structures

Phase 8: Structural Biology Tools

Goal: Add sequence alignment (pairwise + MSA), structural alignment (TM-align), and visualization of alignment results.

Design Document: docs/PROTEIN_TARGET_LIBRARY.md (alignment sections)

Can be built in parallel with Phase 7.

New Dependencies

# Python packages
tmtools>=0.1.0     (may already be added in Phase 7)
pymsaviz>=0.4.0

# System binaries (must be on PATH)
mafft              # brew install mafft / apt install mafft
clustalo           # optional; only needed by multiple_align_clustalo()

Tasks

8.1 Database Models

File Contents
chemlib/models/alignment.py SequenceAlignment, StructuralAlignment ORM models
  • Alembic migration: add alignment tables
  • Note: StructuralAlignment has FK to ProteinStructure — requires Phase 7 tables to exist

8.2 Sequence Alignment Utilities

File Contents
chemlib/bioinformatics/sequence_tools.py pairwise_align_biopython(), multiple_align_mafft(), multiple_align_clustalo()
  • Biopython PairwiseAligner with BLOSUM62
  • MAFFT subprocess wrapper (write FASTA temp file, parse output)
  • Identity percentage computation
  • Test with known sequence pairs
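A stdlib-only sketch of the MAFFT wrapper and the identity computation follows. The function names match the table above, but the signatures are assumptions. Percent identity has several competing definitions; this one divides identical residues by the alignment columns where neither sequence has a gap, and the design document should confirm which definition ChemLib reports. The PATH check is one way to satisfy the checklist item on missing binaries; RuntimeError stands in for a ChemLib custom exception.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record_id: sequence}; gap characters are kept."""
    records: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]  # ID is the first word of the header
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical residues / gap-free columns x 100 (case-insensitive)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a.upper() == b.upper())
    return 100.0 * matches / len(pairs)


def multiple_align_mafft(sequences: dict[str, str], binary: str = "mafft") -> dict[str, str]:
    """Write a temporary FASTA file, run MAFFT on it, parse the alignment."""
    if shutil.which(binary) is None:
        raise RuntimeError(f"MAFFT binary '{binary}' not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        fasta = Path(tmp) / "input.fasta"
        fasta.write_text("".join(f">{k}\n{v}\n" for k, v in sequences.items()))
        # --auto lets MAFFT choose a strategy; the alignment goes to stdout
        proc = subprocess.run(
            [binary, "--auto", "--quiet", str(fasta)],
            capture_output=True, text=True, check=True,
        )
    return parse_fasta(proc.stdout)
```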

8.3 Structural Alignment Utilities

File Contents
chemlib/bioinformatics/structural_tools.py tm_align(), superimpose_biopython(), apply_transformation()
  • tmtools for TM-align: extract CA coords, run alignment, get rotation/translation
  • Biopython Superimposer as fallback
  • Apply transformation to generate superposed PDB
  • Test with known structure pairs
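The superposition step can be illustrated with NumPy. This sketch is a Kabsch least-squares fit on already-paired CA coordinates, the same SVD-based calculation Biopython's Superimposer performs; it does not reproduce TM-align's residue pairing, which tmtools handles. The function names and the row-vector convention x' = x·Rᵀ + t are choices made here; if the rotation and translation reported by tmtools are passed to apply_transformation, that convention should be verified against the tmtools documentation first.

```python
import numpy as np


def apply_transformation(coords: np.ndarray, rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Rotate then translate every row of an (N, 3) coordinate array."""
    return coords @ rotation.T + translation


def superimpose(mobile: np.ndarray, fixed: np.ndarray) -> tuple[np.ndarray, np.ndarray, float]:
    """Kabsch fit of paired (N, 3) coordinates.

    Returns (rotation, translation, rmsd) such that
    apply_transformation(mobile, rotation, translation) best matches fixed.
    """
    if mobile.shape != fixed.shape:
        raise ValueError("coordinate arrays must be paired and equal in shape")
    mob_c = mobile.mean(axis=0)
    fix_c = fixed.mean(axis=0)
    h = (mobile - mob_c).T @ (fixed - fix_c)  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # -1 would mean a reflection
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = fix_c - mob_c @ rotation.T
    moved = apply_transformation(mobile, rotation, translation)
    rmsd = float(np.sqrt(((moved - fixed) ** 2).sum(axis=1).mean()))
    return rotation, translation, rmsd
```

Writing the superposed PDB then amounts to applying the returned transform to every atom of the mobile structure, not just the CA atoms used for the fit.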

8.4 Pydantic Schemas

File Contents
chemlib/schemas/alignment.py SequenceAlignmentRequest, SequenceInput, SequenceAlignmentResponse, StructuralAlignmentRequest, StructuralAlignmentResponse

8.5 Business Services

File Contents
chemlib/services/alignment_service.py AlignmentService
  • pairwise_sequence_align(): validate inputs, run alignment, store result
  • multiple_sequence_align(): validate inputs, run MAFFT, store result
  • structural_align(): load structures, run tmtools/Superimposer, store result
  • generate_alignment_image(): use pyMSAviz to render PNG

8.6 API Routes

File Contents
chemlib/api/alignments.py /api/alignments/ endpoints
  • POST /api/alignments/sequence — run sequence alignment
  • GET /api/alignments/sequence/{id} — get result
  • GET /api/alignments/sequence/{id}/image — get PNG image
  • POST /api/alignments/structure — run structural alignment
  • GET /api/alignments/structure/{id} — get result
  • Register in chemlib/main.py

8.7 UI

File Contents
chemlib/templates/alignment_viewer.html Alignment results page
chemlib/static/js/msa_viewer.js BioJS MSA Viewer integration
  • Sequence alignment viewer: BioJS MSA Viewer with color schemes (Clustal, Zappo)
  • Structural alignment viewer: 3Dmol.js overlay of two structures with different colors
  • Display metrics: identity %, score, TM-score, RMSD
  • Download buttons: FASTA, PNG image

8.8 Tests

Directory Contents
tests/test_alignment/test_sequence.py Sequence alignment tests
tests/test_alignment/test_structural.py Structural alignment tests (requires tmtools)
tests/test_alignment/test_api.py API endpoint tests

Deliverable

  • Pairwise sequence alignment with Biopython
  • Multiple sequence alignment with MAFFT
  • Structural alignment with TM-align (tmtools)
  • Interactive alignment viewers in the browser

Phase 9: Binding Site Detection & Protein Preparation

Goal: Detect druggable binding pockets using Fpocket, define binding sites from co-crystallized ligands or manually, and prepare proteins for docking.

Depends on: Phase 7 (ProteinStructure, BindingSite models must exist)

New Dependencies

# Python packages
pdbfixer>=1.9

# System binaries
fpocket             # brew install fpocket / compile from source

Tasks

9.1 Fpocket Integration

File Contents
chemlib/bioinformatics/pocket_detection.py run_fpocket(), parse_fpocket_info(), parse_pocket_pdb()
  • Write PDB to temp file, run fpocket -f file.pdb as subprocess
  • Parse output from the <name>_out/ directory: <name>_info.txt (per-pocket scores) and pockets/pocketN_atm.pdb (coordinates)
  • Extract center, box size, residues, druggability score, volume per pocket
  • Clean up temp files
  • Test with a small known protein (mark as integration test, skip if fpocket not installed)
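parse_fpocket_info() can be a small regex pass over the info file. The sketch assumes the layout fpocket writes: a "Pocket N :" header followed by indented "Label : value" lines such as "Druggability Score". Those labels come from our reading of fpocket's output and should be checked against a real run. Labels are normalized to snake_case keys; druggable_pockets() is a hypothetical helper that applies DEFAULT_POCKET_MIN_DRUGGABILITY.

```python
import re

_POCKET_HEADER = re.compile(r"^Pocket\s+(\d+)\s*:")
_FIELD = re.compile(r"^\s*(.+?)\s*:\s*([-+0-9.eE]+)\s*$")


def parse_fpocket_info(text: str) -> dict[int, dict[str, float]]:
    """Parse fpocket's <name>_info.txt into {pocket_number: {field: value}}."""
    pockets: dict[int, dict[str, float]] = {}
    current = None
    for line in text.splitlines():
        header = _POCKET_HEADER.match(line)
        if header:
            current = int(header.group(1))
            pockets[current] = {}
            continue
        field = _FIELD.match(line)
        if field and current is not None:
            # "Druggability Score" -> "druggability_score"
            key = field.group(1).strip().lower().replace(" ", "_")
            pockets[current][key] = float(field.group(2))
    return pockets


def druggable_pockets(pockets: dict[int, dict[str, float]], min_score: float = 0.2) -> list[int]:
    """Pocket numbers whose druggability score meets the threshold."""
    return sorted(
        n for n, fields in pockets.items()
        if fields.get("druggability_score", 0.0) >= min_score
    )
```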

9.2 Protein Preparation

File Contents
chemlib/bioinformatics/protein_prep.py fix_protein(), remove_water(), add_hydrogens()
  • PDBFixer integration for fixing missing atoms/residues
  • Remove heterogens and water
  • Add hydrogens at specified pH
  • Test with broken PDB files

9.3 Binding Site Service

File Contents
chemlib/services/binding_site_service.py BindingSiteService
  • detect_pockets(structure_id): run Fpocket, parse, store BindingSite records
  • define_from_ligand(structure_id, ligand_id, padding): find ligand atoms, compute center/box, find nearby residues
  • define_manual(structure_id, data): store user-defined binding site
  • CRUD operations for binding sites
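For define_from_ligand(), the box and residue calculations are plain geometry. A minimal sketch, assuming ligand and protein atoms have already been extracted from the structure as coordinate tuples. BoxSpec, box_from_ligand() and nearby_residues() are hypothetical names; the box center is the bounding-box midpoint (the ligand centroid is an equally valid choice), the padding default mirrors DEFAULT_BINDING_SITE_PADDING, and the 5 Å residue cutoff is an assumed default.

```python
import math
from dataclasses import dataclass

Coord = tuple[float, float, float]


@dataclass
class BoxSpec:  # hypothetical; maps onto BindingSite center/box columns
    center: Coord
    size: Coord


def box_from_ligand(ligand_atoms: list[Coord], padding: float = 5.0) -> BoxSpec:
    """Axis-aligned box around the ligand atoms, padded on every side."""
    if not ligand_atoms:
        raise ValueError("ligand has no atoms")
    lows = [min(a[i] for a in ligand_atoms) for i in range(3)]
    highs = [max(a[i] for a in ligand_atoms) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(lows, highs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(lows, highs))
    return BoxSpec(center=center, size=size)


def nearby_residues(
    ligand_atoms: list[Coord],
    protein_atoms: list[tuple[str, Coord]],
    cutoff: float = 5.0,
) -> list[str]:
    """Residue labels with any atom within `cutoff` Å of any ligand atom.

    protein_atoms holds (residue_label, xyz) pairs; brute force, O(N*M).
    """
    hits = set()
    for label, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms):
            hits.add(label)
    return sorted(hits)
```

The brute-force loop is adequate at ligand scale; for larger queries, Biopython's Bio.PDB.NeighborSearch is the usual replacement.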

9.4 Protein Preparation Service

File Contents
chemlib/services/protein_prep_service.py ProteinPrepService
  • prepare_for_docking(structure_id): fix protein, add hydrogens, return fixed PDB

9.5 API Extensions

Add to existing chemlib/api/structures.py:

  • POST /api/structures/{id}/detect-pockets — run Fpocket
  • POST /api/structures/{id}/binding-sites — define manual
  • POST /api/structures/{id}/binding-sites/from-ligand — define from ligand
  • GET /api/structures/{id}/binding-sites — list binding sites
  • POST /api/structures/{id}/prepare — prepare for docking

9.6 UI Extensions

Update chemlib/templates/protein_detail.html:

  • Add binding site list section
  • Add "Detect Pockets" button
  • 3Dmol.js: show binding site box overlay, residue highlighting
  • Binding site detail modal with center/box/residues

9.7 Tests

Directory Contents
tests/test_protein/test_binding_site.py Binding site service tests
tests/test_bioinformatics/test_pocket_detection.py Fpocket tests (integration, skip if binary missing)
tests/test_bioinformatics/test_protein_prep.py PDBFixer tests

Deliverable

  • Fpocket pocket detection with druggability scores
  • Binding site definition from ligand or manual
  • Protein preparation (fix, add H) for docking
  • Binding site visualization in 3D viewer

Phase 10: Docking Engine

Goal: Integrate AutoDock Vina for molecular docking, meeko for ligand preparation, and PLIP for interaction analysis, and visualize docked poses.

Design Document: docs/DOCKING_INTEGRATION.md

Depends on: Phase 9 (binding sites, protein preparation)

New Dependencies

# Python packages
vina>=1.2.5
meeko>=0.5.0
openbabel-wheel>=3.1.0
plip>=2.3.0

Tasks

10.1 Database Models

File Contents
chemlib/models/docking.py DockingRun, DockingResult ORM models
  • Alembic migration: add docking tables
  • FKs to: ProteinStructure, BindingSite, Compound, AssembledMolecule

10.2 Docking Utilities

File Contents
chemlib/docking/__init__.py Package init
chemlib/docking/ligand_prep.py smiles_to_pdbqt(), mol_to_pdbqt(), pdbqt_to_pdb(), split_pdbqt_poses()
chemlib/docking/receptor_prep.py pdb_to_pdbqt(), prepare_receptor_full()
chemlib/docking/vina_runner.py dock_ligand(), score_ligand()
chemlib/docking/interaction_analysis.py analyze_complex()
  • meeko for ligand PDBQT conversion
  • Open Babel subprocess for receptor PDBQT conversion
  • Vina Python API for docking
  • PLIP for interaction profiling
  • Test each utility independently
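split_pdbqt_poses() only needs to cut Vina's multi-model output at MODEL/ENDMDL records and read the affinity from each pose's REMARK VINA RESULT line. The sketch assumes that standard output layout (the text returned by Vina.poses() or written by write_poses()); the Pose dataclass is hypothetical. Input without MODEL records, such as a single rescored pose, yields an empty list here and would need separate handling.

```python
import re
from dataclasses import dataclass

_VINA_RESULT = re.compile(r"REMARK VINA RESULT:\s+(-?\d+(?:\.\d+)?)")


@dataclass
class Pose:  # hypothetical container for one docked pose
    rank: int
    affinity: float  # kcal/mol, from the REMARK VINA RESULT line
    pdbqt: str


def split_pdbqt_poses(pdbqt_text: str) -> list[Pose]:
    """Split Vina's multi-model PDBQT output into one PDBQT block per pose."""
    poses: list[Pose] = []
    block: list[str] = []
    affinity = float("nan")
    for line in pdbqt_text.splitlines():
        if line.startswith("MODEL"):
            block, affinity = [], float("nan")  # start a new pose
            continue
        if line.startswith("ENDMDL"):
            poses.append(Pose(len(poses) + 1, affinity, "\n".join(block) + "\n"))
            continue
        match = _VINA_RESULT.match(line)
        if match:
            affinity = float(match.group(1))
        block.append(line)
    return poses
```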

10.3 Pydantic Schemas

File Contents
chemlib/schemas/docking.py DockingRunCreate, DockingRunResponse, DockingResultResponse, DockingResultDetailResponse, InteractionAnalysisResponse

10.4 Business Services

File Contents
chemlib/services/docking_service.py DockingService
chemlib/services/interaction_service.py InteractionService
  • prepare_receptor(): PDBFixer + PDBQT conversion, cache
  • prepare_ligand(): SMILES → 3D → meeko → PDBQT
  • dock(): batch docking — prepare receptor once, dock each ligand
  • dock_single(): low-level single ligand docking
  • analyze_interactions(): PLIP on protein-ligand complex
  • Background execution for batch docking

10.5 API Routes

File Contents
chemlib/api/docking.py /api/docking/ endpoints
  • Full endpoint set per DOCKING_INTEGRATION.md
  • Background task for batch docking runs
  • Register in chemlib/main.py

10.6 UI

File Contents
chemlib/templates/docking_viewer.html Docking results page
chemlib/static/js/protein_viewer.js Extend with docking pose display
  • 3D viewer: protein (cartoon, gray) + ligand (sticks, green) + binding site (surface)
  • Interaction display: H-bond dashes, contact labels
  • Results table: ranked by score, interaction counts
  • Pose selector dropdown (switch between top N poses)
  • Interaction diagram (2D summary)

10.7 Tests

Directory Contents
tests/test_docking/test_ligand_prep.py meeko conversion tests
tests/test_docking/test_receptor_prep.py PDBQT conversion tests
tests/test_docking/test_vina.py Docking tests (integration, requires vina)
tests/test_docking/test_plip.py Interaction analysis tests
tests/test_docking/test_service.py DockingService tests
tests/test_docking/test_api.py API endpoint tests

Deliverable

  • AutoDock Vina docking via API
  • Ligand + receptor preparation pipeline
  • PLIP interaction analysis
  • 3D docking pose viewer with interaction overlay
  • Batch docking with progress tracking

Phase 11: Screening Pipeline Engine

Goal: Build the configurable, DAG-based screening pipeline engine with a visual editor, background execution, and results visualization.

Design Document: docs/SCREENING_PIPELINE.md

Depends on: Phase 10 (docking as a filter node) + existing ChemLib (compounds, scoring)

New Dependencies

None — uses existing tools.

Tasks

11.1 Plugin Protocol and Built-in Filters

File Contents
chemlib/plugins/__init__.py Package init
chemlib/plugins/protocols.py FilterPlugin, FilterResult, PipelineContext protocols/dataclasses
chemlib/plugins/builtin/__init__.py Package init
chemlib/plugins/builtin/property_filters.py LipinskiFilter, VeberFilter, GhoseFilter, EganFilter, MueggeFilter, PAINSFilter, BrenkFilter, QEDThresholdFilter, SAScoreFilter, MWRangeFilter, LogPRangeFilter, TPSARangeFilter, HBDMaxFilter, HBAMaxFilter, RotBondsMaxFilter
chemlib/plugins/builtin/similarity_filters.py TanimotoSimilarityFilter, SubstructureMatchFilter, MACCSSimilarityFilter
chemlib/plugins/builtin/adme_filters.py ESOLSolubilityFilter, BBBRuleFilter, hERGRuleFilter, RuleOfThreeFilter
chemlib/plugins/builtin/docking_filter.py VinaDockingFilter
chemlib/plugins/builtin/external_filters.py PLIPInteractionFilter, ADMETlabFilter (stubbed)
  • Each filter implements the full FilterPlugin protocol
  • Each has proper config_schema (JSON Schema)
  • Test each filter independently with known molecules

11.2 Database Models

File Contents
chemlib/models/pipeline.py Pipeline, PipelineRun, PipelineRunResult, FilterPluginRegistry ORM models
  • Alembic migration: add pipeline and plugin tables
  • FKs to: ProteinTarget, Compound, AssembledMolecule

11.3 Pydantic Schemas

File Contents
chemlib/schemas/pipeline.py PipelineDefinition, PipelineNode, PipelineEdge, PipelineCreate, PipelineResponse, PipelineRunCreate, PipelineRunResponse, PipelineRunResultResponse, PipelineRunResultFilter, PluginRegistryResponse

11.4 Pipeline Executor

File Contents
chemlib/services/pipeline_executor.py PipelineExecutor class
  • DAG validation (no cycles)
  • Topological sort (Kahn's algorithm)
  • Plugin instantiation from registry
  • Batch processing with configurable batch size
  • Early termination (skip downstream for failed compounds)
  • Progress tracking (update PipelineRun status)
  • Results storage (PipelineRunResult per compound per node)
  • Error handling (catch plugin errors, mark compound as failed, continue)
  • Test with a simple 3-node pipeline and mock filters
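DAG validation and execution ordering can come from a single pass of Kahn's algorithm: if some nodes are never released, the graph contains a cycle. A minimal sketch, assuming the PipelineDefinition has been reduced to string node ids and (source, target) edge pairs; PipelineCycleError and topological_order() are hypothetical names. PipelineService could reuse the same function for its create/update validation.

```python
from collections import deque


class PipelineCycleError(ValueError):
    """Raised when the pipeline graph is not a DAG."""


def topological_order(nodes: list[str], edges: list[tuple[str, str]]) -> list[str]:
    """Kahn's algorithm: node ids in execution order, or raise on a cycle."""
    if len(set(nodes)) != len(nodes):
        raise ValueError("duplicate node ids")
    indegree = {n: 0 for n in nodes}
    children: dict[str, list[str]] = {n: [] for n in nodes}
    for src, dst in edges:
        if src not in indegree or dst not in indegree:
            raise ValueError(f"edge references unknown node: {src!r} -> {dst!r}")
        children[src].append(dst)
        indegree[dst] += 1

    ready = deque(n for n in nodes if indegree[n] == 0)  # source nodes
    order: list[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:  # all upstream nodes have been scheduled
                ready.append(child)

    if len(order) != len(nodes):  # unreleased nodes lie on or after a cycle
        raise PipelineCycleError("pipeline contains a cycle")
    return order
```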

11.5 Business Services

File Contents
chemlib/services/pipeline_service.py PipelineService
  • CRUD for pipeline definitions
  • DAG validation on create/update
  • Start run: resolve compound list, create PipelineRun, spawn background executor
  • Get run status, results, funnel summary
  • Cancel run
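The funnel summary reduces to counting passes per node in execution order. A sketch, assuming results are keyed by (compound_id, node_id) in the spirit of PipelineRunResult; because the executor skips downstream nodes for failed compounds, an absent entry means "not reached". funnel_summary() is a hypothetical name; against the database, a GROUP BY over PipelineRunResult would do the same counting.

```python
from collections import Counter


def funnel_summary(
    order: list[str],
    results: dict[tuple[str, str], bool],
    total: int,
) -> list[tuple[str, int]]:
    """[("input", total), (node, compounds passing node), ...] in execution order."""
    passed = Counter(node for (_, node), ok in results.items() if ok)
    return [("input", total)] + [(node, passed[node]) for node in order]
```

For a linear pipeline this is exactly the funnel; for a branching DAG each bar is a per-node pass count rather than a strictly shrinking series.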

11.6 Plugin Seeding Script

File Contents
scripts/seed_plugins.py Register all built-in plugins in the database
  • Called once during setup or on app startup
  • Upserts to avoid duplicates

11.7 API Routes

File Contents
chemlib/api/pipelines.py /api/pipelines/ endpoints
chemlib/api/plugins.py /api/plugins/ endpoints
  • Full endpoint set per SCREENING_PIPELINE.md
  • Background task for pipeline execution
  • Register in chemlib/main.py

11.8 UI

File Contents
chemlib/templates/pipeline_builder.html Visual DAG editor page
chemlib/templates/pipeline_results.html Run results and funnel visualization
chemlib/static/js/pipeline_editor.js DAG editor (drag-and-drop nodes, connect edges)
chemlib/static/js/plugin_config_form.js JSON Schema → HTML form renderer
  • Pipeline builder: sidebar with filter nodes by category, canvas for DAG, config panel
  • Node drag-and-drop, edge drawing, node configuration
  • Serialize/deserialize to PipelineDefinition JSON
  • Results page: funnel bar chart, filterable results table
  • Progress polling during pipeline execution

11.9 Tests

Directory Contents
tests/test_plugins/test_property_filters.py All property filter tests
tests/test_plugins/test_similarity_filters.py Similarity filter tests
tests/test_plugins/test_adme_filters.py ADME filter tests
tests/test_pipeline/test_executor.py Pipeline executor tests
tests/test_pipeline/test_service.py Pipeline service tests
tests/test_pipeline/test_api.py API endpoint tests

Deliverable

  • 25 built-in filter plugins (15 property, 3 similarity, 4 ADME, 1 docking, 2 external, of which ADMETlabFilter is a stub)
  • Visual pipeline builder (DAG editor)
  • Background pipeline execution with progress tracking
  • Funnel visualization of results

Phase 12: Plugin Marketplace

Goal: Formalize the plugin architecture, add entry point discovery, build the marketplace UI, and auto-generate config forms.

Design Document: docs/PLUGIN_MARKETPLACE.md

Depends on: Phase 11 (plugin protocols, registry, pipeline integration)

New Dependencies

None — uses existing infrastructure.

Tasks

12.1 Plugin Registry Service

File Contents
chemlib/plugins/registry.py PluginRegistryService
  • discover_and_register_all(): scan built-in + entry points
  • _register_builtin_plugins(): import all classes from chemlib.plugins.builtin.*
  • _register_entrypoint_plugins(): scan chemlib.plugins.filter, chemlib.plugins.docking, chemlib.plugins.adme entry point groups
  • get_plugin_instance(): instantiate and cache plugins
  • list_plugins(), get_plugin_info(), set_active()
  • Integrate with app startup in chemlib/main.py (call discover_and_register_all on startup)

12.2 Additional Protocol Classes

File Contents
chemlib/plugins/protocols.py Add DockingPlugin, ADMEPlugin, VisualizationPlugin, ExternalServicePlugin
  • Full Protocol definitions per PLUGIN_MARKETPLACE.md
  • DockingPluginResult, ADMEPrediction dataclasses

12.3 API Extensions

Extend chemlib/api/plugins.py:

  • PUT /api/plugins/{name}/active — enable/disable
  • POST /api/plugins/refresh — re-scan and register

12.4 Marketplace UI

File Contents
chemlib/templates/plugin_marketplace.html Plugin browser page
chemlib/static/js/plugin_config_form.js Enhance JSON Schema form renderer
  • Browse plugins by category (card layout)
  • Each card: name, description, category, estimated time, active status
  • "Configure" button: opens modal with auto-generated form
  • "Use in Pipeline" button: redirects to pipeline builder with plugin pre-selected
  • Category filter, search bar
  • Admin: enable/disable toggle

12.5 App Startup Integration

Update chemlib/main.py:

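from chemlib.plugins.registry import PluginRegistryService

# get_db_session: ChemLib's existing async session context manager
# (assumed to be importable here; its module path is not specified in this plan).


@app.on_event("startup")  # deprecated in recent FastAPI releases in favor of lifespan handlers
async def register_plugins():
    # Scan built-in and entry-point plugins and upsert them into FilterPluginRegistry
    async with get_db_session() as db:
        registry = PluginRegistryService()
        await registry.discover_and_register_all(db)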

12.6 Entry Point Documentation

Create an example for third-party plugin developers covering:

  • How to create a plugin package
  • How to implement the FilterPlugin protocol
  • How to define entry points in pyproject.toml
  • How to test plugin compliance

12.7 Tests

Directory Contents
tests/test_plugins/test_registry.py Plugin discovery and registration tests
tests/test_plugins/test_protocol_compliance.py Verify all built-in plugins satisfy Protocol
tests/test_plugins/test_api.py Plugin API endpoint tests

Deliverable

  • Plugin registry with automatic discovery
  • Marketplace UI for browsing and configuring plugins
  • Entry point-based plugin installation support
  • All built-in plugins registered and available

Phase Summary Table

Phase | Name | New Python Deps | System Binaries | Key Deliverable | Est. Effort
7 | Protein Target Library | biopython, tmtools | none | Protein browser + 3D viewer | Medium
8 | Structural Biology | pymsaviz | mafft | Alignment tools + visualization | Medium
9 | Binding Site Detection | pdbfixer | fpocket | Pocket detection + protein prep | Small
10 | Docking Engine | vina, meeko, openbabel-wheel, plip | none | Molecular docking pipeline | Large
11 | Screening Pipeline | none | none | Visual pipeline builder + executor | Large
12 | Plugin Marketplace | none | none | Extensible plugin architecture | Medium

Configuration Updates

chemlib/config.py Additions

class Settings(BaseSettings):
    # ... existing settings ...

    # Phase 7: External APIs
    UNIPROT_API_BASE: str = "https://rest.uniprot.org"
    RCSB_API_BASE: str = "https://data.rcsb.org"
    RCSB_FILES_BASE: str = "https://files.rcsb.org"
    ALPHAFOLD_API_BASE: str = "https://alphafold.ebi.ac.uk/api"
    EXTERNAL_API_TIMEOUT: int = 30  # seconds

    # Phase 8: Alignment tools
    MAFFT_BINARY: str = "mafft"
    CLUSTALO_BINARY: str = "clustalo"
    FOLDSEEK_BINARY: str = "foldseek"

    # Phase 9: Pocket detection
    FPOCKET_BINARY: str = "fpocket"
    DEFAULT_POCKET_MIN_DRUGGABILITY: float = 0.2
    DEFAULT_BINDING_SITE_PADDING: float = 5.0  # Angstroms

    # Phase 10: Docking
    VINA_EXHAUSTIVENESS: int = 32
    VINA_NUM_POSES: int = 10
    VINA_ENERGY_RANGE: float = 3.0
    DOCKING_DEFAULT_PH: float = 7.0

    # Phase 11: Pipeline
    PIPELINE_BATCH_SIZE: int = 100
    PIPELINE_MAX_COMPOUNDS: int = 100_000
    PIPELINE_POLL_INTERVAL: int = 3  # seconds (for UI polling)

requirements.txt Additions

# Phase 7
biopython>=1.83
tmtools>=0.1.0

# Phase 8
pymsaviz>=0.4.0

# Phase 9
pdbfixer>=1.9

# Phase 10
vina>=1.2.5
meeko>=0.5.0
openbabel-wheel>=3.1.0
plip>=2.3.0

Migration Strategy

Each phase creates its own Alembic migration. Migrations are ordered and can be applied incrementally:

# Phase 7
alembic revision --autogenerate -m "add protein target, structure, binding site tables"
alembic upgrade head

# Phase 8
alembic revision --autogenerate -m "add sequence and structural alignment tables"
alembic upgrade head

# Phase 10
alembic revision --autogenerate -m "add docking run and docking result tables"
alembic upgrade head

# Phase 11
alembic revision --autogenerate -m "add pipeline, pipeline run, pipeline result, filter plugin registry tables"
alembic upgrade head

Important: Phase 9 does not need its own migration because BindingSite is created in Phase 7's migration (it belongs to the protein module). Phase 12 does not need new tables — it uses the FilterPluginRegistry from Phase 11.


Router Registration Order

Update chemlib/main.py as phases are completed:

# chemlib/main.py

from fastapi import FastAPI
from chemlib.api import compounds, fragments, assembly, visualization, scoring  # Existing
from chemlib.api import targets, structures  # Phase 7
from chemlib.api import alignments           # Phase 8
# Phase 9 endpoints are in structures.py (already registered)
from chemlib.api import docking              # Phase 10
from chemlib.api import pipelines, plugins   # Phase 11-12

app = FastAPI(title="ChemLib Drug Discovery Platform")

# Existing
app.include_router(compounds.router)
app.include_router(fragments.router)
app.include_router(assembly.router)
app.include_router(visualization.router)
app.include_router(scoring.router)

# Phase 7
app.include_router(targets.router)
app.include_router(structures.router)

# Phase 8
app.include_router(alignments.router)

# Phase 10
app.include_router(docking.router)

# Phase 11-12
app.include_router(pipelines.router)
app.include_router(plugins.router)

Quality Checklist (Per Phase)

Before marking a phase as complete, verify:

  • [ ] All ORM models match the design document schemas exactly
  • [ ] Alembic migration applies cleanly on fresh DB and existing DB
  • [ ] All service methods have proper error handling (custom exceptions)
  • [ ] All API endpoints have Pydantic request/response validation
  • [ ] All endpoints are documented in OpenAPI (auto-generated from FastAPI)
  • [ ] External API calls handle timeouts, rate limits, and error responses
  • [ ] Subprocess calls (Fpocket, MAFFT, Open Babel) handle missing binaries gracefully
  • [ ] Background tasks properly update status on success and failure
  • [ ] UI pages are functional (load data, display results, handle errors)
  • [ ] Unit tests pass for all new utility functions
  • [ ] Integration tests pass for services
  • [ ] E2E tests pass for API endpoints
  • [ ] No regressions in existing tests
  • [ ] CLAUDE.md is updated with new modules and commands