
Project Overview — ChemLib

Vision

ChemLib is a fragment-based drug design platform that enables users to:

  1. Curate a library of chemical fragments (small molecular building blocks)
  2. Assemble fragments into larger drug-like molecules using chemically realistic connections
  3. Evaluate assembled molecules for drug-likeness and synthetic accessibility
  4. Visualize molecules in interactive 3D, including energy-minimized conformations
  5. Persist all data in a well-structured relational database

Problem Statement

Drug discovery requires exploring vast chemical spaces efficiently. Fragment-based drug design (FBDD) tackles this by:

  • Starting with small, rule-of-three-compliant fragments
  • Combining fragments in ways that mimic real synthetic chemistry
  • Evaluating the resulting molecules for pharmaceutical viability

Existing tools are often fragmented themselves, with separate tools for drawing, computing, storing, and viewing molecules. ChemLib integrates these into a single platform.

Core Capabilities

1. Chemical Compound Library

  • Import compounds via SMILES, SDF files, or MOL files
  • Automatically compute and store canonical SMILES, InChI, InChIKey
  • Calculate molecular properties (MW, LogP, TPSA, HBD, HBA, etc.)
  • Generate and store 2D coordinate depictions
  • Generate and store 3D conformers with energy-minimized geometries
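
As a rough sketch of the registration step described above, the following uses RDKit to canonicalize a SMILES string and compute identifiers and a few descriptors. The helper name `describe_compound` and the dictionary keys are illustrative assumptions, not ChemLib's actual API or schema, and InChI generation assumes an RDKit build with InChI support.

```python
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, Lipinski, rdMolDescriptors


def describe_compound(smiles: str) -> dict:
    """Parse a SMILES string and return identifiers plus basic properties.

    Hypothetical helper for illustration; the keys are not ChemLib's schema.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None when parsing fails
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return {
        "canonical_smiles": Chem.MolToSmiles(mol),
        "inchi": Chem.MolToInchi(mol),
        "inchikey": Chem.MolToInchiKey(mol),
        "mw": Descriptors.MolWt(mol),            # molecular weight
        "logp": Crippen.MolLogP(mol),            # Crippen LogP estimate
        "tpsa": rdMolDescriptors.CalcTPSA(mol),  # topological polar surface area
        "hbd": Lipinski.NumHDonors(mol),         # hydrogen-bond donors
        "hba": Lipinski.NumHAcceptors(mol),      # hydrogen-bond acceptors
    }


# Ethanol written in a non-canonical atom order; RDKit normalizes it.
record = describe_compound("OCC")
print(record["canonical_smiles"])
```

Storing the canonical SMILES and InChIKey alongside the raw input is one way to detect duplicate uploads, since different input spellings of the same structure map to the same identifiers.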

2. Fragment Decomposition

  • Decompose existing compounds into fragments using the BRICS algorithm
  • Store fragments with labeled attachment points (dummy atoms)
  • Track fragment provenance (which compound each fragment came from)
  • Maintain a searchable fragment library with property filters
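
The decomposition step above could look roughly like this with RDKit's `BRICS` module. The `decompose` helper and the `source_smiles` key are hypothetical names used here to show how provenance might be carried along; they are not taken from the ChemLib codebase.

```python
from rdkit import Chem
from rdkit.Chem import BRICS


def decompose(smiles: str) -> list[str]:
    """Return sorted BRICS fragment SMILES for one compound.

    Dummy atoms such as [16*] mark the attachment points; the number is the
    BRICS environment label of the broken bond end.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return sorted(BRICS.BRICSDecompose(mol))


parent = "CCCOCCC(=O)c1ccccc1"
# Keep a reference to the parent compound with every fragment (provenance).
fragments = [{"smiles": frag, "source_smiles": parent} for frag in decompose(parent)]
for row in fragments:
    print(row["smiles"])
```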

3. Molecule Assembly

  • Select fragments from the library
  • Connect fragments at compatible attachment points (BRICS rules)
  • Build molecules step-by-step, adding one fragment at a time
  • Validate chemical sanity at each step (valence, aromaticity)
  • Track the full assembly history (which fragments, which connections)
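
One possible shape for the assembly history mentioned above is an append-only list of steps, each recording the fragment added and the attachment labels used. This is a minimal sketch in plain Python; the class and field names are assumptions for illustration, not ChemLib's data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AssemblyStep:
    fragment_smiles: str   # fragment added in this step
    site_on_molecule: int  # BRICS label used on the growing molecule
    site_on_fragment: int  # BRICS label used on the incoming fragment


@dataclass
class AssemblyHistory:
    start_fragment: str
    steps: list[AssemblyStep] = field(default_factory=list)

    def add_step(self, fragment_smiles: str, site_on_molecule: int, site_on_fragment: int) -> None:
        """Record one fragment addition in the order it happened."""
        self.steps.append(AssemblyStep(fragment_smiles, site_on_molecule, site_on_fragment))

    def fragments_used(self) -> list[str]:
        """All fragments in assembly order, starting fragment first."""
        return [self.start_fragment] + [step.fragment_smiles for step in self.steps]


history = AssemblyHistory(start_fragment="[16*]c1ccccc1")
history.add_step("[3*]O[3*]", site_on_molecule=16, site_on_fragment=3)
history.add_step("[4*]CCC", site_on_molecule=3, site_on_fragment=4)
print(history.fragments_used())
```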

4. Evaluation & Scoring

  • Drug-likeness: Lipinski Rule of Five, Veber rules, QED score
  • Synthetic accessibility: SA Score (Ertl-Schuffenhauer)
  • Structural alerts: PAINS filter, Brenk filter
  • Similarity search: Tanimoto similarity using Morgan fingerprints
  • Present scores as a dashboard for each assembled molecule
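
To make the similarity metric concrete, here is the Tanimoto coefficient written out on plain sets of "on" bit positions. In practice the bits would come from Morgan fingerprints and RDKit's `DataStructs.TanimotoSimilarity` would do the comparison; the bit sets below are made-up values for illustration only.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen for this sketch when both are empty
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


# Illustrative bit positions, not real fingerprints.
fp_query = {1, 5, 9, 12}
fp_candidate = {5, 9, 20}
score = tanimoto(fp_query, fp_candidate)  # 2 shared bits out of 5 distinct bits
print(score)
```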

5. 3D Visualization & Energy Minimization

  • Interactive 3D viewer (3Dmol.js) embedded in the web UI
  • Rotate, zoom, pan molecules in real time
  • Display styles: ball-and-stick, stick, sphere, surface
  • Conformer generation (ETKDGv3) with MMFF94 energy minimization
  • Show the lowest-energy conformer by default
  • Allow browsing multiple conformers
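
The conformer pipeline listed above can be sketched with RDKit as follows: embed several conformers with ETKDGv3, minimize each with MMFF94, and pick the one with the lowest energy. The function name, default conformer count, and fixed random seed are assumptions made for this example.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def lowest_energy_conformer(smiles: str, n_confs: int = 5, seed: int = 42):
    """Embed conformers, minimize them, and return (mol, best_conf_id, energies)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)  # explicit hydrogens are needed for sensible 3D geometry

    params = AllChem.ETKDGv3()
    params.randomSeed = seed  # fixed seed so repeated runs embed the same way
    conf_ids = list(AllChem.EmbedMultipleConfs(mol, n_confs, params))

    # One (not_converged, energy) pair per conformer; MMFF94 is the default variant.
    results = AllChem.MMFFOptimizeMoleculeConfs(mol, maxIters=500)
    energies = {cid: energy for cid, (_, energy) in zip(conf_ids, results)}

    best_id = min(energies, key=energies.get)
    return mol, best_id, energies


mol, best_id, energies = lowest_energy_conformer("CCCCO", n_confs=3)
# A MOL block for the chosen conformer is a format 3Dmol.js can load.
mol_block = Chem.MolToMolBlock(mol, confId=best_id)
```

Keeping the full `energies` mapping, rather than only the winner, is what makes it possible to let users browse the other conformers later.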

Technology Stack

Layer         | Technology          | Rationale
Chemistry     | RDKit               | Industry-standard open-source cheminformatics
Database      | PostgreSQL / SQLite | PostgreSQL for production (+ RDKit cartridge), SQLite for dev
ORM           | SQLAlchemy 2.0      | Modern async support, type-safe mapped dataclasses
Migrations    | Alembic             | Standard SQLAlchemy migration tool
API           | FastAPI             | Async, auto-docs (OpenAPI), Pydantic validation
Frontend      | HTML/JS + 3Dmol.js  | Lightweight, no heavy framework needed
Visualization | 3Dmol.js / py3Dmol  | WebGL-based, supports SDF/PDB/MOL2
Testing       | pytest + httpx      | Async test client for FastAPI

Data Flow

┌──────────────┐     ┌──────────────┐     ┌───────────────┐     ┌──────────┐
│   Browser    │────▶│  FastAPI API │────▶│   Services    │────▶│  DB Svc  │
│  (3Dmol.js)  │◀────│   (Routes)   │◀────│    (Logic)    │◀────│  (CRUD)  │
└──────────────┘     └──────────────┘     └───────────────┘     └──────────┘
                                                  │                   │
                                                  ▼                   ▼
                                            ┌───────────┐       ┌──────────┐
                                            │ Chemistry │       │ SQLAlch  │
                                            │  (RDKit)  │       │ Models   │
                                            └───────────┘       └──────────┘
                                                                      │
                                                                      ▼
                                                                ┌──────────┐
                                                                │ Database │
                                                                └──────────┘

User Workflows

Workflow 1: Import & Browse Compounds

  1. User uploads an SDF file or enters SMILES
  2. API validates structure, computes properties
  3. The compound is stored with its 1D identifiers (SMILES/InChI), 2D representation (MOL block), and computed properties
  4. User browses compounds with property filters

Workflow 2: Fragment Library Management

  1. User selects compounds to decompose
  2. System runs BRICS decomposition, extracts fragments
  3. Fragments are stored with attachment point metadata
  4. User browses/searches fragment library
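
Browsing the fragment library with property filters might include a rule-of-three check like the one below. The thresholds are the commonly cited ones (MW below 300, LogP, HBD, and HBA each at most 3); the `FragmentProps` class and the property values are illustrative assumptions, not computed data.

```python
from dataclasses import dataclass


@dataclass
class FragmentProps:
    smiles: str
    mw: float    # molecular weight
    logp: float  # calculated LogP
    hbd: int     # hydrogen-bond donors
    hba: int     # hydrogen-bond acceptors


def is_rule_of_three(frag: FragmentProps) -> bool:
    """True when the fragment meets the commonly cited rule-of-three limits."""
    return frag.mw < 300 and frag.logp <= 3 and frag.hbd <= 3 and frag.hba <= 3


# Illustrative property values, not computed from the structures.
library = [
    FragmentProps("[16*]c1ccccc1", mw=78.1, logp=1.7, hbd=0, hba=0),
    FragmentProps("hypothetical-large-fragment", mw=412.5, logp=4.2, hbd=2, hba=6),
]
compliant = [frag.smiles for frag in library if is_rule_of_three(frag)]
print(compliant)
```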

Workflow 3: Assemble a New Molecule

  1. User selects a starting fragment
  2. System shows compatible fragments (matching attachment points)
  3. User selects next fragment and attachment point
  4. System joins fragments, validates chemistry, shows result
  5. Repeat until desired molecule is built
  6. Assembly history is saved
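
Step 2 of this workflow, finding compatible fragments, can be approximated by comparing the numeric labels on the dummy atoms. The sketch below hard-codes a small illustrative subset of BRICS label pairs; it is an assumption made for the example, and the complete rule set is defined in RDKit's `BRICS` module. The function names are hypothetical.

```python
import re

# Illustrative subset of BRICS label pairs that may be bonded.
# Assumption: chosen for this example only; RDKit's BRICS module holds the full rules.
COMPATIBLE_PAIRS = {
    frozenset({1, 3}),   # acyl carbon + ether oxygen
    frozenset({3, 4}),   # ether oxygen + alkyl carbon
    frozenset({3, 16}),  # ether oxygen + aromatic carbon
    frozenset({16}),     # aromatic carbon + aromatic carbon (16 with 16)
}


def attachment_labels(fragment_smiles: str) -> set[int]:
    """Extract BRICS labels from dummy atoms written as [n*]."""
    return {int(n) for n in re.findall(r"\[(\d+)\*\]", fragment_smiles)}


def compatible_fragments(current: str, library: list[str]) -> list[str]:
    """Fragments with at least one label that pairs with a label on `current`."""
    current_labels = attachment_labels(current)
    return [
        candidate
        for candidate in library
        if any(
            frozenset({a, b}) in COMPATIBLE_PAIRS
            for a in current_labels
            for b in attachment_labels(candidate)
        )
    ]


library = ["[3*]O[3*]", "[4*]CCC", "[16*]c1ccncc1", "[1*]C(C)=O"]
matches = compatible_fragments("[16*]c1ccccc1", library)
print(matches)
```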

Workflow 4: Evaluate & Visualize

  1. User selects an assembled molecule
  2. System computes drug-likeness scores and the SA score, and applies structural-alert filters
  3. System generates 3D conformers and minimizes their energy
  4. User views molecule in interactive 3D viewer
  5. User can switch between conformers and display styles
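
The drug-likeness part of step 2 reduces to threshold checks once the properties are known. Below is a minimal sketch of the Lipinski and Veber rules on precomputed values. The function names, the dictionary keys, and the choice to tolerate one Lipinski violation are assumptions for illustration, and the input numbers are made up.

```python
def lipinski_violations(mw: float, logp: float, hbd: int, hba: int) -> list[str]:
    """List which Rule of Five limits are exceeded."""
    checks = {
        "MW > 500": mw > 500,
        "LogP > 5": logp > 5,
        "HBD > 5": hbd > 5,
        "HBA > 10": hba > 10,
    }
    return [name for name, violated in checks.items() if violated]


def passes_veber(rotatable_bonds: int, tpsa: float) -> bool:
    """Veber rules: at most 10 rotatable bonds and TPSA of at most 140 square angstroms."""
    return rotatable_bonds <= 10 and tpsa <= 140.0


def evaluate(props: dict) -> dict:
    violations = lipinski_violations(props["mw"], props["logp"], props["hbd"], props["hba"])
    return {
        "lipinski_violations": violations,
        "lipinski_pass": len(violations) <= 1,  # one violation is commonly tolerated
        "veber_pass": passes_veber(props["rotatable_bonds"], props["tpsa"]),
    }


# Made-up property values for a hypothetical assembled molecule.
report = evaluate(
    {"mw": 520.6, "logp": 3.1, "hbd": 2, "hba": 7, "rotatable_bonds": 12, "tpsa": 95.0}
)
print(report)
```

Returning the list of violated rules, not just a pass/fail flag, gives a scoring dashboard something specific to display for each molecule.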

Scope Boundaries

In Scope

  • Fragment library CRUD
  • BRICS-based decomposition and recombination
  • Property calculation and drug-likeness scoring
  • 3D conformer generation and MMFF94 minimization
  • Interactive 3D web viewer
  • RESTful API with full OpenAPI documentation
  • Database persistence with migrations

Out of Scope (Future Work)

  • Docking simulations (requires protein targets)
  • Retrosynthetic route planning (requires reaction databases)
  • Machine learning property prediction
  • Multi-user authentication/authorization
  • Reaction execution tracking (wet lab integration)
  • Commercial compound vendor integration