Project Overview: ChemLib
Vision
ChemLib is a fragment-based drug design platform that enables users to:
1. Curate a library of chemical fragments (small molecular building blocks)
2. Assemble fragments into larger drug-like molecules using chemically realistic connections
3. Evaluate assembled molecules for drug-likeness and synthetic accessibility
4. Visualize molecules in interactive 3D, including energy-minimized conformations
5. Persist all data in a well-structured relational database
Problem Statement
Drug discovery requires exploring vast chemical spaces efficiently. Fragment-based drug design (FBDD) tackles this by:
- Starting with small, rule-of-three-compliant fragments
- Combining fragments in ways that mimic real synthetic chemistry
- Evaluating the resulting molecules for pharmaceutical viability
Existing tooling is itself fragmented: drawing, computing, storing, and viewing molecules typically each require a separate tool. ChemLib integrates these functions into a single platform.
Core Capabilities
1. Chemical Compound Library
- Import compounds via SMILES, SDF files, or MOL files
- Automatically compute and store canonical SMILES, InChI, InChIKey
- Calculate molecular properties (MW, LogP, TPSA, HBD, HBA, etc.)
- Generate and store 2D coordinate depictions
- Generate and store 3D conformers with energy-minimized geometries
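The identifier and property steps above map onto a handful of RDKit calls. The following is a minimal sketch, assuming RDKit is installed with InChI support; the function name `describe_compound` and the dictionary keys are illustrative and not part of ChemLib's actual API.

```python
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, Lipinski, rdMolDescriptors


def describe_compound(smiles: str) -> dict:
    """Compute identifiers and basic properties for one SMILES string.

    Hypothetical helper for illustration; the key names are assumptions.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for input it cannot parse
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return {
        "canonical_smiles": Chem.MolToSmiles(mol),
        "inchi": Chem.MolToInchi(mol),
        "inchikey": Chem.MolToInchiKey(mol),
        "mw": Descriptors.MolWt(mol),            # average molecular weight
        "logp": Crippen.MolLogP(mol),            # Wildman-Crippen estimate
        "tpsa": rdMolDescriptors.CalcTPSA(mol),  # topological polar surface area
        "hbd": Lipinski.NumHDonors(mol),
        "hba": Lipinski.NumHAcceptors(mol),
    }


# Two spellings of ethanol should collapse to one canonical SMILES.
first = describe_compound("OCC")
second = describe_compound("CCO")
print(first["canonical_smiles"] == second["canonical_smiles"])
```

Storing the canonical SMILES or the InChIKey as the unique key is one way to keep duplicate imports out of the library.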
2. Fragment Decomposition
- Decompose existing compounds into fragments using the BRICS algorithm
- Store fragments with labeled attachment points (dummy atoms)
- Track fragment provenance (which compound they came from)
- Maintain a searchable fragment library with property filters
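As a rough illustration of the decomposition step, the sketch below runs RDKit's `BRICS.BRICSDecompose` on one compound and records each fragment with its attachment labels and its parent. The `decompose` function and the record keys are hypothetical; ChemLib's real storage format may differ.

```python
import re

from rdkit import Chem
from rdkit.Chem import BRICS

# BRICS fragments mark each cut bond with a dummy atom such as "[4*]";
# the number identifies the chemical environment of the broken bond.
_LABEL = re.compile(r"\[(\d+)\*\]")


def decompose(smiles: str) -> list[dict]:
    """Return one record per BRICS fragment of the given compound.

    Hypothetical helper; the record layout is an assumption.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    parent = Chem.MolToSmiles(mol)
    records = []
    # BRICSDecompose returns a set of fragment SMILES; sort for stable output.
    for fragment in sorted(BRICS.BRICSDecompose(mol)):
        records.append({
            "fragment_smiles": fragment,
            "attachment_labels": [int(n) for n in _LABEL.findall(fragment)],
            "parent_smiles": parent,  # provenance
        })
    return records


for record in decompose("CCCOCC"):
    print(record)
```

A compound with no BRICS-cleavable bonds comes back as a single fragment without dummy atoms, so an empty `attachment_labels` list is a case a caller would need to handle.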
3. Molecule Assembly
- Select fragments from the library
- Connect fragments at compatible attachment points (BRICS rules)
- Build molecules step-by-step, adding one fragment at a time
- Validate chemical sanity at each step (valence, aromaticity)
- Track the full assembly history (which fragments, which connections)
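The bookkeeping side of assembly (checking label compatibility and recording each step) can be sketched without any chemistry library. Everything below is hypothetical: the class names are invented for illustration, and `COMPATIBLE_LABELS` is only a small illustrative subset of BRICS pairings, not the full rule table, which lives in RDKit's BRICS module. Actual bond formation and valence checks would still be delegated to RDKit.

```python
from dataclasses import dataclass, field

# Illustrative subset of BRICS label pairs that may be bonded.
# This constant is a stand-in, not a complete list.
COMPATIBLE_LABELS = frozenset({
    frozenset({1, 3}),
    frozenset({3, 4}),
    frozenset({4, 5}),
})


def labels_compatible(a: int, b: int) -> bool:
    """True if the two attachment labels appear as a pair in the table."""
    return frozenset({a, b}) in COMPATIBLE_LABELS


@dataclass(frozen=True)
class AssemblyStep:
    fragment_id: int  # fragment being attached
    new_label: int    # attachment label used on the new fragment
    site_label: int   # attachment label used on the growing molecule


@dataclass
class AssemblyHistory:
    start_fragment_id: int
    steps: list[AssemblyStep] = field(default_factory=list)

    def add_step(self, fragment_id: int, new_label: int, site_label: int) -> None:
        """Record one fragment addition, rejecting incompatible labels."""
        if not labels_compatible(new_label, site_label):
            raise ValueError(f"labels {new_label} and {site_label} cannot be joined")
        self.steps.append(AssemblyStep(fragment_id, new_label, site_label))

    def fragment_ids(self) -> list[int]:
        """All fragments used, in the order they were added."""
        return [self.start_fragment_id] + [s.fragment_id for s in self.steps]


history = AssemblyHistory(start_fragment_id=7)
history.add_step(fragment_id=12, new_label=4, site_label=3)
print(history.fragment_ids())  # [7, 12]
```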
4. Evaluation & Scoring
- Drug-likeness: Lipinski Rule of Five, Veber rules, QED score
- Synthetic accessibility: SA Score (Ertl-Schuffenhauer)
- Structural alerts: PAINS filter, Brenk filter
- Similarity search: Tanimoto similarity using Morgan fingerprints
- Present scores as a dashboard for each assembled molecule
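Three of the checks listed above reduce to simple arithmetic once the properties and fingerprints are available. The sketch below assumes the properties were computed elsewhere (for example by RDKit) and uses the standard published thresholds for Lipinski and Veber; the `Properties` class, the function names, and the example values are hypothetical. Fingerprints are modelled as sets of on-bit indices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Properties:
    mw: float             # molecular weight, Da
    logp: float           # octanol-water partition coefficient
    hbd: int              # hydrogen-bond donors
    hba: int              # hydrogen-bond acceptors
    tpsa: float           # topological polar surface area, square angstroms
    rotatable_bonds: int


def lipinski_violations(p: Properties) -> list[str]:
    """List the Rule of Five criteria that the molecule violates."""
    checks = {
        "MW > 500": p.mw > 500,
        "LogP > 5": p.logp > 5,
        "HBD > 5": p.hbd > 5,
        "HBA > 10": p.hba > 10,
    }
    return [name for name, violated in checks.items() if violated]


def passes_veber(p: Properties) -> bool:
    """Veber rules: at most 10 rotatable bonds and TPSA of at most 140."""
    return p.rotatable_bonds <= 10 and p.tpsa <= 140


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on bits."""
    union = a | b
    if not union:
        return 0.0  # convention chosen for this sketch
    return len(a & b) / len(union)


# Hypothetical property values, not taken from a real compound.
candidate = Properties(mw=612.7, logp=5.8, hbd=2, hba=7, tpsa=95.0, rotatable_bonds=12)
print(lipinski_violations(candidate))  # ['MW > 500', 'LogP > 5']
print(passes_veber(candidate))         # False
print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 0.5
```

Returning the list of violated criteria rather than a single boolean leaves the pass/fail policy (the Rule of Five is commonly applied with one violation tolerated) to the dashboard.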
5. 3D Visualization & Energy Minimization
- Interactive 3D viewer (3Dmol.js) embedded in the web UI
- Rotate, zoom, pan molecules in real time
- Display styles: ball-and-stick, stick, sphere, surface
- Conformer generation (ETKDGv3) with MMFF94 energy minimization
- Show the lowest-energy conformer by default
- Allow browsing multiple conformers
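The conformer pipeline described above can be sketched with RDKit roughly as follows. This assumes RDKit is installed; the function name `minimized_conformers`, its defaults, and the return shape are illustrative choices rather than ChemLib's actual interface.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def minimized_conformers(smiles: str, num_confs: int = 10, seed: int = 42):
    """Embed conformers with ETKDGv3, minimize with MMFF94, pick the best.

    Hypothetical helper. Returns (mol, energies_by_conformer_id, best_id).
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)  # explicit hydrogens are needed for sensible 3D geometry

    params = AllChem.ETKDGv3()
    params.randomSeed = seed  # fixed seed for reproducible embedding
    conf_ids = list(AllChem.EmbedMultipleConfs(mol, num_confs, params))
    if not conf_ids:
        raise RuntimeError("embedding produced no conformers")
    if not AllChem.MMFFHasAllMoleculeParams(mol):
        raise RuntimeError("MMFF94 parameters are unavailable for this molecule")

    # One (not_converged, energy) pair per conformer, in conformer order.
    results = AllChem.MMFFOptimizeMoleculeConfs(mol, maxIters=500)
    energies = {cid: energy for cid, (_flag, energy) in zip(conf_ids, results)}
    best_id = min(energies, key=energies.get)
    return mol, energies, best_id


mol, energies, best_id = minimized_conformers("CCCCO", num_confs=5)
# MOL block text of the lowest-energy conformer, ready to send to the viewer.
mol_block = Chem.MolToMolBlock(mol, confId=best_id)
print(len(energies), "conformers; lowest-energy id:", best_id)
```

Keeping the full `energies` mapping alongside `best_id` is what would let the UI default to the lowest-energy conformer while still offering the others for browsing.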
Technology Stack
| Layer | Technology | Rationale |
|---|---|---|
| Chemistry | RDKit | Industry-standard open-source cheminformatics |
| Database | PostgreSQL / SQLite | PostgreSQL (with the RDKit cartridge) for production, SQLite for development |
| ORM | SQLAlchemy 2.0 | Modern async support, type-safe mapped dataclasses |
| Migrations | Alembic | Standard SQLAlchemy migration tool |
| API | FastAPI | Async, auto-docs (OpenAPI), Pydantic validation |
| Frontend | HTML/JS + 3Dmol.js | Lightweight, no heavy framework needed |
| Visualization | 3Dmol.js / py3Dmol | WebGL-based, supports SDF/PDB/MOL2 |
| Testing | pytest + httpx | Async test client for FastAPI |
Data Flow
┌─────────────┐     ┌──────────────┐     ┌───────────────┐     ┌──────────┐
│   Browser   │────▶│ FastAPI API  │────▶│   Services    │────▶│  DB Svc  │
│ (3Dmol.js)  │◀────│   (Routes)   │◀────│    (Logic)    │◀────│  (CRUD)  │
└─────────────┘     └──────────────┘     └───────────────┘     └──────────┘
                                                 │                  │
                                                 ▼                  ▼
                                           ┌───────────┐       ┌──────────┐
                                           │ Chemistry │       │ SQLAlch  │
                                           │  (RDKit)  │       │  Models  │
                                           └───────────┘       └──────────┘
                                                                    │
                                                                    ▼
                                                               ┌──────────┐
                                                               │ Database │
                                                               └──────────┘
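The layering in the diagram can be illustrated with plain Python stand-ins for each box. This is only a sketch of the call direction (route, then service, then chemistry and persistence): every class and function name here is hypothetical, the in-memory store replaces the SQLAlchemy-backed DB service, and `str.strip` is a placeholder for the RDKit chemistry layer.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class CompoundRecord:
    id: int
    smiles: str


class InMemoryCompoundStore:
    """Stands in for the DB service (CRUD) layer."""

    def __init__(self) -> None:
        self._rows: dict[int, CompoundRecord] = {}

    def add(self, smiles: str) -> CompoundRecord:
        record = CompoundRecord(id=len(self._rows) + 1, smiles=smiles)
        self._rows[record.id] = record
        return record

    def get(self, compound_id: int) -> Optional[CompoundRecord]:
        return self._rows.get(compound_id)


class CompoundService:
    """Service (logic) layer: calls chemistry first, then persistence."""

    def __init__(self, store: InMemoryCompoundStore,
                 canonicalize: Callable[[str], str]) -> None:
        self._store = store
        self._canonicalize = canonicalize

    def import_smiles(self, raw: str) -> CompoundRecord:
        return self._store.add(self._canonicalize(raw))


def create_compound(service: CompoundService, payload: dict) -> dict:
    """Route layer: turn a request body into a service call and a response."""
    record = service.import_smiles(payload["smiles"])
    return {"id": record.id, "smiles": record.smiles}


# str.strip is a placeholder for the RDKit-backed chemistry layer.
service = CompoundService(InMemoryCompoundStore(), canonicalize=str.strip)
print(create_compound(service, {"smiles": " CCO "}))  # {'id': 1, 'smiles': 'CCO'}
```

Passing the store and the chemistry function into the service keeps each layer replaceable in tests, which is one plausible reason for separating them as the diagram does.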
User Workflows
Workflow 1: Import & Browse Compounds
- User uploads an SDF file or enters SMILES
- API validates the structure and computes properties
- Compound is stored with its 1D identifiers (SMILES/InChI), 2D depiction (MOL block), and computed properties
- User browses compounds with property filters
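The store-then-filter part of this workflow can be sketched with the standard library's `sqlite3` module. ChemLib itself is described as using SQLAlchemy, so this is a dependency-free stand-in; the table layout, column names, and helper functions are assumptions made for illustration.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE compound (
    id               INTEGER PRIMARY KEY,
    canonical_smiles TEXT NOT NULL UNIQUE,
    mol_block        TEXT,
    mw               REAL NOT NULL,
    hbd              INTEGER NOT NULL
)
"""


def add_compound(conn: sqlite3.Connection, canonical_smiles: str, mw: float,
                 hbd: int, mol_block: Optional[str] = None) -> int:
    """Insert one compound and return its row id."""
    cur = conn.execute(
        "INSERT INTO compound (canonical_smiles, mol_block, mw, hbd) "
        "VALUES (?, ?, ?, ?)",
        (canonical_smiles, mol_block, mw, hbd),
    )
    return cur.lastrowid


def browse(conn: sqlite3.Connection, max_mw: Optional[float] = None,
           max_hbd: Optional[int] = None) -> list[str]:
    """Return SMILES of compounds matching the optional property filters."""
    clauses, args = [], []
    if max_mw is not None:
        clauses.append("mw <= ?")
        args.append(max_mw)
    if max_hbd is not None:
        clauses.append("hbd <= ?")
        args.append(max_hbd)
    # Only fixed clause text is interpolated; values go through placeholders.
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    rows = conn.execute(
        f"SELECT canonical_smiles FROM compound{where} ORDER BY mw", args
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
add_compound(conn, "CCO", mw=46.07, hbd=1)                      # ethanol
add_compound(conn, "c1ccccc1", mw=78.11, hbd=0)                 # benzene
add_compound(conn, "CC(=O)Oc1ccccc1C(=O)O", mw=180.16, hbd=1)   # aspirin
print(browse(conn, max_mw=100))  # ['CCO', 'c1ccccc1']
```

The `UNIQUE` constraint on the canonical SMILES makes a repeated import fail loudly instead of silently creating a duplicate row.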
Workflow 2: Fragment Library Management
- User selects compounds to decompose
- System runs BRICS decomposition and extracts fragments
- Fragments are stored with attachment point metadata
- User browses/searches fragment library
Workflow 3: Assemble a New Molecule
- User selects a starting fragment
- System shows compatible fragments (matching attachment points)
- User selects next fragment and attachment point
- System joins the fragments, validates the chemistry, and shows the result
- Repeat until desired molecule is built
- Assembly history is saved
Workflow 4: Evaluate & Visualize
- User selects an assembled molecule
- System computes drug-likeness scores and the SA score, and applies structural-alert filters
- System generates 3D conformers and minimizes their energy
- User views molecule in interactive 3D viewer
- User can switch between conformers and display styles
Scope Boundaries
In Scope
- Fragment library CRUD
- BRICS-based decomposition and recombination
- Property calculation and drug-likeness scoring
- 3D conformer generation and MMFF94 minimization
- Interactive 3D web viewer
- RESTful API with full OpenAPI documentation
- Database persistence with migrations
Out of Scope (Future Work)
- Docking simulations (requires protein targets)
- Retrosynthetic route planning (requires reaction databases)
- Machine learning property prediction
- Multi-user authentication/authorization
- Reaction execution tracking (wet lab integration)
- Commercial compound vendor integration