System Architecture — ChemLib

Layer Architecture

ChemLib follows a strict layered architecture with unidirectional dependencies:

┌─────────────────────────────────────────────────────────────────┐
│                        UI LAYER                                  │
│  HTML templates + JavaScript + 3Dmol.js                          │
│  Only communicates via HTTP to API layer                         │
└──────────────────────────┬──────────────────────────────────────┘
                           │ HTTP (REST JSON)
┌─────────────────────────────────────────────────────────────────┐
│                       API LAYER (FastAPI)                         │
│  Routes: parse requests, validate input (Pydantic), call         │
│  services, format responses. NO business logic here.             │
│                                                                  │
│  chemlib/api/                                                    │
│  ├── compounds.py    POST/GET/PUT/DELETE /api/compounds          │
│  ├── fragments.py    POST/GET /api/fragments                     │
│  ├── assembly.py     POST /api/assembly/*                        │
│  ├── visualization.py GET /api/viz/*                             │
│  └── scoring.py      GET/POST /api/scoring/*                     │
└──────────────────────────┬──────────────────────────────────────┘
                           │ Python function calls
┌─────────────────────────────────────────────────────────────────┐
│                     SERVICE LAYER                                │
│  Business logic. Orchestrates chemistry + DB operations.         │
│  Services are stateless — all state in DB.                       │
│                                                                  │
│  chemlib/services/                                               │
│  ├── compound_service.py   Import, validate, compute properties  │
│  ├── fragment_service.py   Decompose, manage fragment library    │
│  ├── assembly_service.py   Join fragments, build molecules       │
│  ├── conformer_service.py  Generate 3D, minimize energy          │
│  ├── scoring_service.py    Drug-likeness, SA, filters            │
│  └── viz_service.py        Prepare 3D viewer data                │
└──────────┬──────────────────────────────┬───────────────────────┘
           │                              │
           ▼                              ▼
┌─────────────────────┐    ┌─────────────────────────────────────┐
│   CHEMISTRY LAYER   │    │         DB SERVICE LAYER             │
│   Pure RDKit ops    │    │   SQLAlchemy CRUD operations         │
│   No DB access      │    │   Only layer that touches ORM        │
│                     │    │                                      │
│  chemlib/chemistry/ │    │  chemlib/db/service.py               │
│  ├── representations│    │  ├── CRUDBase (generic)              │
│  ├── descriptors    │    │  ├── CompoundDBService               │
│  ├── fingerprints   │    │  ├── FragmentDBService               │
│  ├── fragmentation  │    │  ├── AssemblyDBService               │
│  ├── assembly       │    │  └── ConformerDBService              │
│  ├── conformers     │    │                                      │
│  └── filters        │    │  chemlib/db/session.py               │
└─────────────────────┘    │  ├── engine + session factory        │
                           │  └── get_db() dependency             │
                           └──────────────┬──────────────────────┘
                           ┌─────────────────────────────────────┐
                           │          ORM MODELS                  │
                           │  chemlib/models/                     │
                           │  ├── base.py (Base, mixins)          │
                           │  ├── compound.py                     │
                           │  ├── structure.py                    │
                           │  ├── reaction.py                     │
                           │  └── assembly.py                     │
                           └──────────────┬──────────────────────┘
                           ┌─────────────────────────────────────┐
                           │          DATABASE                    │
                           │  PostgreSQL (prod) / SQLite (dev)    │
                           │  Schema managed by Alembic           │
                           └─────────────────────────────────────┘
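
The unidirectional dependency rule in the diagram can be checked mechanically. The following is a minimal sketch, not part of ChemLib: the `FORBIDDEN` table and both helper functions are hypothetical, and the rules encode only two constraints read off the diagram (the chemistry layer has no DB access, and services do not touch the ORM models or the API). It parses source text with the standard `ast` module and reports imports that point the wrong way.

```python
import ast

# Hypothetical rule table: layer package -> package prefixes it must not import.
FORBIDDEN = {
    "chemlib.chemistry": ("chemlib.db", "chemlib.models", "chemlib.services", "chemlib.api"),
    "chemlib.services": ("chemlib.models", "chemlib.api"),
}


def imported_modules(source: str) -> set[str]:
    """Return the absolute module names imported by a piece of Python source."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)
    return names


def layer_violations(module_name: str, source: str) -> list[str]:
    """List imports in `source` that break the layering rule for `module_name`."""
    violations = []
    for layer, banned in FORBIDDEN.items():
        if module_name == layer or module_name.startswith(layer + "."):
            for imported in sorted(imported_modules(source)):
                if any(imported == b or imported.startswith(b + ".") for b in banned):
                    violations.append(imported)
    return violations


bad_source = "from rdkit import Chem\nfrom chemlib.db.service import CompoundDBService\n"
print(layer_violations("chemlib.chemistry.descriptors", bad_source))
```

A check like this could run in CI over every file under `chemlib/`. Relative imports are ignored here for brevity.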

Module Responsibilities

chemlib/chemistry/ — Pure Chemistry Utilities

These modules contain zero database access. They take RDKit Mol objects or SMILES strings as input and return computed results. This makes them independently testable.

representations.py

  • smiles_to_mol(smiles: str) -> Mol — Parse SMILES, return Mol or raise
  • mol_to_canonical_smiles(mol: Mol) -> str — Canonical SMILES
  • mol_to_inchi(mol: Mol) -> str — InChI string
  • mol_to_inchi_key(mol: Mol) -> str — 27-char InChIKey
  • mol_to_mol_block_2d(mol: Mol) -> str — MOL block with computed 2D coords
  • mol_to_mol_block_3d(mol: Mol) -> str — MOL block with 3D coords (single conformer)
  • mol_block_to_mol(block: str) -> Mol — Parse MOL block
  • sdf_to_mols(sdf_data: str) -> list[Mol] — Parse SDF
  • mol_to_formula(mol: Mol) -> str — Molecular formula

descriptors.py

  • compute_properties(mol: Mol) -> dict — All standard properties (MW, LogP, TPSA, HBD, HBA, rotatable bonds, rings, QED)
  • compute_mw(mol: Mol) -> float
  • compute_logp(mol: Mol) -> float
  • Individual property functions for targeted computation

fingerprints.py

  • compute_morgan_fp(mol: Mol, radius=2, nbits=2048) -> ExplicitBitVect
  • serialize_fp(fp: ExplicitBitVect) -> bytes — For DB storage
  • deserialize_fp(data: bytes) -> ExplicitBitVect — From DB
  • tanimoto_similarity(fp1, fp2) -> float
  • bulk_tanimoto(query_fp, fp_list) -> list[float]
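
To make the similarity functions concrete, here is a sketch of the arithmetic behind a Tanimoto score. It is an illustration only: fingerprints are modelled as plain Python sets of on-bit indices, whereas the real module would presumably delegate to RDKit (`DataStructs.TanimotoSimilarity` and `DataStructs.BulkTanimotoSimilarity`). The function names and the bit values below are made up for the example.

```python
def tanimoto_from_on_bits(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen for this sketch when both fingerprints are empty
    return len(a & b) / union


def bulk_tanimoto_from_on_bits(query: set[int], others: list[set[int]]) -> list[float]:
    """Score one query fingerprint against a list of fingerprints."""
    return [tanimoto_from_on_bits(query, fp) for fp in others]


query = {1, 5, 9, 42}
library = [{1, 5, 9, 42}, {1, 5, 100}, {7, 8}]

scores = bulk_tanimoto_from_on_bits(query, library)
hits = [index for index, score in enumerate(scores) if score >= 0.7]
print(scores)  # [1.0, 0.4, 0.0]
print(hits)    # [0]
```

The second library entry shares 2 bits with the query out of 5 distinct bits overall, hence 0.4.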

fragmentation.py

  • brics_decompose(mol: Mol) -> list[str] — BRICS fragment SMILES
  • parse_attachment_points(frag_smiles: str) -> list[int] — Extract dummy atom labels
  • get_compatible_labels(label: int) -> list[int] — BRICS compatibility rules
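
BRICS fragment SMILES mark each attachment point with an isotope-labelled dummy atom such as `[4*]` or `[16*]`. One plausible way to implement `parse_attachment_points` is a regular expression over the SMILES string. This is a sketch under that assumption; the actual module may instead iterate over RDKit atoms, and this pattern would not match dummy atoms that carry extra annotations such as atom maps.

```python
import re

# Matches dummy atoms written as [<label>*], e.g. [4*] or [16*].
_DUMMY_ATOM = re.compile(r"\[(\d+)\*\]")


def parse_attachment_points(frag_smiles: str) -> list[int]:
    """Return the BRICS labels of all dummy atoms, in order of appearance."""
    return [int(label) for label in _DUMMY_ATOM.findall(frag_smiles)]


print(parse_attachment_points("[16*]c1ccccc1"))  # [16]
print(parse_attachment_points("[3*]O[3*]"))      # [3, 3]
print(parse_attachment_points("CCO"))            # []
```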

assembly.py

  • join_fragments(frag1_smiles: str, frag2_smiles: str) -> list[str] — Join at compatible points, return product SMILES
  • brics_build(fragments: list[str], max_results=100) -> list[str] — Combinatorial assembly
  • validate_molecule(smiles: str) -> bool — Chemical sanity check
  • clean_assembled_mol(mol: Mol) -> Mol — Remove leftover dummy atoms, sanitize

conformers.py

  • generate_conformers(mol: Mol, num_confs=50, seed=42) -> Mol — ETKDGv3 embedding
  • minimize_conformer(mol: Mol, conf_id: int, force_field='MMFF94') -> tuple[float, bool] — Returns (energy, converged)
  • minimize_all_conformers(mol: Mol, force_field='MMFF94') -> list[tuple[int, float, bool]]
  • get_lowest_energy_conformer(mol: Mol) -> int — Conformer ID
  • conformer_to_mol_block(mol: Mol, conf_id: int) -> str — Extract single conformer as MOL block
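
Selecting the lowest-energy conformer reduces to a minimum over the `(conf_id, energy, converged)` tuples that `minimize_all_conformers` returns. The helper below is hypothetical and works on that list directly rather than on a `Mol`. Preferring converged conformers is an assumption made for this sketch, not documented ChemLib behavior, and the energies are made-up values.

```python
def pick_lowest_energy(results: list[tuple[int, float, bool]]) -> int:
    """Return the conformer ID with the lowest energy.

    Converged conformers are preferred; if none converged, fall back to all.
    """
    if not results:
        raise ValueError("no conformers to choose from")
    converged = [r for r in results if r[2]]
    pool = converged or results
    conf_id, _energy, _converged = min(pool, key=lambda r: r[1])
    return conf_id


results = [(0, 12.5, True), (1, 9.8, False), (2, 10.1, True)]
print(pick_lowest_energy(results))  # 2 (conformer 1 is lower but did not converge)
```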

filters.py

  • check_lipinski(mol: Mol) -> dict — {passes: bool, violations: list, properties: dict}
  • check_veber(mol: Mol) -> dict
  • check_pains(mol: Mol) -> dict — {passes: bool, matched_filters: list}
  • compute_qed(mol: Mol) -> float
  • compute_sa_score(mol: Mol) -> float
  • full_druglikeness_report(mol: Mol) -> dict — All filters combined
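
The result shape of `check_lipinski` can be illustrated without RDKit by applying the rule-of-five thresholds (MW ≤ 500, LogP ≤ 5, HBD ≤ 5, HBA ≤ 10) to precomputed properties. This is a sketch: `lipinski_report` is a hypothetical helper, the property keys follow the `CompoundResponse` field names, and allowing one violation by default is an assumption about how strictly the filter is applied.

```python
LIPINSKI_LIMITS = {"mw": 500.0, "logp": 5.0, "hbd": 5, "hba": 10}


def lipinski_report(properties: dict[str, float], max_violations: int = 1) -> dict:
    """Build a {passes, violations, properties} report from precomputed values."""
    violations = [
        f"{name} > {limit}"
        for name, limit in LIPINSKI_LIMITS.items()
        if properties[name] > limit
    ]
    return {
        "passes": len(violations) <= max_violations,
        "violations": violations,
        "properties": {name: properties[name] for name in LIPINSKI_LIMITS},
    }


# Made-up property values, for illustration only.
report = lipinski_report({"mw": 650.0, "logp": 6.2, "hbd": 2, "hba": 8})
print(report["passes"])      # False
print(report["violations"])  # ['mw > 500.0', 'logp > 5.0']
```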

chemlib/services/ — Business Logic

Services coordinate chemistry operations and DB persistence.

compound_service.py

class CompoundService:
    async def import_from_smiles(self, db, smiles, name=None) -> CompoundResponse:
        # 1. Parse SMILES (chemistry.representations)
        # 2. Check for duplicates (db_service.get_by_smiles)
        # 3. Compute properties (chemistry.descriptors)
        # 4. Compute fingerprint (chemistry.fingerprints)
        # 5. Generate 2D coords (chemistry.representations)
        # 6. Persist (db_service.create)
        ...

    async def import_from_sdf(self, db, sdf_data) -> list[CompoundResponse]:
        # Process each molecule in SDF
        ...

    async def search_similar(self, db, smiles, threshold=0.7) -> list[CompoundResponse]:
        # 1. Parse query SMILES, compute FP
        # 2. Get all FPs from DB
        # 3. Compute Tanimoto similarities
        # 4. Return matches above threshold
        ...

fragment_service.py

class FragmentService:
    async def decompose_compound(self, db, compound_id) -> list[FragmentResponse]:
        # 1. Get compound from DB
        # 2. BRICS decompose (chemistry.fragmentation)
        # 3. For each fragment: compute properties, store
        ...

    async def get_compatible(self, db, fragment_id) -> list[FragmentResponse]:
        # 1. Get fragment, read attachment points
        # 2. Compute compatible labels
        # 3. Query DB for fragments with matching labels
        ...

assembly_service.py

class AssemblyService:
    async def start_assembly(self, db, fragment_id) -> AssemblyResponse:
        # 1. Create AssembledMolecule from initial fragment
        # 2. Record first AssemblyStep
        ...

    async def add_fragment(self, db, assembly_id, fragment_id, attachment_info) -> AssemblyResponse:
        # 1. Get current molecule state
        # 2. Join fragment (chemistry.assembly)
        # 3. Validate result
        # 4. Update molecule, record AssemblyStep
        ...

    async def finalize(self, db, assembly_id) -> AssemblyResponse:
        # 1. Compute all properties on final molecule
        # 2. Run all scoring
        # 3. Generate conformers
        # 4. Update DB record
        ...

conformer_service.py

class ConformerService:
    async def generate_and_minimize(self, db, parent_type, parent_id, num_confs=50) -> list[ConformerResponse]:
        # 1. Get molecule SMILES from DB
        # 2. Generate conformers (chemistry.conformers)
        # 3. Minimize all (chemistry.conformers)
        # 4. Store each conformer in DB
        # 5. Mark lowest energy
        ...

    async def get_viewer_data(self, db, parent_type, parent_id, conf_id=None) -> str:
        # Return MOL block for 3Dmol.js rendering
        ...

chemlib/api/ — FastAPI Routes

Routes are thin wrappers. They:

  1. Parse and validate request data (Pydantic schemas, automatic)
  2. Get a DB session (dependency injection)
  3. Call the appropriate service method
  4. Return the response

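# Example: chemlib/api/compounds.py
from fastapi import APIRouter, Depends, HTTPException
from sqlalchemy.ext.asyncio import AsyncSession

from chemlib.db.session import get_db
from chemlib.schemas.compound import CompoundCreate, CompoundResponse
from chemlib.services.compound_service import CompoundService

router = APIRouter(prefix="/api/compounds", tags=["compounds"])

@router.post("/", response_model=CompoundResponse, status_code=201)
async def create_compound(
    data: CompoundCreate,
    db: AsyncSession = Depends(get_db),
    service: CompoundService = Depends(),
):
    # Validation has already happened; delegate straight to the service
    return await service.import_from_smiles(db, data.smiles, data.name)

@router.get("/{compound_id}", response_model=CompoundResponse)
async def get_compound(
    compound_id: int,
    db: AsyncSession = Depends(get_db),
    service: CompoundService = Depends(),
):
    compound = await service.get(db, compound_id)
    if not compound:
        raise HTTPException(status_code=404, detail="Compound not found")
    return compound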

chemlib/schemas/ — Pydantic Models

Pydantic schemas are kept separate from the ORM models. They are used for request validation and response serialization.

# chemlib/schemas/compound.py
from datetime import datetime

from pydantic import BaseModel, ConfigDict

class CompoundCreate(BaseModel):
    smiles: str
    name: str | None = None

class CompoundResponse(BaseModel):
    id: int
    name: str | None
    canonical_smiles: str
    inchi_key: str | None
    formula: str | None
    mw: float | None
    logp: float | None
    tpsa: float | None
    hbd: int | None
    hba: int | None
    qed_score: float | None
    sa_score: float | None
    lipinski_pass: bool | None
    created_at: datetime

    model_config = ConfigDict(from_attributes=True)

class CompoundFilter(BaseModel):
    mw_min: float | None = None
    mw_max: float | None = None
    logp_min: float | None = None
    logp_max: float | None = None
    lipinski_pass: bool | None = None
    limit: int = 100
    offset: int = 0

Configuration

# chemlib/config.py
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Load overrides from a local .env file (Pydantic v2 style)
    model_config = SettingsConfigDict(env_file=".env")

    DATABASE_URL: str = "sqlite+aiosqlite:///./chemlib.db"
    # For PostgreSQL: "postgresql+asyncpg://user:pass@localhost/chemlib"

    CONFORMER_COUNT: int = 50
    CONFORMER_FORCE_FIELD: str = "MMFF94"
    MORGAN_FP_RADIUS: int = 2
    MORGAN_FP_BITS: int = 2048
    SIMILARITY_THRESHOLD: float = 0.7

settings = Settings()

Error Handling

Custom exception classes mapped to HTTP status codes:

class ChemLibError(Exception): ...
class InvalidSMILESError(ChemLibError): ...     # → 422
class CompoundNotFoundError(ChemLibError): ...   # → 404
class DuplicateCompoundError(ChemLibError): ...  # → 409
class AssemblyError(ChemLibError): ...           # → 400
class ConformerError(ChemLibError): ...          # → 500

FastAPI exception handlers translate these to proper HTTP responses with error details.
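
The mapping above can be captured in a single lookup table. The sketch below uses only the standard library so it runs on its own: `STATUS_BY_ERROR`, `to_http_error`, and the JSON body shape are assumptions made for illustration, not the project's actual handler. In the real application, a function like this would presumably be called from a handler registered with `@app.exception_handler(ChemLibError)` that returns a `JSONResponse`.

```python
class ChemLibError(Exception): ...
class InvalidSMILESError(ChemLibError): ...
class CompoundNotFoundError(ChemLibError): ...
class DuplicateCompoundError(ChemLibError): ...
class AssemblyError(ChemLibError): ...
class ConformerError(ChemLibError): ...

# Status codes as listed in the table above.
STATUS_BY_ERROR = {
    InvalidSMILESError: 422,
    CompoundNotFoundError: 404,
    DuplicateCompoundError: 409,
    AssemblyError: 400,
    ConformerError: 500,
}


def to_http_error(exc: ChemLibError) -> tuple[int, dict]:
    """Translate a ChemLib exception into (status_code, JSON body).

    Walks the class hierarchy so subclasses inherit their parent's status;
    unmapped ChemLibError instances fall back to 500 (an assumption).
    """
    status = 500
    for cls in type(exc).__mro__:
        if cls in STATUS_BY_ERROR:
            status = STATUS_BY_ERROR[cls]
            break
    return status, {"error": type(exc).__name__, "detail": str(exc)}


print(to_http_error(InvalidSMILESError("could not parse 'C('")))
# (422, {'error': 'InvalidSMILESError', 'detail': "could not parse 'C('"})
```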


Testing Strategy

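| Layer      | Approach                          | Dependencies       |
|------------|-----------------------------------|--------------------|
| Chemistry  | Unit tests with known molecules   | RDKit only         |
| DB Service | Integration tests                 | In-memory SQLite   |
| Services   | Integration tests                 | SQLite + RDKit     |
| API        | E2E tests with httpx.AsyncClient  | Full stack, SQLite |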

Use pytest-asyncio for all async tests. The test DB is a fresh in-memory SQLite database created once per test session.