System Architecture: ChemLib
Layer Architecture
ChemLib follows a strict layered architecture with unidirectional dependencies:
┌─────────────────────────────────────────────────────────────────┐
│                            UI LAYER                             │
│  HTML templates + JavaScript + 3Dmol.js                         │
│  Only communicates via HTTP to API layer                        │
└──────────────────────────┬──────────────────────────────────────┘
                           │ HTTP (REST JSON)
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                       API LAYER (FastAPI)                       │
│  Routes: parse requests, validate input (Pydantic), call        │
│  services, format responses. NO business logic here.            │
│                                                                 │
│  chemlib/api/                                                   │
│  ├── compounds.py      POST/GET/PUT/DELETE /api/compounds       │
│  ├── fragments.py      POST/GET /api/fragments                  │
│  ├── assembly.py       POST /api/assembly/*                     │
│  ├── visualization.py  GET /api/viz/*                           │
│  └── scoring.py        GET/POST /api/scoring/*                  │
└──────────────────────────┬──────────────────────────────────────┘
                           │ Python function calls
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                          SERVICE LAYER                          │
│  Business logic. Orchestrates chemistry + DB operations.        │
│  Services are stateless: all state in DB.                       │
│                                                                 │
│  chemlib/services/                                              │
│  ├── compound_service.py   Import, validate, compute properties │
│  ├── fragment_service.py   Decompose, manage fragment library   │
│  ├── assembly_service.py   Join fragments, build molecules      │
│  ├── conformer_service.py  Generate 3D, minimize energy         │
│  ├── scoring_service.py    Drug-likeness, SA, filters           │
│  └── viz_service.py        Prepare 3D viewer data               │
└──────────┬──────────────────────────────┬───────────────────────┘
           │                              │
           ▼                              ▼
┌─────────────────────┐     ┌─────────────────────────────────────┐
│   CHEMISTRY LAYER   │     │          DB SERVICE LAYER           │
│  Pure RDKit ops     │     │  SQLAlchemy CRUD operations         │
│  No DB access       │     │  Only layer that touches ORM        │
│                     │     │                                     │
│  chemlib/chemistry/ │     │  chemlib/db/service.py              │
│  ├── representations│     │  ├── CRUDBase (generic)             │
│  ├── descriptors    │     │  ├── CompoundDBService              │
│  ├── fingerprints   │     │  ├── FragmentDBService              │
│  ├── fragmentation  │     │  ├── AssemblyDBService              │
│  ├── assembly       │     │  └── ConformerDBService             │
│  ├── conformers     │     │                                     │
│  └── filters        │     │  chemlib/db/session.py              │
└─────────────────────┘     │  ├── engine + session factory       │
                            │  └── get_db() dependency            │
                            └──────────────┬──────────────────────┘
                                           │
                                           ▼
                            ┌─────────────────────────────────────┐
                            │             ORM MODELS              │
                            │  chemlib/models/                    │
                            │  ├── base.py (Base, mixins)         │
                            │  ├── compound.py                    │
                            │  ├── structure.py                   │
                            │  ├── reaction.py                    │
                            │  └── assembly.py                    │
                            └──────────────┬──────────────────────┘
                                           │
                                           ▼
                            ┌─────────────────────────────────────┐
                            │              DATABASE               │
                            │  PostgreSQL (prod) / SQLite (dev)   │
                            │  Schema managed by Alembic          │
                            └─────────────────────────────────────┘
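The unidirectional dependency rule in the diagram can be checked mechanically. The following is a minimal sketch, not part of ChemLib: the `ALLOWED` map is one assumed reading of the diagram (for example, it lets the API layer import `chemlib.db` only because routes need `get_db()`), and `layer_violations` is a hypothetical helper name. Relative imports and `from chemlib import x` forms are ignored here.

```python
import ast

# Assumed reading of the layer diagram: which chemlib subpackages each layer may import.
ALLOWED = {
    "api": {"services", "schemas", "db"},  # "db" only for the get_db() dependency
    "services": {"chemistry", "db", "schemas"},
    "chemistry": set(),                    # pure RDKit code, no chemlib imports
    "db": {"models"},
    "models": set(),
    "schemas": set(),
}


def layer_violations(layer: str, source: str) -> list[str]:
    """Return the chemlib imports in `source` that `layer` is not allowed to use."""
    allowed = ALLOWED[layer] | {layer}
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.level == 0:
            names = [node.module or ""]
        elif isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        else:
            continue
        for name in names:
            parts = name.split(".")
            # Only absolute imports of known chemlib layers are checked.
            if len(parts) >= 2 and parts[0] == "chemlib" and parts[1] in ALLOWED:
                if parts[1] not in allowed:
                    found.append(name)
    return found
```

A check like this could run in CI over each package's source files, so that a database import sneaking into `chemlib/chemistry/` fails the build rather than being caught in review.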
Module Responsibilities
chemlib/chemistry/: Pure Chemistry Utilities
These modules contain zero database access. They take RDKit Mol objects or SMILES strings as input and return computed results. This makes them independently testable.
representations.py
- `smiles_to_mol(smiles: str) -> Mol`: Parse SMILES; return a Mol or raise
- `mol_to_canonical_smiles(mol: Mol) -> str`: Canonical SMILES
- `mol_to_inchi(mol: Mol) -> str`: InChI string
- `mol_to_inchi_key(mol: Mol) -> str`: 27-character InChIKey
- `mol_to_mol_block_2d(mol: Mol) -> str`: MOL block with computed 2D coords
- `mol_to_mol_block_3d(mol: Mol) -> str`: MOL block with 3D coords (single conformer)
- `mol_block_to_mol(block: str) -> Mol`: Parse MOL block
- `sdf_to_mols(sdf_data: str) -> list[Mol]`: Parse SDF
- `mol_to_formula(mol: Mol) -> str`: Molecular formula
descriptors.py
- `compute_properties(mol: Mol) -> dict`: All standard properties (MW, LogP, TPSA, HBD, HBA, rotatable bonds, rings, QED)
- `compute_mw(mol: Mol) -> float`, `compute_logp(mol: Mol) -> float`: Individual property functions for targeted computation
fingerprints.py
- `compute_morgan_fp(mol: Mol, radius=2, nbits=2048) -> ExplicitBitVect`
- `serialize_fp(fp: ExplicitBitVect) -> bytes`: For DB storage
- `deserialize_fp(data: bytes) -> ExplicitBitVect`: From DB
- `tanimoto_similarity(fp1, fp2) -> float`
- `bulk_tanimoto(query_fp, fp_list) -> list[float]`
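To make the similarity functions concrete, here is a dependency-free sketch of what `tanimoto_similarity` and `bulk_tanimoto` compute: the number of bits set in both fingerprints divided by the number set in either. It assumes fingerprints are stored as plain Python integers used as bit sets, which is a stand-in for RDKit's `ExplicitBitVect`; the real module would more likely delegate to RDKit's `DataStructs` helpers. Returning 0.0 for two empty fingerprints is a convention chosen for this sketch.

```python
def tanimoto_similarity(fp1: int, fp2: int) -> float:
    """Tanimoto coefficient of two bit vectors stored as Python ints."""
    union = bin(fp1 | fp2).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    common = bin(fp1 & fp2).count("1")
    return common / union


def bulk_tanimoto(query_fp: int, fp_list: list[int]) -> list[float]:
    """Similarity of one query fingerprint against many, in input order."""
    return [tanimoto_similarity(query_fp, fp) for fp in fp_list]


# Two of four distinct bits are shared, so the coefficient is 2 / 4.
print(tanimoto_similarity(0b1110, 0b0111))  # 0.5
```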
fragmentation.py
- `brics_decompose(mol: Mol) -> list[str]`: BRICS fragment SMILES
- `parse_attachment_points(frag_smiles: str) -> list[int]`: Extract dummy atom labels
- `get_compatible_labels(label: int) -> list[int]`: BRICS compatibility rules
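BRICS fragment SMILES mark each attachment point with a labelled dummy atom such as `[16*]`. A string-based sketch of `parse_attachment_points` is shown below. It assumes the labels appear in exactly that `[<number>*]` form; the actual module may instead read isotope labels from the RDKit atoms.

```python
import re

# Matches isotope-labelled dummy atoms such as "[4*]" or "[16*]".
_DUMMY_ATOM = re.compile(r"\[(\d+)\*\]")


def parse_attachment_points(frag_smiles: str) -> list[int]:
    """Return the BRICS labels of all dummy atoms, in order of appearance."""
    return [int(label) for label in _DUMMY_ATOM.findall(frag_smiles)]


print(parse_attachment_points("[16*]c1ccccc1"))  # [16]
```

Duplicate labels are kept on purpose, since a fragment like `[3*]O[3*]` has two distinct attachment points of the same type.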
assembly.py
- `join_fragments(frag1_smiles: str, frag2_smiles: str) -> list[str]`: Join at compatible points, return product SMILES
- `brics_build(fragments: list[str], max_results=100) -> list[str]`: Combinatorial assembly
- `validate_molecule(smiles: str) -> bool`: Chemical sanity check
- `clean_assembled_mol(mol: Mol) -> Mol`: Remove leftover dummy atoms, sanitize
conformers.py
- `generate_conformers(mol: Mol, num_confs=50, seed=42) -> Mol`: ETKDGv3 embedding
- `minimize_conformer(mol: Mol, conf_id: int, force_field='MMFF94') -> tuple[float, bool]`: Returns (energy, converged)
- `minimize_all_conformers(mol: Mol, force_field='MMFF94') -> list[tuple[int, float, bool]]`
- `get_lowest_energy_conformer(mol: Mol) -> int`: Returns the conformer ID
- `conformer_to_mol_block(mol: Mol, conf_id: int) -> str`: Extract single conformer as MOL block
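The selection step behind `get_lowest_energy_conformer` can be illustrated on the `(conf_id, energy, converged)` tuples that `minimize_all_conformers` returns. This is a sketch under assumptions: `pick_lowest_energy` is a hypothetical helper, and preferring converged conformers over unconverged ones is a policy chosen here, not something the signatures above specify.

```python
def pick_lowest_energy(results: list[tuple[int, float, bool]]) -> int:
    """Pick a conformer ID from (conf_id, energy, converged) tuples."""
    if not results:
        raise ValueError("no conformers to choose from")
    converged = [r for r in results if r[2]]
    pool = converged or results  # fall back to all conformers if none converged
    conf_id, _energy, _converged = min(pool, key=lambda r: r[1])
    return conf_id


results = [(0, 12.5, True), (1, 9.8, False), (2, 10.1, True)]
print(pick_lowest_energy(results))  # 2: conformer 1 is lower but did not converge
```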
filters.py
- `check_lipinski(mol: Mol) -> dict`: `{passes: bool, violations: list, properties: dict}`
- `check_veber(mol: Mol) -> dict`
- `check_pains(mol: Mol) -> dict`: `{passes: bool, matched_filters: list}`
- `compute_qed(mol: Mol) -> float`
- `compute_sa_score(mol: Mol) -> float`
- `full_druglikeness_report(mol: Mol) -> dict`: All filters combined
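The result shape of `check_lipinski` can be shown without RDKit by applying the rule-of-five thresholds to precomputed properties. In this sketch, `lipinski_from_properties` is a hypothetical helper that takes a properties dict rather than a Mol, and tolerating one violation (`max_violations=1`) is an assumption about ChemLib's policy. The property values in the usage lines are illustrative.

```python
def lipinski_from_properties(props: dict, max_violations: int = 1) -> dict:
    """Apply Lipinski's rule-of-five thresholds to precomputed properties."""
    rules = [
        ("MW > 500", props["mw"] > 500),
        ("LogP > 5", props["logp"] > 5),
        ("HBD > 5", props["hbd"] > 5),
        ("HBA > 10", props["hba"] > 10),
    ]
    violations = [name for name, broken in rules if broken]
    return {
        "passes": len(violations) <= max_violations,
        "violations": violations,
        "properties": props,
    }


report = lipinski_from_properties({"mw": 650.0, "logp": 6.2, "hbd": 2, "hba": 8})
print(report["passes"], report["violations"])  # False ['MW > 500', 'LogP > 5']
```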
chemlib/services/: Business Logic
Services coordinate chemistry operations and DB persistence.
compound_service.py
class CompoundService:
    async def import_from_smiles(self, db, smiles, name=None) -> CompoundResponse:
        # 1. Parse SMILES (chemistry.representations)
        # 2. Check for duplicates (db_service.get_by_smiles)
        # 3. Compute properties (chemistry.descriptors)
        # 4. Compute fingerprint (chemistry.fingerprints)
        # 5. Generate 2D coords (chemistry.representations)
        # 6. Persist (db_service.create)
        ...

    async def import_from_sdf(self, db, sdf_data) -> list[CompoundResponse]:
        # Process each molecule in SDF
        ...

    async def search_similar(self, db, smiles, threshold=0.7) -> list[CompoundResponse]:
        # 1. Parse query SMILES, compute FP
        # 2. Get all FPs from DB
        # 3. Compute Tanimoto similarities
        # 4. Return matches above threshold
        ...
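The control flow of `import_from_smiles` can be sketched with in-memory stand-ins. Everything here is illustrative: `InMemoryCompoundDB` is a hypothetical substitute for `CompoundDBService`, `str.strip` is only a placeholder for real RDKit canonicalization, and raising `DuplicateCompoundError` on a duplicate is an assumption (the service could equally return the existing record).

```python
import asyncio


class DuplicateCompoundError(Exception):
    """Raised when the canonical SMILES is already stored."""


class InMemoryCompoundDB:
    """Hypothetical stand-in for CompoundDBService; keeps rows in a dict."""

    def __init__(self):
        self.rows = {}

    async def get_by_smiles(self, smiles):
        return self.rows.get(smiles)

    async def create(self, record):
        stored = {"id": len(self.rows) + 1, **record}
        self.rows[record["canonical_smiles"]] = stored
        return stored


async def import_from_smiles(db, smiles, name=None, canonicalize=str.strip):
    canonical = canonicalize(smiles)        # step 1 (placeholder for RDKit parsing)
    if await db.get_by_smiles(canonical):   # step 2: duplicate check
        raise DuplicateCompoundError(canonical)
    # Steps 3-5 (properties, fingerprint, 2D coords) are omitted in this sketch.
    return await db.create({"canonical_smiles": canonical, "name": name})  # step 6


db = InMemoryCompoundDB()
print(asyncio.run(import_from_smiles(db, " CCO ", name="ethanol")))
```

Because the service function holds no state of its own, a second call with the same SMILES fails purely on what the DB stand-in returns, which is the property the stateless-service rule is meant to guarantee.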
fragment_service.py
class FragmentService:
    async def decompose_compound(self, db, compound_id) -> list[FragmentResponse]:
        # 1. Get compound from DB
        # 2. BRICS decompose (chemistry.fragmentation)
        # 3. For each fragment: compute properties, store
        ...

    async def get_compatible(self, db, fragment_id) -> list[FragmentResponse]:
        # 1. Get fragment, read attachment points
        # 2. Compute compatible labels
        # 3. Query DB for fragments with matching labels
        ...
assembly_service.py
class AssemblyService:
    async def start_assembly(self, db, fragment_id) -> AssemblyResponse:
        # 1. Create AssembledMolecule from initial fragment
        # 2. Record first AssemblyStep
        ...

    async def add_fragment(self, db, assembly_id, fragment_id, attachment_info) -> AssemblyResponse:
        # 1. Get current molecule state
        # 2. Join fragment (chemistry.assembly)
        # 3. Validate result
        # 4. Update molecule, record AssemblyStep
        ...

    async def finalize(self, db, assembly_id) -> AssemblyResponse:
        # 1. Compute all properties on final molecule
        # 2. Run all scoring
        # 3. Generate conformers
        # 4. Update DB record
        ...
conformer_service.py
class ConformerService:
    async def generate_and_minimize(
        self, db, parent_type, parent_id, num_confs=50
    ) -> list[ConformerResponse]:
        # 1. Get molecule SMILES from DB
        # 2. Generate conformers (chemistry.conformers)
        # 3. Minimize all (chemistry.conformers)
        # 4. Store each conformer in DB
        # 5. Mark lowest energy
        ...

    async def get_viewer_data(self, db, parent_type, parent_id, conf_id=None) -> str:
        # Return MOL block for 3Dmol.js rendering
        ...
chemlib/api/: FastAPI Routes
Routes are thin wrappers. They:
1. Parse and validate request data (Pydantic schemas, validated automatically)
2. Get a DB session (dependency injection)
3. Call the appropriate service method
4. Return the response
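# Example: chemlib/api/compounds.py
from fastapi import APIRouter, Depends, HTTPException
from sqlalchemy.ext.asyncio import AsyncSession

from chemlib.db.session import get_db
from chemlib.schemas.compound import CompoundCreate, CompoundResponse
from chemlib.services.compound_service import CompoundService

router = APIRouter(prefix="/api/compounds", tags=["compounds"])


@router.post("/", response_model=CompoundResponse, status_code=201)
async def create_compound(
    data: CompoundCreate,
    db: AsyncSession = Depends(get_db),
    service: CompoundService = Depends(),
):
    # Validation already happened via CompoundCreate; delegate to the service.
    return await service.import_from_smiles(db, data.smiles, data.name)


@router.get("/{compound_id}", response_model=CompoundResponse)
async def get_compound(
    compound_id: int,
    db: AsyncSession = Depends(get_db),
    service: CompoundService = Depends(),
):
    compound = await service.get(db, compound_id)
    if not compound:
        raise HTTPException(status_code=404, detail="Compound not found")
    return compound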
chemlib/schemas/: Pydantic Models
Separate from ORM models. Used for request validation and response serialization.
# chemlib/schemas/compound.py
from datetime import datetime

from pydantic import BaseModel, ConfigDict


class CompoundCreate(BaseModel):
    smiles: str
    name: str | None = None


class CompoundResponse(BaseModel):
    # Allows building the response directly from an ORM object.
    model_config = ConfigDict(from_attributes=True)

    id: int
    name: str | None
    canonical_smiles: str
    inchi_key: str | None
    formula: str | None
    mw: float | None
    logp: float | None
    tpsa: float | None
    hbd: int | None
    hba: int | None
    qed_score: float | None
    sa_score: float | None
    lipinski_pass: bool | None
    created_at: datetime


class CompoundFilter(BaseModel):
    mw_min: float | None = None
    mw_max: float | None = None
    logp_min: float | None = None
    logp_max: float | None = None
    lipinski_pass: bool | None = None
    limit: int = 100
    offset: int = 0
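The semantics of `CompoundFilter` can be sketched on plain dictionaries. To stay dependency-free, this example mirrors the schema as a dataclass rather than a Pydantic model, and `apply_filter` is a hypothetical helper; in ChemLib the same conditions would presumably become SQLAlchemy `WHERE` clauses. Inclusive bounds, and excluding rows whose value is unknown once a bound is set, are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompoundFilter:
    """Dataclass mirror of the Pydantic schema, used only for this sketch."""
    mw_min: Optional[float] = None
    mw_max: Optional[float] = None
    logp_min: Optional[float] = None
    logp_max: Optional[float] = None
    lipinski_pass: Optional[bool] = None
    limit: int = 100
    offset: int = 0


def _in_range(value, low, high) -> bool:
    if value is None:
        # Unknown values only match when no bound was requested.
        return low is None and high is None
    return (low is None or value >= low) and (high is None or value <= high)


def apply_filter(rows: list[dict], f: CompoundFilter) -> list[dict]:
    """Filter rows by the set bounds, then apply offset/limit pagination."""
    matched = [
        r for r in rows
        if _in_range(r.get("mw"), f.mw_min, f.mw_max)
        and _in_range(r.get("logp"), f.logp_min, f.logp_max)
        and (f.lipinski_pass is None or r.get("lipinski_pass") == f.lipinski_pass)
    ]
    return matched[f.offset : f.offset + f.limit]
```

Leaving every field at its default matches all rows, so an empty filter behaves as plain pagination with `limit=100`.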
Configuration
# chemlib/config.py
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Pydantic v2 style configuration; values can be overridden from .env
    model_config = SettingsConfigDict(env_file=".env")

    DATABASE_URL: str = "sqlite+aiosqlite:///./chemlib.db"
    # For PostgreSQL: "postgresql+asyncpg://user:pass@localhost/chemlib"
    CONFORMER_COUNT: int = 50
    CONFORMER_FORCE_FIELD: str = "MMFF94"
    MORGAN_FP_RADIUS: int = 2
    MORGAN_FP_BITS: int = 2048
    SIMILARITY_THRESHOLD: float = 0.7


settings = Settings()
Error Handling
Custom exception classes mapped to HTTP status codes:
class ChemLibError(Exception): ...
class InvalidSMILESError(ChemLibError): ... # → 422
class CompoundNotFoundError(ChemLibError): ... # → 404
class DuplicateCompoundError(ChemLibError): ... # → 409
class AssemblyError(ChemLibError): ... # → 400
class ConformerError(ChemLibError): ... # → 500
FastAPI exception handlers translate these to proper HTTP responses with error details.
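The translation step can be sketched without the framework as a lookup from exception class to status code. This is an illustration under assumptions: `to_http_error` is a hypothetical helper, the `{"error", "detail"}` body shape is invented for the example, and falling back to 500 for unmapped `ChemLibError` subclasses is a choice made here. In a FastAPI app, a function like this would be called from a handler registered with `@app.exception_handler(ChemLibError)`.

```python
class ChemLibError(Exception): ...
class InvalidSMILESError(ChemLibError): ...
class CompoundNotFoundError(ChemLibError): ...
class DuplicateCompoundError(ChemLibError): ...
class AssemblyError(ChemLibError): ...
class ConformerError(ChemLibError): ...


# Status codes as listed in the table of exception classes above.
STATUS_CODES = {
    InvalidSMILESError: 422,
    CompoundNotFoundError: 404,
    DuplicateCompoundError: 409,
    AssemblyError: 400,
    ConformerError: 500,
}


def to_http_error(exc: ChemLibError) -> tuple[int, dict]:
    """Map an exception to (status_code, JSON body), honoring subclasses."""
    for cls in type(exc).__mro__:
        if cls in STATUS_CODES:
            status = STATUS_CODES[cls]
            break
    else:
        status = 500  # assumed fallback for unmapped ChemLibError subclasses
    return status, {"error": type(exc).__name__, "detail": str(exc)}


print(to_http_error(CompoundNotFoundError("compound 7 does not exist")))
```

Walking the MRO means a more specific subclass added later inherits its parent's status code without a new table entry.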
Testing Strategy
| Layer | Approach | Dependencies |
|---|---|---|
| Chemistry | Unit tests with known molecules | RDKit only |
| DB Service | Integration tests | In-memory SQLite |
| Services | Integration tests | SQLite + RDKit |
| API | E2E tests with httpx.AsyncClient | Full stack, SQLite |
Use pytest-asyncio for all async tests. Test DB uses a fresh in-memory SQLite per test session.

