Implementation Plan — ChemLib

Overview

The project is divided into six phases. Each phase builds on the phases it depends on and produces a working, testable increment.


Phase 1: Foundation (Database + Models + Config)

Goal: Set up the project skeleton, database, ORM models, and Alembic migrations.

Tasks

  1. Project scaffolding
    • Create directory structure as defined in CLAUDE.md
    • Initialize pyproject.toml with dependencies
    • Create requirements.txt
    • Initialize git repository

  2. Configuration
    • chemlib/config.py — Pydantic settings with DATABASE_URL, constants
    • .env.example with default SQLite configuration

  3. SQLAlchemy models
    • chemlib/models/base.py — Base, TimestampMixin
    • chemlib/models/compound.py — Compound, Fragment, CompoundFragment
    • chemlib/models/assembly.py — AssembledMolecule, AssemblyStep
    • chemlib/models/structure.py — Conformer
    • chemlib/models/reaction.py — ReactionTemplate
    • All relationships and indexes as specified in DATABASE_DESIGN.md

  4. Alembic setup
    • Run alembic init alembic
    • Configure alembic/env.py for async SQLAlchemy
    • Set render_as_batch=True for SQLite compatibility
    • Generate and apply initial migration

  5. Database session management
    • chemlib/db/session.py — async engine, session factory, get_db() dependency

  6. DB service layer
    • chemlib/db/service.py — CRUDBase generic + specialized services
    • CompoundDBService, FragmentDBService, AssemblyDBService, ConformerDBService

  7. Tests
    • tests/conftest.py — async test fixtures, in-memory SQLite
    • tests/test_models/test_compound.py — model creation, relationships
    • tests/test_db/test_service.py — CRUD operations
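Here is a sketch of the Configuration task. The plan calls for Pydantic settings in chemlib/config.py. To keep this example dependency-free, it uses a frozen stdlib dataclass that reads the same DATABASE_URL variable instead. Several details are assumptions and do not come from the design documents: the default URL (async SQLite through the aiosqlite driver), the FINGERPRINT_BITS variable and the constant values.

```python
from __future__ import annotations

import os
from dataclasses import dataclass

# Assumed default: file-backed SQLite through the async aiosqlite driver.
DEFAULT_DATABASE_URL = "sqlite+aiosqlite:///./chemlib.db"


@dataclass(frozen=True)
class Settings:
    """Stdlib stand-in for the Pydantic settings class in chemlib/config.py."""

    database_url: str = DEFAULT_DATABASE_URL
    # Hypothetical constants. ECFP4 corresponds to a Morgan radius of 2.
    fingerprint_radius: int = 2
    fingerprint_bits: int = 2048

    @classmethod
    def from_env(cls, env: dict[str, str] | None = None) -> Settings:
        """Read overrides from `env` (defaults to os.environ), else use defaults."""
        env = dict(os.environ) if env is None else env
        return cls(
            database_url=env.get("DATABASE_URL", DEFAULT_DATABASE_URL),
            fingerprint_bits=int(env.get("FINGERPRINT_BITS", cls.fingerprint_bits)),
        )
```

With pydantic-settings, the same shape would be a BaseSettings subclass whose model_config points at .env. Under this sketch's default, .env.example would then hold a line such as DATABASE_URL=sqlite+aiosqlite:///./chemlib.db.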

Deliverable

  • Running database with all tables created via Alembic
  • CRUD operations verified by tests
  • No API or UI yet

Phase 2: Chemistry Engine

Goal: Implement all RDKit-based chemistry utilities as standalone, testable modules.

Tasks

  1. Representations (chemlib/chemistry/representations.py)
    • SMILES ↔ Mol ↔ canonical SMILES
    • InChI / InChIKey generation
    • MOL block generation (2D and 3D)
    • SDF parsing
    • Molecular formula

  2. Descriptors (chemlib/chemistry/descriptors.py)
    • compute_properties() — MW, LogP, TPSA, HBD, HBA, rotatable bonds, rings, QED
    • Individual property functions

  3. Fingerprints (chemlib/chemistry/fingerprints.py)
    • Morgan fingerprint generation (ECFP4, i.e. radius 2)
    • Serialization/deserialization for DB storage
    • Tanimoto similarity (single and bulk)

  4. Fragmentation (chemlib/chemistry/fragmentation.py)
    • BRICS decomposition
    • Attachment point parsing
    • Compatibility rules (BRICS_COMPATIBLE dict)

  5. Assembly (chemlib/chemistry/assembly.py)
    • Fragment joining via BRICS rules
    • Molecule validation
    • Dummy atom cleanup
    • Combinatorial BRICS build

  6. Conformers (chemlib/chemistry/conformers.py)
    • Conformer generation (ETKDGv3)
    • MMFF94 / UFF minimization
    • Lowest-energy conformer selection
    • MOL block extraction per conformer

  7. Filters (chemlib/chemistry/filters.py)
    • Lipinski Rule of Five
    • Veber rules
    • PAINS filter
    • QED score
    • SA Score (with vendor/sascorer.py setup)
    • Full drug-likeness report

  8. Tests
    • tests/test_chemistry/test_representations.py — SMILES ↔ Mol round trips for known molecules
    • tests/test_chemistry/test_descriptors.py — property values checked against known molecules
    • tests/test_chemistry/test_fragmentation.py — BRICS output for known molecules
    • tests/test_chemistry/test_assembly.py — joining known fragment pairs
    • tests/test_chemistry/test_conformers.py — conformer generation, minimization
    • tests/test_chemistry/test_filters.py — Lipinski pass/fail for known molecules
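The Fingerprints task covers Tanimoto similarity and compact storage. Both are easy to get subtly wrong, so here is a minimal, dependency-free sketch. It assumes a fingerprint is held as a Python int used as a bit mask and is stored as fixed-length big-endian bytes. In the real module, RDKit's DataStructs.TanimotoSimilarity and DataStructs.BulkTanimotoSimilarity would normally do this work on bit-vector objects. Generating the ECFP4 bits in the first place needs RDKit and is not shown. The function names and N_BITS are hypothetical.

```python
from __future__ import annotations

N_BITS = 2048  # assumed fingerprint length; must be a multiple of 8


def serialize_fp(fp: int, n_bits: int = N_BITS) -> bytes:
    """Pack a bit-mask fingerprint into fixed-length big-endian bytes (BLOB column)."""
    return fp.to_bytes(n_bits // 8, "big")


def deserialize_fp(blob: bytes) -> int:
    """Inverse of serialize_fp()."""
    return int.from_bytes(blob, "big")


def tanimoto(a: int, b: int) -> float:
    """Shared on-bits divided by total distinct on-bits: |A & B| / |A | B|."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # this sketch treats two empty fingerprints as dissimilar
    return bin(a & b).count("1") / union


def bulk_tanimoto(query: int, fps: list[int]) -> list[float]:
    """Similarity of one query fingerprint against many stored ones."""
    return [tanimoto(query, fp) for fp in fps]
```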
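The Fragmentation task's attachment-point parsing and compatibility check can be sketched without RDKit. BRICS fragments mark each cut with an isotope-labelled dummy atom such as [3*] or [16*] in their SMILES, so a regular expression is enough to read the labels. The BRICS_COMPATIBLE contents below are an illustrative subset only. The authoritative pairings are RDKit's BRICS environment rules (rdkit.Chem.BRICS.reactionDefs), and the real dict should be derived from those. The helper names are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Illustrative subset of BRICS label pairings; derive the full table from
# rdkit.Chem.BRICS.reactionDefs rather than trusting this list.
_PAIRS = [(1, 3), (1, 5), (1, 10), (3, 4), (3, 16), (4, 5), (6, 16), (15, 16), (16, 16)]


def _build_compatibility(pairs: list[tuple[int, int]]) -> dict[int, set[int]]:
    """Make pairings symmetric: a [3*] site joins a [4*] site and vice versa."""
    table: dict[int, set[int]] = {}
    for a, b in pairs:
        table.setdefault(a, set()).add(b)
        table.setdefault(b, set()).add(a)
    return table


BRICS_COMPATIBLE = _build_compatibility(_PAIRS)

# BRICS cut sites appear in fragment SMILES as isotope-labelled dummy atoms.
_DUMMY = re.compile(r"\[(\d+)\*\]")


def attachment_points(smiles: str) -> Counter:
    """Count attachment-point labels, e.g. '[3*]O[3*]' gives Counter({3: 2})."""
    return Counter(int(label) for label in _DUMMY.findall(smiles))


def can_join(frag_a: str, frag_b: str) -> bool:
    """True if some open site on frag_a is BRICS-compatible with one on frag_b."""
    labels_b = set(attachment_points(frag_b))
    return any(
        BRICS_COMPATIBLE.get(label, set()) & labels_b
        for label in attachment_points(frag_a)
    )
```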
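The Conformers task includes picking the lowest-energy conformer. RDKit's AllChem.MMFFOptimizeMoleculeConfs returns one (not_converged, energy) tuple per conformer, where not_converged is 0 when minimization converged. The pure-Python helpers below are hypothetical. They show one way to rank that output and report energies relative to the minimum, which is also the shape a conformer list with energies needs. Skipping unconverged conformers by default is a design assumption.

```python
from __future__ import annotations


def rank_conformers(results: list[tuple[int, float]]) -> list[dict]:
    """Rank conformers by energy and add energies relative to the minimum.

    `results` is assumed to be the list returned by RDKit's
    MMFFOptimizeMoleculeConfs: entry i is (not_converged, energy) for the
    i-th conformer, and not_converged == 0 means minimization converged.
    """
    rows = [
        {"index": i, "energy": energy, "converged": flag == 0}
        for i, (flag, energy) in enumerate(results)
    ]
    rows.sort(key=lambda row: row["energy"])
    if rows:
        e_min = rows[0]["energy"]
        for row in rows:
            row["relative_energy"] = row["energy"] - e_min
    return rows


def lowest_energy_conformer(
    results: list[tuple[int, float]], require_converged: bool = True
) -> int:
    """Index of the lowest-energy conformer, optionally ignoring unconverged ones."""
    candidates = [
        row for row in rank_conformers(results)
        if row["converged"] or not require_converged
    ]
    if not candidates:
        raise ValueError("no (converged) conformers to choose from")
    return candidates[0]["index"]
```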
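The Filters task's Lipinski and Veber checks reduce to threshold comparisons on descriptors that compute_properties() already produces. The sketch below uses the standard cut-offs:

- Lipinski: MW ≤ 500, LogP ≤ 5, HBD ≤ 5, HBA ≤ 10
- Veber: rotatable bonds ≤ 10, TPSA ≤ 140 Å²

Two details are assumptions about chemlib rather than part of the plan. First, the property-dict keys are hypothetical and would need to match whatever compute_properties() returns. Second, passing with one Lipinski violation follows a common convention.

```python
from __future__ import annotations

# Standard cut-offs; each value is an inclusive upper bound.
LIPINSKI_LIMITS = {"mw": 500.0, "logp": 5.0, "hbd": 5, "hba": 10}
VEBER_LIMITS = {"rotatable_bonds": 10, "tpsa": 140.0}


def lipinski_report(props: dict, max_violations: int = 1) -> dict:
    """List Rule-of-Five violations; pass if there are at most `max_violations`."""
    violations = [key for key, limit in LIPINSKI_LIMITS.items() if props[key] > limit]
    return {"passes": len(violations) <= max_violations, "violations": violations}


def veber_report(props: dict) -> dict:
    """Veber rules: both limits must hold for oral bioavailability to be predicted."""
    violations = [key for key, limit in VEBER_LIMITS.items() if props[key] > limit]
    return {"passes": not violations, "violations": violations}
```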

Deliverable

  • Complete chemistry utility library
  • All chemistry functions tested independently of DB
  • Can run: python -c "from chemlib.chemistry import ..."

Phase 3: Services + API (Compounds & Fragments)

Goal: Build the service layer and API endpoints for compound and fragment management.

Tasks

  1. FastAPI app setup
    • chemlib/main.py — app factory, router registration, exception handlers
    • chemlib/api/deps.py — shared dependencies (get_db, get_service)

  2. Pydantic schemas
    • chemlib/schemas/compound.py — CompoundCreate, CompoundResponse, CompoundFilter
    • chemlib/schemas/fragment.py — FragmentResponse, DecompositionResponse

  3. Compound service (chemlib/services/compound_service.py)
    • import_from_smiles() — parse, validate, compute properties, store
    • import_from_sdf() — batch import
    • search_similar() — fingerprint similarity search
    • search_substructure() — SMARTS substructure search

  4. Fragment service (chemlib/services/fragment_service.py)
    • decompose_compound() — BRICS decompose, store fragments
    • get_compatible() — find fragments with compatible attachment points

  5. Compound API routes (chemlib/api/compounds.py)
    • All endpoints from the compounds section of API_DESIGN.md

  6. Fragment API routes (chemlib/api/fragments.py)
    • All endpoints from the fragments section of API_DESIGN.md

  7. Error handling
    • Custom exception classes
    • FastAPI exception handlers mapping to HTTP status codes

  8. Tests
    • tests/test_api/test_compounds.py — endpoint tests with httpx.AsyncClient
    • tests/test_api/test_fragments.py — decomposition and compatibility tests
    • tests/test_services/test_compound_service.py — service integration tests
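The Error handling task pairs custom exceptions with FastAPI handlers. One low-coupling approach gives each exception its own HTTP status and error code, so a single handler can translate any of them. The sketch below shows that approach; the class names, codes and status choices are hypothetical. In chemlib/main.py the translation would be registered with FastAPI's app.exception_handler(ChemLibError) decorator and returned as a JSONResponse. That wiring is omitted so the sketch runs without FastAPI installed.

```python
from __future__ import annotations


class ChemLibError(Exception):
    """Base class for domain errors; subclasses set an HTTP status and code."""

    status_code = 500
    code = "internal_error"

    def __init__(self, detail: str) -> None:
        super().__init__(detail)
        self.detail = detail


class NotFoundError(ChemLibError):
    status_code = 404
    code = "not_found"


class InvalidSmilesError(ChemLibError):
    status_code = 422
    code = "invalid_smiles"


class IncompatibleFragmentsError(ChemLibError):
    status_code = 409
    code = "incompatible_fragments"


def to_http_error(exc: ChemLibError) -> tuple[int, dict]:
    """Translate a domain error into (status, JSON body) for the API layer."""
    return exc.status_code, {"error": {"code": exc.code, "detail": exc.detail}}
```

Because services raise only domain exceptions, they stay independent of FastAPI, and the status mapping lives in one place.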

Deliverable

  • Working API: can create compounds, decompose into fragments, search
  • http://localhost:8000/docs shows all endpoints
  • All endpoints tested

Phase 4: Assembly System

Goal: Implement the molecule assembly pipeline — the core innovation of the system.

Tasks

  1. Assembly Pydantic schemas
    • chemlib/schemas/assembly.py — AssemblyCreate, AddFragmentRequest, AssemblyResponse, FinalizeRequest

  2. Assembly service (chemlib/services/assembly_service.py)
    • start_assembly() — create from initial fragment
    • add_fragment() — join fragment, validate, record step
    • get_available_attachment_points() — list the open attachment points on the current molecule
    • finalize() — clean molecule, compute properties, score

  3. Assembly API routes (chemlib/api/assembly.py)
    • All assembly endpoints from API_DESIGN.md

  4. Scoring service (chemlib/services/scoring_service.py)
    • score_molecule() — full drug-likeness report
    • evaluate_smiles() — score without storing

  5. Scoring API routes (chemlib/api/scoring.py)
    • Scoring endpoints from API_DESIGN.md

  6. Tests
    • tests/test_api/test_assembly.py — full assembly workflow tests
    • tests/test_services/test_assembly_service.py — service tests
    • tests/test_api/test_scoring.py — scoring endpoint tests
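The Assembly service's core bookkeeping can be sketched independently of RDKit: it tracks which attachment points are still open and records each step. In this hypothetical model, each attachment point is just its BRICS label. A join consumes one open point on the current molecule and one on the incoming fragment, and the fragment's remaining points become open on the product. The class, the field names and the small COMPATIBLE set are illustrative assumptions. The real service would also perform the RDKit join and validate the molecule at each step.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Illustrative subset of compatible BRICS label pairs (unordered).
COMPATIBLE = {frozenset(pair) for pair in [(3, 4), (4, 5), (6, 16), (16, 16)]}


@dataclass
class Assembly:
    open_points: list[int]
    steps: list[dict] = field(default_factory=list)

    @classmethod
    def start(cls, fragment_points: list[int]) -> Assembly:
        """Create an assembly from the initial fragment's attachment labels."""
        return cls(open_points=list(fragment_points))

    def available_attachment_points(self) -> list[int]:
        return list(self.open_points)

    def add_fragment(self, fragment_points: list[int], at: int, via: int) -> None:
        """Join the fragment's `via` point to the molecule's open `at` point."""
        if at not in self.open_points:
            raise ValueError(f"no open attachment point [{at}*] on current molecule")
        if via not in fragment_points:
            raise ValueError(f"fragment has no attachment point [{via}*]")
        if frozenset((at, via)) not in COMPATIBLE:
            raise ValueError(f"[{at}*] and [{via}*] are not BRICS-compatible")
        remaining = list(fragment_points)
        remaining.remove(via)  # this point is consumed by the new bond
        self.open_points.remove(at)
        self.open_points.extend(remaining)
        self.steps.append({"step": len(self.steps) + 1, "at": at, "via": via})
```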

Deliverable

  • End-to-end assembly workflow via API
  • Can build a molecule from fragments, score it, get drug-likeness report
  • All assembly and scoring endpoints tested

Phase 5: 3D Visualization & Energy Minimization

Goal: Add conformer generation, energy minimization, and the 3D viewer.

Tasks

  1. Conformer service (chemlib/services/conformer_service.py)
    • generate_and_minimize() — full pipeline: embed → minimize → store
    • get_viewer_data() — MOL block for 3Dmol.js
    • get_conformer_list() — all conformers with energies

  2. Visualization service (chemlib/services/viz_service.py)
    • get_2d_svg() — SVG depiction
    • get_3d_mol_block() — 3D coordinates for viewer

  3. Visualization API routes (chemlib/api/visualization.py)
    • All viz endpoints from API_DESIGN.md

  4. 3Dmol.js integration
    • chemlib/static/js/viewer.js — MolViewer class
    • chemlib/static/js/conformer_browser.js — ConformerBrowser class
    • Viewer template page with controls

  5. Tests
    • tests/test_services/test_conformer_service.py — generation and minimization
    • tests/test_api/test_visualization.py — endpoint tests

Deliverable

  • 3D viewer working in browser
  • Conformer generation and energy minimization via API
  • Can rotate, zoom, and browse conformers

Phase 6: UI & Integration

Goal: Build the web UI and wire everything together.

Tasks

  1. Base template (chemlib/templates/base.html)
    • Navigation bar, footer, CDN imports (Bootstrap, 3Dmol.js)
    • Common CSS and JS

  2. Dashboard (chemlib/templates/index.html)
    • Summary stats, quick actions

  3. Compound browser (chemlib/templates/compound_browser.html)
    • Filterable table, 2D depictions, pagination

  4. Compound detail (chemlib/templates/compound_detail.html)
    • Properties card, scorecard, fragment list, 3D viewer link

  5. Fragment browser (chemlib/templates/fragment_browser.html)
    • Grid view with attachment point badges

  6. Assembly workspace (chemlib/templates/assembly_workspace.html)
    • Split-pane layout: current molecule + available fragments
    • Step-by-step assembly with live preview
    • Finalize button

  7. 3D viewer page (chemlib/templates/viewer_3d.html)
    • Full-page 3Dmol.js canvas with controls sidebar

  8. Scoring report (chemlib/templates/scoring_report.html)
    • Visual scorecard with gauges and indicators

  9. UI page routes (in chemlib/main.py)
    • GET routes that serve templates

  10. Seed script (scripts/seed_fragments.py)
    • Populate DB with a starter set of common fragments
    • Include 20-30 diverse, drug-like building blocks

  11. Integration testing
    • Full workflow: import compound → decompose → assemble → score → view 3D

Deliverable

  • Complete working web application
  • All pages functional and connected
  • Starter fragment library seeded
  • Full user workflow testable end-to-end

Phase Summary

Phase  Name                        Key Output                        Depends On
1      Foundation                  DB + Models + CRUD                none
2      Chemistry Engine            RDKit utilities                   none
3      Services + API (Compounds)  REST API for compounds/fragments  1, 2
4      Assembly System             Fragment joining + scoring        3
5      3D Visualization            Viewer + conformers               3, 4
6      UI & Integration            Web interface                     3, 4, 5

Note: Phases 1 and 2 can be developed in parallel as they have no mutual dependencies.


Definition of Done (per phase)

  • All code written and follows project structure
  • All tests pass
  • Alembic migrations apply cleanly
  • No linting errors
  • Documentation updated if interfaces changed
  • Verified against design documents