Architecture Overview
Diffy is built as a monorepo with two deployable applications and several shared packages.
System Architecture
┌─────────────────────────────────────────────────────────────────────────┐
│                              User Browsers                              │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                               CDN / Edge                                │
│                                (Vercel)                                 │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
              ┌──────────────────────┴──────────────────────┐
              ▼                                             ▼
┌───────────────────────────┐                 ┌───────────────────────────┐
│  Web Application          │                 │  Public API               │
│  (Next.js on Vercel)      │                 │  (Hono on Railway)        │
│                           │                 │                           │
│  - Dashboard UI           │                 │  - REST endpoints         │
│  - Internal API routes    │                 │  - External developers    │
│  - Cron triggers          │                 │  - API keys               │
│  - Slack OAuth            │                 │                           │
└─────────────┬─────────────┘                 └─────────────┬─────────────┘
              │                                             │
              └──────────────────────┬──────────────────────┘
                                     │
                                     ▼
                         ┌───────────────────────┐
                         │  PostgreSQL (Neon)    │
                         │                       │
                         │  - Domains            │
                         │  - Pages              │
                         │  - Snapshots          │
                         │  - Changes            │
                         │  - Users/Orgs         │
                         └───────────────────────┘
Applications
Web (apps/web)
The main user-facing application built with Next.js 14 (App Router).
Responsibilities:
- Dashboard UI for viewing domains, changes, and alerts
- Internal API routes (/api/v1/*) for dashboard operations
- Cron job triggers (/api/cron/*)
- OAuth flows (Clerk, Slack)
Deployment: Vercel
API (apps/api)
Public REST API for external developers, built with Hono.
Responsibilities:
- External API access
- API key authentication
- Rate limiting per API key
Deployment: Railway
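Per-key rate limiting can be pictured with a small fixed-window counter. This is only a sketch under stated assumptions: the class name `FixedWindowRateLimiter`, the in-memory `Map`, and the limit and window values are all hypothetical, and the source does not say which algorithm or storage the API actually uses.

```typescript
// Hypothetical fixed-window rate limiter keyed by API key.
// Limit, window size, and in-memory storage are illustrative assumptions.
interface RateLimitResult {
  allowed: boolean;
  remaining: number;
}

class FixedWindowRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { windowStart: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // `now` is injectable so the limiter can be exercised without a real clock.
  check(apiKey: string, now: number = Date.now()): RateLimitResult {
    let entry = this.windows.get(apiKey);
    // Start a fresh window on the first request or once the old window expired.
    if (entry === undefined || now - entry.windowStart >= this.windowMs) {
      entry = { windowStart: now, count: 0 };
      this.windows.set(apiKey, entry);
    }
    if (entry.count >= this.limit) {
      return { allowed: false, remaining: 0 };
    }
    entry.count += 1;
    return { allowed: true, remaining: this.limit - entry.count };
  }
}
```

Each API key gets its own counter, so one noisy client cannot exhaust another client's budget. A multi-instance deployment would need shared storage for the counters instead of a process-local `Map`.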
Packages
| Package | Purpose |
|---|---|
| @diffy/db | Drizzle ORM schema and database access |
| @diffy/crawler | Web crawling and diff detection logic |
| @diffy/alerts | Notification system (email, Slack) |
| @diffy/core | Shared business logic |
| @diffy/ui | Shared React components |
Data Flow
Adding a Domain
1. User submits domain → Web API
2. Insert domain (status: pending) → Database
3. Enqueue discovery job → Redis Queue
4. Worker picks up job → Crawler Worker
5. Discover pages → Playwright
6. Insert pages → Database
7. Update domain (status: active) → Database
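The seven steps above can be sketched with in-memory stand-ins. Everything named here is hypothetical: plain arrays replace the database and the Redis queue, and `discoverPages` is an injected function standing in for the Playwright-based discovery, which would be asynchronous in a real worker.

```typescript
// Sketch of the "Adding a Domain" flow. Arrays stand in for the database
// and the Redis queue; all names are hypothetical.
type DomainStatus = "pending" | "active";

interface DomainRecord {
  id: number;
  orgId: string;
  hostname: string;
  status: DomainStatus;
}

interface PageRecord {
  domainId: number;
  url: string;
}

interface DiscoveryJob {
  domainId: number;
}

const domainTable: DomainRecord[] = [];
const pageTable: PageRecord[] = [];
const discoveryQueue: DiscoveryJob[] = [];

// Steps 1-3: insert the domain as pending, then enqueue a discovery job.
function addDomain(orgId: string, hostname: string): DomainRecord {
  const domain: DomainRecord = {
    id: domainTable.length + 1,
    orgId,
    hostname,
    status: "pending",
  };
  domainTable.push(domain);
  discoveryQueue.push({ domainId: domain.id });
  return domain;
}

// Steps 4-7: drain the queue, discover pages, store them, activate the domain.
// Returns the number of jobs that were processed.
function runDiscoveryWorker(discoverPages: (hostname: string) => string[]): number {
  let processed = 0;
  for (let job = discoveryQueue.shift(); job !== undefined; job = discoveryQueue.shift()) {
    const { domainId } = job;
    const domain = domainTable.find((d) => d.id === domainId);
    if (domain === undefined) continue; // domain was removed before the job ran
    for (const url of discoverPages(domain.hostname)) {
      pageTable.push({ domainId, url });
    }
    domain.status = "active";
    processed += 1;
  }
  return processed;
}
```

The domain stays in the pending state until discovery finishes, so the dashboard can distinguish a domain that is still being crawled from one that is ready.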
Scheduled Crawling
1. Cron trigger fires → Vercel Cron
2. Mark domains needing crawl → Database
3. Worker polls for work → Crawler Worker
4. Crawl each page → Playwright
5. Compare content hash → Worker
6. If changed:
   a. Save new snapshot → Database
   b. Record change → Database
   c. Send alerts → Email/Slack
7. Update last crawl time → Database
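Steps 5 and 6 of the crawl hinge on comparing content hashes. The sketch below illustrates that comparison only. It assumes SHA-256 (the source says just "content hash"), and the `Snapshot` shape, the `CrawlOutcome` type, and the choice to treat a first crawl as a baseline rather than a change are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical snapshot shape; the real schema lives in @diffy/db.
interface Snapshot {
  pageId: number;
  contentHash: string;
  content: string;
  capturedAt: Date;
}

type CrawlOutcome =
  | { kind: "unchanged" }
  | { kind: "baseline"; snapshot: Snapshot }
  | { kind: "changed"; snapshot: Snapshot; previousHash: string };

// SHA-256 is an assumption; any stable digest would serve the same purpose.
function hashContent(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Compares freshly crawled content with the previous snapshot, if any.
function compareWithPrevious(
  pageId: number,
  previous: Snapshot | undefined,
  content: string,
  capturedAt: Date = new Date(),
): CrawlOutcome {
  const contentHash = hashContent(content);
  const snapshot: Snapshot = { pageId, contentHash, content, capturedAt };
  if (previous === undefined) {
    // First crawl: store a baseline, but there is nothing to diff against.
    return { kind: "baseline", snapshot };
  }
  if (previous.contentHash === contentHash) {
    return { kind: "unchanged" };
  }
  // Hash differs: the caller saves the snapshot, records a change, and alerts.
  return { kind: "changed", snapshot, previousHash: previous.contentHash };
}
```

Comparing digests instead of full page bodies keeps the check cheap. Only the `changed` outcome needs the more expensive snapshot write, change record, and alert fan-out.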
Security Model
Authentication
- Users: Clerk handles user auth (OAuth, email/password)
- Organizations: Multi-tenant with org-based data isolation
- API: Session cookies for the web application, API keys for the public API
Authorization
- Users can only access their organization's data
- All database queries filter by orgId
- Row-level security enforced at application layer
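Application-layer row filtering could look like the following sketch. The helper `scopedToOrg` and the row types are hypothetical, and an in-memory array replaces the Drizzle queries. The point is only that every read goes through a wrapper that applies the `orgId` filter, so callers cannot forget it.

```typescript
// Hypothetical org-scoped accessor; a plain array stands in for a table.
interface OrgScopedRow {
  orgId: string;
}

interface DomainRow extends OrgScopedRow {
  id: number;
  hostname: string;
}

// Returns read helpers that always filter by the caller's orgId.
function scopedToOrg<T extends OrgScopedRow>(rows: readonly T[], orgId: string) {
  return {
    list: (): T[] => rows.filter((row) => row.orgId === orgId),
    find: (predicate: (row: T) => boolean): T | undefined =>
      rows.find((row) => row.orgId === orgId && predicate(row)),
  };
}

const allDomains: DomainRow[] = [
  { id: 1, orgId: "org_a", hostname: "a.example.com" },
  { id: 2, orgId: "org_b", hostname: "b.example.com" },
];

// A request authenticated for org_a sees only org_a rows.
const orgADomains = scopedToOrg(allDomains, "org_a");
```

Because the filter lives in the application rather than in the database, a query that bypasses the wrapper would also bypass the isolation, so a single shared access path matters.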