Files
deardiary/todo/search.md
lotherk 0bdd71a4ed feat: v0.1.0 - geolocation capture, calendar, search, Starlight docs site
- Automatic browser geolocation capture on event creation
- Reverse geocoding via Nominatim API for place names
- Full-text search with SQLite FTS5
- Calendar view for browsing past entries
- DateNavigator component for day navigation
- SearchModal with Ctrl+K shortcut
- QuickAddWidget with Ctrl+J shortcut
- Starlight documentation site with GitHub Pages deployment
- Multiple AI provider support (Groq, OpenAI, Anthropic, Ollama, LM Studio)
- Multi-user registration support

BREAKING: Events now include latitude/longitude/placeName fields
2026-03-27 02:27:55 +00:00

8.9 KiB

Full-Text Search Feature Research

Overview

This document outlines research and implementation ideas for adding full-text search to DearDiary, an AI-powered daily journaling app.


1. Feature Description

Core Functionality

  • Search across diary content (journal titles and bodies)
  • Search across raw event content
  • Filter by date range
  • Sort by relevance (BM25) or date
  • Real-time instant search as user types

User Stories

  1. Quick Recall: User types "meeting with Sarah" → sees matching diary entries and events
  2. Date-based Search: User searches "vacation" → filters to summer 2024 entries
  3. Deep Search: User searches for specific phrase → finds exact match in event content

2. Technical Approach

Pros:

  • Zero external dependencies
  • Built into SQLite (already in use)
  • BM25 ranking built-in
  • Real-time indexing (update on insert)
  • Lowest implementation complexity
  • No additional infrastructure

Cons:

  • No typo tolerance (unless using trigram/token helpers)
  • Limited to SQLite (migration cost if switching DB)
  • Single-node only (fine for self-hosted)

Implementation:

-- FTS5 virtual table for journals
CREATE VIRTUAL TABLE journal_fts USING fts5(
  title,
  content,
  content_rowid='rowid',
  tokenize='porter unicode61'
);

-- FTS5 virtual table for events  
CREATE VIRTUAL TABLE event_fts USING fts5(
  content,
  type,
  content_rowid='rowid',
  tokenize='porter unicode61'
);

-- Triggers to keep FTS in sync
CREATE TRIGGER journal_ai AFTER INSERT ON Journal BEGIN
  INSERT INTO journal_fts(rowid, title, content) 
  VALUES (NEW.rowid, NEW.title, NEW.content);
END;

Performance: FTS5 handles 100k+ rows easily on SQLite. For typical personal journaling (10 years = ~3650 entries, ~10k events), performance will be sub-100ms.

Option B: External Search (Typesense/Meilisearch)

Pros:

  • Typo tolerance (fuzzy search)
  • Better ranking algorithms
  • Scalable to millions of records
  • REST API, language-agnostic

Cons:

  • Additional infrastructure (Docker service)
  • Sync complexity (real-time indexing)
  • More complex setup for self-hosted users
  • Resource overhead (CPU/RAM for search service)

Recommendation: Defer to v2. External search only becomes necessary when:

  • User wants fuzzy/typo-tolerant search
  • Dataset exceeds 500k+ records
  • Multi-language support needed

3. Indexing Strategy

Fields to Index

Table Field Indexed Reason
Journal title Yes Primary search target
Journal content Yes Full diary text
Journal date Yes Filtering
Event content Yes Raw event text
Event type Yes Filter by event type
Event date Yes Date filtering

What NOT to Index

  • Event.metadata - JSON blob, search within JSON handled separately if needed
  • Event.mediaPath - File paths, not searchable content
  • User fields - Not needed for user-facing search

Sync Strategy

  1. On Insert/Update: Write to main table, then update FTS via trigger
  2. On Delete: FTS trigger removes from index
  3. Reindex: Manual endpoint for recovery/debugging

4. Database Schema Changes

Prisma Schema Addition

// Optional: Search history for "recent searches" feature
model SearchHistory {
  id        String   @id @default(uuid())
  userId    String
  query     String
  createdAt DateTime @default(now())

  @@index([userId, createdAt])
}

Note: FTS5 tables are virtual and managed via raw SQL, not Prisma models. We'll use prisma.$executeRaw for FTS operations.

Migration Steps

  1. Create FTS5 virtual tables (raw SQL)
  2. Create triggers for auto-sync
  3. Backfill existing data
  4. Add SearchHistory model (optional)

5. API Changes

New Endpoints

GET /api/v1/search?q=<query>&type=diary|event|all&from=2024-01-01&to=2024-12-31&sort=relevance|date&page=1&limit=20

Response:

interface SearchResult {
  type: 'diary' | 'event';
  id: string;
  date: string;
  title?: string;        // For diaries
  content: string;      // Truncated/preview
  highlight?: string;  // Matched text with <mark> tags
  score: number;        // BM25 relevance
}

interface SearchResponse {
  data: {
    results: SearchResult[];
    total: number;
    page: number;
    limit: number;
  } | null;
  error: null;
}

Optional Endpoints

GET /api/v1/search/history    // Recent searches
DELETE /api/v1/search/history // Clear history
POST /api/v1/search/reindex   // Force reindex (admin)

6. UI/UX Considerations

Search Modal

  • Trigger: Cmd/Ctrl+K keyboard shortcut (standard pattern)
  • Position: Centered modal with overlay
  • Features:
    • Instant search as you type (debounced 150ms)
    • Filter tabs: All | Diaries | Events
    • Date range picker (quick presets: Today, This Week, This Month, This Year)
    • Results show date, type, preview with highlighted matches

Sidebar (Alternative)

  • Persistent search box in navigation
  • Results in scrollable list below
  • Less intrusive, always visible

Result Cards

┌─────────────────────────────────────────┐
│ 📅 2024-03-15                    [Diary] │
│ Meeting with Sarah about project...    │
│ ─────────────────────────────────────── │
│ ...discussed timeline and <mark>budget</mark>... │
└─────────────────────────────────────────┘

UX Details

  • Empty state: Show recent diaries/events when no query
  • No results: Friendly message + suggestions
  • Loading: Subtle spinner (search should be <100ms)
  • Keyboard: Arrow keys to navigate results, Enter to open

Mobile Considerations

  • Tap search icon in header → full-screen search
  • Larger touch targets for filters

7. Performance Considerations

Query Performance

  • FTS5 BM25 queries: ~50-100ms for 10k records
  • Add LIMIT to prevent unbounded results
  • Use connection pooling if many concurrent searches

Write Performance

  • Triggers add ~5-10ms per insert/update
  • Batch backfill for existing data (1000 rows/batch)

Caching Strategy

  • Cache recent searches (Redis optional, or in-memory)
  • Cache FTS index in memory (SQLite mmap)

Scaling Thresholds

  • < 10k entries: No optimization needed
  • 10k-100k: Consider FTS5 optimization (tokenizer, prefix search)
  • 100k: Consider external search


8. Implementation Complexity

Complexity Assessment: MEDIUM

Component Complexity Notes
FTS5 setup Low Raw SQL, one-time
Triggers Low Auto-sync, minimal code
API endpoint Low Standard CRUD pattern
Frontend modal Medium Keyboard shortcuts, state
Filters/Date Medium Multiple filter combinations
Backfill Low One-time script

Phased Implementation

Phase 1 (MVP - 2-3 days)

  • FTS5 tables + triggers
  • Basic search API
  • Simple modal UI with text input

Phase 2 (Enhancements - 1-2 days)

  • Date filtering
  • Type filtering (diary/event)
  • Result highlighting

Phase 3 (Polish - 1 day)

  • Search history
  • Keyboard navigation
  • Mobile responsive

9. Priority Recommendation

Rationale:

  • Search is a core journaling feature (user wants to find past entries)
  • Competitor apps (Day One, Journey) have robust search
  • Implementation complexity is manageable (medium)
  • Zero external dependencies (SQLite FTS5)

Factors Supporting High Priority

  1. User Value: High - helps users find meaningful memories
  2. Implementation Cost: Medium - achievable in 1 week
  3. Dependency Risk: Low - no external services needed
  4. Future-proofing: FTS5 is mature, well-supported

Factors Against Very High Priority

  • Current core features (capture, generate) are stable
  • Small dataset users may not notice missing search
  • Can be added post-MVP without breaking changes

10. Open Questions / Further Research

  1. Typo tolerance: Is exact match sufficient, or do users expect fuzzy search?
  2. Search ranking: Should recent results be boosted higher?
  3. Multi-language: Support languages other than English (tokenizer considerations)
  4. Export/Import: Should search index be rebuilt on data import?
  5. Shared access: Multi-user search (future consideration)

11. Summary

Aspect Recommendation
Search Engine SQLite FTS5 (built-in)
UI Pattern Cmd/Ctrl+K modal
Features Instant search, date filter, type filter, relevance sort
Complexity Medium (3-5 days)
Priority Medium-High
Schema Changes FTS5 via raw SQL + optional SearchHistory model
API Changes New /search endpoint with query params