# Full-Text Search Feature Research

## Overview

This document outlines research and implementation ideas for adding full-text search to DearDiary, an AI-powered daily journaling app.

---

## 1. Feature Description

### Core Functionality

- Search across diary content (journal titles and bodies)
- Search across raw event content
- Filter by date range
- Sort by relevance (BM25) or date
- Real-time instant search as the user types

### User Stories

1. **Quick Recall**: User types "meeting with Sarah" → sees matching diary entries and events
2. **Date-based Search**: User searches "vacation" → filters to summer 2024 entries
3. **Deep Search**: User searches for a specific phrase → finds the exact match in event content

---

## 2. Technical Approach

### Option A: SQLite FTS5 (Recommended for v1)

**Pros:**

- Zero external dependencies
- Built into SQLite (already in use)
- BM25 ranking built-in
- Real-time indexing (update on insert)
- Lowest implementation complexity
- No additional infrastructure

**Cons:**

- No typo tolerance (the trigram tokenizer adds substring matching, not fuzzy matching)
- Limited to SQLite (migration cost if switching DB)
- Single-node only (fine for self-hosted)

**Implementation:**

```sql
-- FTS5 virtual table for journals.
-- content_rowid is omitted: it only applies to external-content tables (content='...').
CREATE VIRTUAL TABLE journal_fts USING fts5(
  title,
  content,
  tokenize='porter unicode61'
);

-- FTS5 virtual table for events
CREATE VIRTUAL TABLE event_fts USING fts5(
  content,
  type,
  tokenize='porter unicode61'
);

-- Triggers to keep journal_fts in sync with Journal
-- (event_fts needs the equivalent triggers on Event)
CREATE TRIGGER journal_ai AFTER INSERT ON Journal BEGIN
  INSERT INTO journal_fts(rowid, title, content)
  VALUES (NEW.rowid, NEW.title, NEW.content);
END;

CREATE TRIGGER journal_ad AFTER DELETE ON Journal BEGIN
  DELETE FROM journal_fts WHERE rowid = OLD.rowid;
END;

CREATE TRIGGER journal_au AFTER UPDATE ON Journal BEGIN
  UPDATE journal_fts SET title = NEW.title, content = NEW.content
  WHERE rowid = NEW.rowid;
END;
```

**Performance:** FTS5 handles 100k+ rows comfortably on SQLite. For typical personal journaling (10 years of daily entries = ~3650 entries, ~10k events), query latency is expected to stay under 100ms.
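
Raw user input cannot be passed to `MATCH` as-is: FTS5 treats characters such as `"`, `-`, `:` and `*`, and the words `AND`, `OR` and `NOT`, as query syntax, so an ordinary search like `follow-up` can raise a syntax error. The sketch below shows one way to handle this for instant search, under the assumption that every whitespace-separated term should be matched literally and that the last term is still being typed. The helper name `toFtsQuery` is hypothetical and not part of any existing DearDiary code.

```typescript
// Hypothetical helper: converts raw user input into an FTS5 MATCH expression.
// Each term becomes a quoted FTS5 string, so punctuation and the keywords
// AND/OR/NOT are treated as plain text rather than query syntax.
function toFtsQuery(input: string, prefixLast: boolean = true): string | null {
  const terms = input
    .trim()
    .split(/\s+/)
    .filter((term) => term.length > 0);

  // Nothing to search for: the caller should skip the query entirely.
  if (terms.length === 0) {
    return null;
  }

  const parts = terms.map((term, index) => {
    // Inside an FTS5 string, a double quote is escaped by doubling it.
    const phrase = '"' + term.replace(/"/g, '""') + '"';
    // A trailing * turns the final term into a prefix query, which suits
    // search-as-you-type where the last word is usually incomplete.
    const isLast = index === terms.length - 1;
    return prefixLast && isLast ? phrase + "*" : phrase;
  });

  // Terms separated by whitespace are combined with an implicit AND.
  return parts.join(" ");
}
```

For example, `toFtsQuery("meeting with Sar")` returns `"meeting" "with" "Sar"*`. The result is meant to be bound as a parameter (for example `WHERE journal_fts MATCH ?`) rather than concatenated into the SQL text.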

### Option B: External Search (Typesense/Meilisearch)

**Pros:**

- Typo tolerance (fuzzy search)
- Better ranking algorithms
- Scalable to millions of records
- REST API, language-agnostic

**Cons:**

- Additional infrastructure (Docker service)
- Sync complexity (real-time indexing)
- More complex setup for self-hosted users
- Resource overhead (CPU/RAM for search service)

**Recommendation:** Defer to v2. External search only becomes necessary when:

- Users want fuzzy/typo-tolerant search
- The dataset exceeds roughly 500k records
- Multi-language support is needed

---

## 3. Indexing Strategy

### Fields to Index

| Table | Field | Indexed | Reason |
|-------|-------|---------|--------|
| Journal | title | Yes | Primary search target |
| Journal | content | Yes | Full diary text |
| Journal | date | Yes | Filtering |
| Event | content | Yes | Raw event text |
| Event | type | Yes | Filter by event type |
| Event | date | Yes | Date filtering |

The date columns are not part of the FTS5 tables defined in section 2; date filtering is applied against the base tables and relies on their regular indexes.

### What NOT to Index

- `Event.metadata` - JSON blob; searching inside the JSON is handled separately if needed
- `Event.mediaPath` - File paths, not searchable content
- `User` fields - Not needed for user-facing search

### Sync Strategy

1. **On Insert/Update**: Write to the main table; a trigger then updates the FTS index
2. **On Delete**: A trigger removes the row from the FTS index
3. **Reindex**: Manual endpoint for recovery/debugging

---

## 4. Database Schema Changes

### Prisma Schema Addition

```prisma
// Optional: Search history for "recent searches" feature
model SearchHistory {
  id        String   @id @default(uuid())
  userId    String
  query     String
  createdAt DateTime @default(now())

  @@index([userId, createdAt])
}
```

Note: FTS5 tables are virtual and managed via raw SQL, not Prisma models. We'll use `prisma.$executeRaw` for FTS DDL and writes, and `prisma.$queryRaw` for search queries that return rows.

### Migration Steps

1. Create FTS5 virtual tables (raw SQL)
2. Create triggers for auto-sync
3. Backfill existing data
4. Add SearchHistory model (optional)

---

## 5. API Changes

### New Endpoints

```
GET /api/v1/search?q=&type=diary|event|all&from=2024-01-01&to=2024-12-31&sort=relevance|date&page=1&limit=20
```

**Response:**

```typescript
interface SearchResult {
  type: 'diary' | 'event';
  id: string;
  date: string;
  title?: string;     // For diaries
  content: string;    // Truncated/preview
  highlight?: string; // Matched text wrapped in highlight tags
  score: number;      // BM25 relevance (FTS5 bm25(): lower values rank higher)
}

interface SearchResponse {
  data: {
    results: SearchResult[];
    total: number;
    page: number;
    limit: number;
  } | null;
  error: null;
}
```

### Optional Endpoints

```
GET    /api/v1/search/history   // Recent searches
DELETE /api/v1/search/history   // Clear history
POST   /api/v1/search/reindex   // Force reindex (admin)
```

---

## 6. UI/UX Considerations

### Search Modal

- **Trigger**: Cmd/Ctrl+K keyboard shortcut (standard pattern)
- **Position**: Centered modal with overlay
- **Features**:
  - Instant search as you type (debounced 150ms)
  - Filter tabs: All | Diaries | Events
  - Date range picker (quick presets: Today, This Week, This Month, This Year)
  - Results show date, type, preview with highlighted matches

### Sidebar (Alternative)

- Persistent search box in navigation
- Results in scrollable list below
- Less intrusive, always visible

### Result Cards

```
┌─────────────────────────────────────────┐
│ 📅 2024-03-15                  [Diary]  │
│ Meeting with Sarah about project...     │
│ ─────────────────────────────────────── │
│ ...discussed timeline and budget...     │
└─────────────────────────────────────────┘
```

### UX Details

- **Empty state**: Show recent diaries/events when no query
- **No results**: Friendly message + suggestions
- **Loading**: Subtle spinner (search should be <100ms)
- **Keyboard**: Arrow keys to navigate results, Enter to open

### Mobile Considerations

- Tap search icon in header → full-screen search
- Larger touch targets for filters

---

## 7. Performance Considerations

### Query Performance

- FTS5 BM25 queries: estimated ~50-100ms for 10k records
- Add LIMIT to prevent unbounded results
- Use connection pooling if there are many concurrent searches

### Write Performance

- Triggers add an estimated ~5-10ms per insert/update
- Batch backfill for existing data (1000 rows/batch)

### Caching Strategy

- Cache recent searches (in-memory; Redis optional)
- Let SQLite memory-map the database file (`PRAGMA mmap_size`) so frequently read FTS pages stay in memory

### Scaling Thresholds

- < 10k entries: No optimization needed
- 10k-100k: Consider FTS5 optimization (tokenizer, prefix search)
- > 100k: Consider external search

---

## 8. Implementation Complexity

### Complexity Assessment: **MEDIUM**

| Component | Complexity | Notes |
|-----------|------------|-------|
| FTS5 setup | Low | Raw SQL, one-time |
| Triggers | Low | Auto-sync, minimal code |
| API endpoint | Low | Standard read-only endpoint |
| Frontend modal | Medium | Keyboard shortcuts, state |
| Filters/Date | Medium | Multiple filter combinations |
| Backfill | Low | One-time script |

### Phased Implementation

**Phase 1 (MVP - 2-3 days)**

- FTS5 tables + triggers
- Basic search API
- Simple modal UI with text input

**Phase 2 (Enhancements - 1-2 days)**

- Date filtering
- Type filtering (diary/event)
- Result highlighting

**Phase 3 (Polish - 1 day)**

- Search history
- Keyboard navigation
- Mobile-responsive layout

---

## 9. Priority Recommendation

### Recommended Priority: **MEDIUM-HIGH**

**Rationale:**

- Search is a core journaling feature (users want to find past entries)
- Competitor apps (Day One, Journey) have robust search
- Implementation complexity is manageable (medium)
- Zero external dependencies (SQLite FTS5)

### Factors Supporting High Priority

1. **User Value**: High - helps users find meaningful memories
2. **Implementation Cost**: Medium - achievable in 1 week
3. **Dependency Risk**: Low - no external services needed
4. **Future-proofing**: FTS5 is mature and well supported

### Factors Against Very High Priority

- Current core features (capture, generate) are stable
- Users with small datasets may not notice that search is missing
- Can be added post-MVP without breaking changes

---

## 10. Open Questions / Further Research

1. **Typo tolerance**: Is exact match sufficient, or do users expect fuzzy search?
2. **Search ranking**: Should recent results be boosted?
3. **Multi-language**: Should languages other than English be supported? (tokenizer considerations)
4. **Export/Import**: Should the search index be rebuilt on data import?
5. **Shared access**: Multi-user search (future consideration)

---

## 11. Summary

| Aspect | Recommendation |
|--------|----------------|
| **Search Engine** | SQLite FTS5 (built-in) |
| **UI Pattern** | Cmd/Ctrl+K modal |
| **Features** | Instant search, date filter, type filter, relevance sort |
| **Complexity** | Medium (4-6 days, the sum of the three phases) |
| **Priority** | Medium-High |
| **Schema Changes** | FTS5 via raw SQL + optional SearchHistory model |
| **API Changes** | New `/search` endpoint with query params |
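
The query parameters of the `/search` endpoint from section 5 could be validated and normalized along the following lines before any SQL is built. This is a minimal sketch rather than a settled design: the names `RawSearchQuery`, `SearchParams` and `parseSearchParams` are hypothetical, the defaults (`page=1`, `limit=20`) are taken from the example URL, and the caps of 100 results per page and 10000 pages are assumptions.

```typescript
type SearchType = "diary" | "event" | "all";
type SortOrder = "relevance" | "date";

// Raw query-string values as they arrive from the HTTP layer (hypothetical shape).
interface RawSearchQuery {
  q?: string;
  type?: string;
  from?: string;
  to?: string;
  sort?: string;
  page?: string;
  limit?: string;
}

// Normalized parameters handed to the search layer (hypothetical shape).
interface SearchParams {
  q: string;
  type: SearchType;
  from: string | undefined;
  to: string | undefined;
  sort: SortOrder;
  page: number;
  limit: number;
  offset: number;
}

const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/; // format check only, not calendar validity
const DEFAULT_LIMIT = 20; // assumption, taken from the example URL
const MAX_LIMIT = 100;    // assumption: upper bound to keep result sets small
const MAX_PAGE = 10000;   // assumption: guards against absurd offsets

// Parses a positive integer, falling back on bad input and clamping to [1, max].
function toBoundedInt(value: string | undefined, fallback: number, max: number): number {
  const n = parseInt(value === undefined ? "" : value, 10);
  if (isNaN(n)) {
    return fallback;
  }
  return Math.min(max, Math.max(1, n));
}

// Accepts YYYY-MM-DD or nothing; anything else is rejected.
function parseDate(name: string, value: string | undefined): string | undefined {
  if (value === undefined || value === "") {
    return undefined;
  }
  if (!ISO_DATE.test(value)) {
    throw new Error(name + " must use the YYYY-MM-DD format");
  }
  return value;
}

function parseSearchParams(raw: RawSearchQuery): SearchParams {
  const q = (raw.q === undefined ? "" : raw.q).trim();
  if (q === "") {
    throw new Error("q is required");
  }

  // Unknown values fall back to the broadest, least surprising option.
  const rawType = raw.type;
  const type: SearchType = rawType === "diary" || rawType === "event" ? rawType : "all";
  const sort: SortOrder = raw.sort === "date" ? "date" : "relevance";

  const from = parseDate("from", raw.from);
  const to = parseDate("to", raw.to);
  // ISO dates compare correctly as plain strings.
  if (from !== undefined && to !== undefined && from > to) {
    throw new Error("from must not be after to");
  }

  const page = toBoundedInt(raw.page, 1, MAX_PAGE);
  const limit = toBoundedInt(raw.limit, DEFAULT_LIMIT, MAX_LIMIT);
  return { q, type, from, to, sort, page, limit, offset: (page - 1) * limit };
}
```

Clamping `limit` rather than rejecting oversized values keeps the endpoint forgiving for clients, while still enforcing the "add LIMIT to prevent unbounded results" guidance from section 7.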