Files
deardiary/todo/search.md
lotherk 0bdd71a4ed feat: v0.1.0 - geolocation capture, calendar, search, Starlight docs site
- Automatic browser geolocation capture on event creation
- Reverse geocoding via Nominatim API for place names
- Full-text search with SQLite FTS5
- Calendar view for browsing past entries
- DateNavigator component for day navigation
- SearchModal with Ctrl+K shortcut
- QuickAddWidget with Ctrl+J shortcut
- Starlight documentation site with GitHub Pages deployment
- Multiple AI provider support (Groq, OpenAI, Anthropic, Ollama, LM Studio)
- Multi-user registration support

BREAKING: Events now include latitude/longitude/placeName fields
2026-03-27 02:27:55 +00:00

316 lines
8.9 KiB
Markdown

# Full-Text Search Feature Research
## Overview
This document outlines research and implementation ideas for adding full-text search to DearDiary, an AI-powered daily journaling app.
---
## 1. Feature Description
### Core Functionality
- Search across diary content (journal titles and bodies)
- Search across raw event content
- Filter by date range
- Sort by relevance (BM25) or date
- Real-time instant search as user types
### User Stories
1. **Quick Recall**: User types "meeting with Sarah" → sees matching diary entries and events
2. **Date-based Search**: User searches "vacation" → filters to summer 2024 entries
3. **Deep Search**: User searches for specific phrase → finds exact match in event content
---
## 2. Technical Approach
### Option A: SQLite FTS5 (Recommended for v1)
**Pros:**
- Zero external dependencies
- Built into SQLite (already in use)
- BM25 ranking built-in
- Real-time indexing (update on insert)
- Lowest implementation complexity
- No additional infrastructure
**Cons:**
- No typo tolerance (unless using trigram/token helpers)
- Limited to SQLite (migration cost if switching DB)
- Single-node only (fine for self-hosted)
**Implementation:**
```sql
-- FTS5 virtual table for journals
CREATE VIRTUAL TABLE journal_fts USING fts5(
title,
content,
content_rowid='rowid',
tokenize='porter unicode61'
);
-- FTS5 virtual table for events
CREATE VIRTUAL TABLE event_fts USING fts5(
content,
type,
content_rowid='rowid',
tokenize='porter unicode61'
);
-- Triggers to keep FTS in sync
CREATE TRIGGER journal_ai AFTER INSERT ON Journal BEGIN
INSERT INTO journal_fts(rowid, title, content)
VALUES (NEW.rowid, NEW.title, NEW.content);
END;
```
**Performance:** FTS5 handles 100k+ rows easily on SQLite. For typical personal journaling (10 years = ~3650 entries, ~10k events), performance will be sub-100ms.
### Option B: External Search (Typesense/Meilisearch)
**Pros:**
- Typo tolerance (fuzzy search)
- Better ranking algorithms
- Scalable to millions of records
- REST API, language-agnostic
**Cons:**
- Additional infrastructure (Docker service)
- Sync complexity (real-time indexing)
- More complex setup for self-hosted users
- Resource overhead (CPU/RAM for search service)
**Recommendation:** Defer to v2. External search only becomes necessary when:
- User wants fuzzy/typo-tolerant search
- Dataset exceeds 500k+ records
- Multi-language support needed
---
## 3. Indexing Strategy
### Fields to Index
| Table | Field | Indexed | Reason |
|-------|-------|---------|--------|
| Journal | title | Yes | Primary search target |
| Journal | content | Yes | Full diary text |
| Journal | date | Yes | Filtering |
| Event | content | Yes | Raw event text |
| Event | type | Yes | Filter by event type |
| Event | date | Yes | Date filtering |
### What NOT to Index
- `Event.metadata` - JSON blob, search within JSON handled separately if needed
- `Event.mediaPath` - File paths, not searchable content
- `User` fields - Not needed for user-facing search
### Sync Strategy
1. **On Insert/Update**: Write to main table, then update FTS via trigger
2. **On Delete**: FTS trigger removes from index
3. **Reindex**: Manual endpoint for recovery/debugging
---
## 4. Database Schema Changes
### Prisma Schema Addition
```prisma
// Optional: Search history for "recent searches" feature
model SearchHistory {
id String @id @default(uuid())
userId String
query String
createdAt DateTime @default(now())
@@index([userId, createdAt])
}
```
Note: FTS5 tables are virtual and managed via raw SQL, not Prisma models. We'll use `prisma.$executeRaw` for FTS operations.
### Migration Steps
1. Create FTS5 virtual tables (raw SQL)
2. Create triggers for auto-sync
3. Backfill existing data
4. Add SearchHistory model (optional)
---
## 5. API Changes
### New Endpoints
```
GET /api/v1/search?q=<query>&type=diary|event|all&from=2024-01-01&to=2024-12-31&sort=relevance|date&page=1&limit=20
```
**Response:**
```typescript
interface SearchResult {
type: 'diary' | 'event';
id: string;
date: string;
title?: string; // For diaries
content: string; // Truncated/preview
highlight?: string; // Matched text with <mark> tags
score: number; // BM25 relevance
}
interface SearchResponse {
data: {
results: SearchResult[];
total: number;
page: number;
limit: number;
} | null;
error: null;
}
```
### Optional Endpoints
```
GET /api/v1/search/history // Recent searches
DELETE /api/v1/search/history // Clear history
POST /api/v1/search/reindex // Force reindex (admin)
```
---
## 6. UI/UX Considerations
### Search Modal
- **Trigger**: Cmd/Ctrl+K keyboard shortcut (standard pattern)
- **Position**: Centered modal with overlay
- **Features**:
- Instant search as you type (debounced 150ms)
- Filter tabs: All | Diaries | Events
- Date range picker (quick presets: Today, This Week, This Month, This Year)
- Results show date, type, preview with highlighted matches
### Sidebar (Alternative)
- Persistent search box in navigation
- Results in scrollable list below
- Less intrusive, always visible
### Result Cards
```
┌─────────────────────────────────────────┐
│ 📅 2024-03-15 [Diary] │
│ Meeting with Sarah about project... │
│ ─────────────────────────────────────── │
│ ...discussed timeline and <mark>budget</mark>... │
└─────────────────────────────────────────┘
```
### UX Details
- **Empty state**: Show recent diaries/events when no query
- **No results**: Friendly message + suggestions
- **Loading**: Subtle spinner (search should be <100ms)
- **Keyboard**: Arrow keys to navigate results, Enter to open
### Mobile Considerations
- Tap search icon in header → full-screen search
- Larger touch targets for filters
---
## 7. Performance Considerations
### Query Performance
- FTS5 BM25 queries: ~50-100ms for 10k records
- Add LIMIT to prevent unbounded results
- Use connection pooling if many concurrent searches
### Write Performance
- Triggers add ~5-10ms per insert/update
- Batch backfill for existing data (1000 rows/batch)
### Caching Strategy
- Cache recent searches (Redis optional, or in-memory)
- Cache FTS index in memory (SQLite mmap)
### Scaling Thresholds
- < 10k entries: No optimization needed
- 10k-100k: Consider FTS5 optimization (tokenizer, prefix search)
- > 100k: Consider external search
---
## 8. Implementation Complexity
### Complexity Assessment: **MEDIUM**
| Component | Complexity | Notes |
|-----------|------------|-------|
| FTS5 setup | Low | Raw SQL, one-time |
| Triggers | Low | Auto-sync, minimal code |
| API endpoint | Low | Standard CRUD pattern |
| Frontend modal | Medium | Keyboard shortcuts, state |
| Filters/Date | Medium | Multiple filter combinations |
| Backfill | Low | One-time script |
### Phased Implementation
**Phase 1 (MVP - 2-3 days)**
- FTS5 tables + triggers
- Basic search API
- Simple modal UI with text input
**Phase 2 (Enhancements - 1-2 days)**
- Date filtering
- Type filtering (diary/event)
- Result highlighting
**Phase 3 (Polish - 1 day)**
- Search history
- Keyboard navigation
- Mobile responsive
---
## 9. Priority Recommendation
### Recommended Priority: **MEDIUM-HIGH**
**Rationale:**
- Search is a core journaling feature (user wants to find past entries)
- Competitor apps (Day One, Journey) have robust search
- Implementation complexity is manageable (medium)
- Zero external dependencies (SQLite FTS5)
### Factors Supporting High Priority
1. **User Value**: High - helps users find meaningful memories
2. **Implementation Cost**: Medium - achievable in 1 week
3. **Dependency Risk**: Low - no external services needed
4. **Future-proofing**: FTS5 is mature, well-supported
### Factors Against Very High Priority
- Current core features (capture, generate) are stable
- Small dataset users may not notice missing search
- Can be added post-MVP without breaking changes
---
## 10. Open Questions / Further Research
1. **Typo tolerance**: Is exact match sufficient, or do users expect fuzzy search?
2. **Search ranking**: Should recent results be boosted higher?
3. **Multi-language**: Support languages other than English (tokenizer considerations)
4. **Export/Import**: Should search index be rebuilt on data import?
5. **Shared access**: Multi-user search (future consideration)
---
## 11. Summary
| Aspect | Recommendation |
|--------|----------------|
| **Search Engine** | SQLite FTS5 (built-in) |
| **UI Pattern** | Cmd/Ctrl+K modal |
| **Features** | Instant search, date filter, type filter, relevance sort |
| **Complexity** | Medium (3-5 days) |
| **Priority** | Medium-High |
| **Schema Changes** | FTS5 via raw SQL + optional SearchHistory model |
| **API Changes** | New `/search` endpoint with query params |