- Automatic browser geolocation capture on event creation - Reverse geocoding via Nominatim API for place names - Full-text search with SQLite FTS5 - Calendar view for browsing past entries - DateNavigator component for day navigation - SearchModal with Ctrl+K shortcut - QuickAddWidget with Ctrl+J shortcut - Starlight documentation site with GitHub Pages deployment - Multiple AI provider support (Groq, OpenAI, Anthropic, Ollama, LM Studio) - Multi-user registration support BREAKING: Events now include latitude/longitude/placeName fields
316 lines
8.9 KiB
Markdown
316 lines
8.9 KiB
Markdown
# Full-Text Search Feature Research
|
|
|
|
## Overview
|
|
|
|
This document outlines research and implementation ideas for adding full-text search to DearDiary, an AI-powered daily journaling app.
|
|
|
|
---
|
|
|
|
## 1. Feature Description
|
|
|
|
### Core Functionality
|
|
- Search across diary content (journal titles and bodies)
|
|
- Search across raw event content
|
|
- Filter by date range
|
|
- Sort by relevance (BM25) or date
|
|
- Real-time instant search as user types
|
|
|
|
### User Stories
|
|
1. **Quick Recall**: User types "meeting with Sarah" → sees matching diary entries and events
|
|
2. **Date-based Search**: User searches "vacation" → filters to summer 2024 entries
|
|
3. **Deep Search**: User searches for specific phrase → finds exact match in event content
|
|
|
|
---
|
|
|
|
## 2. Technical Approach
|
|
|
|
### Option A: SQLite FTS5 (Recommended for v1)
|
|
|
|
**Pros:**
|
|
- Zero external dependencies
|
|
- Built into SQLite (already in use)
|
|
- BM25 ranking built-in
|
|
- Real-time indexing (update on insert)
|
|
- Lowest implementation complexity
|
|
- No additional infrastructure
|
|
|
|
**Cons:**
|
|
- No typo tolerance (unless using trigram/token helpers)
|
|
- Limited to SQLite (migration cost if switching DB)
|
|
- Single-node only (fine for self-hosted)
|
|
|
|
**Implementation:**
|
|
```sql
|
|
-- FTS5 virtual table for journals
|
|
CREATE VIRTUAL TABLE journal_fts USING fts5(
|
|
title,
|
|
content,
|
|
content_rowid='rowid',
|
|
tokenize='porter unicode61'
|
|
);
|
|
|
|
-- FTS5 virtual table for events
|
|
CREATE VIRTUAL TABLE event_fts USING fts5(
|
|
content,
|
|
type,
|
|
content_rowid='rowid',
|
|
tokenize='porter unicode61'
|
|
);
|
|
|
|
-- Triggers to keep FTS in sync
|
|
CREATE TRIGGER journal_ai AFTER INSERT ON Journal BEGIN
|
|
INSERT INTO journal_fts(rowid, title, content)
|
|
VALUES (NEW.rowid, NEW.title, NEW.content);
|
|
END;
|
|
```
|
|
|
|
**Performance:** FTS5 handles 100k+ rows easily on SQLite. For typical personal journaling (10 years = ~3650 entries, ~10k events), performance will be sub-100ms.
|
|
|
|
### Option B: External Search (Typesense/Meilisearch)
|
|
|
|
**Pros:**
|
|
- Typo tolerance (fuzzy search)
|
|
- Better ranking algorithms
|
|
- Scalable to millions of records
|
|
- REST API, language-agnostic
|
|
|
|
**Cons:**
|
|
- Additional infrastructure (Docker service)
|
|
- Sync complexity (real-time indexing)
|
|
- More complex setup for self-hosted users
|
|
- Resource overhead (CPU/RAM for search service)
|
|
|
|
**Recommendation:** Defer to v2. External search only becomes necessary when:
|
|
- User wants fuzzy/typo-tolerant search
|
|
- Dataset exceeds 500k+ records
|
|
- Multi-language support needed
|
|
|
|
---
|
|
|
|
## 3. Indexing Strategy
|
|
|
|
### Fields to Index
|
|
|
|
| Table | Field | Indexed | Reason |
|
|
|-------|-------|---------|--------|
|
|
| Journal | title | Yes | Primary search target |
|
|
| Journal | content | Yes | Full diary text |
|
|
| Journal | date | Yes | Filtering |
|
|
| Event | content | Yes | Raw event text |
|
|
| Event | type | Yes | Filter by event type |
|
|
| Event | date | Yes | Date filtering |
|
|
|
|
### What NOT to Index
|
|
- `Event.metadata` - JSON blob, search within JSON handled separately if needed
|
|
- `Event.mediaPath` - File paths, not searchable content
|
|
- `User` fields - Not needed for user-facing search
|
|
|
|
### Sync Strategy
|
|
1. **On Insert/Update**: Write to main table, then update FTS via trigger
|
|
2. **On Delete**: FTS trigger removes from index
|
|
3. **Reindex**: Manual endpoint for recovery/debugging
|
|
|
|
---
|
|
|
|
## 4. Database Schema Changes
|
|
|
|
### Prisma Schema Addition
|
|
|
|
```prisma
|
|
// Optional: Search history for "recent searches" feature
|
|
model SearchHistory {
|
|
id String @id @default(uuid())
|
|
userId String
|
|
query String
|
|
createdAt DateTime @default(now())
|
|
|
|
@@index([userId, createdAt])
|
|
}
|
|
```
|
|
|
|
Note: FTS5 tables are virtual and managed via raw SQL, not Prisma models. We'll use `prisma.$executeRaw` for FTS operations.
|
|
|
|
### Migration Steps
|
|
1. Create FTS5 virtual tables (raw SQL)
|
|
2. Create triggers for auto-sync
|
|
3. Backfill existing data
|
|
4. Add SearchHistory model (optional)
|
|
|
|
---
|
|
|
|
## 5. API Changes
|
|
|
|
### New Endpoints
|
|
|
|
```
|
|
GET /api/v1/search?q=<query>&type=diary|event|all&from=2024-01-01&to=2024-12-31&sort=relevance|date&page=1&limit=20
|
|
```
|
|
|
|
**Response:**
|
|
```typescript
|
|
interface SearchResult {
|
|
type: 'diary' | 'event';
|
|
id: string;
|
|
date: string;
|
|
title?: string; // For diaries
|
|
content: string; // Truncated/preview
|
|
highlight?: string; // Matched text with <mark> tags
|
|
score: number; // BM25 relevance
|
|
}
|
|
|
|
interface SearchResponse {
|
|
data: {
|
|
results: SearchResult[];
|
|
total: number;
|
|
page: number;
|
|
limit: number;
|
|
} | null;
|
|
error: null;
|
|
}
|
|
```
|
|
|
|
### Optional Endpoints
|
|
```
|
|
GET /api/v1/search/history // Recent searches
|
|
DELETE /api/v1/search/history // Clear history
|
|
POST /api/v1/search/reindex // Force reindex (admin)
|
|
```
|
|
|
|
---
|
|
|
|
## 6. UI/UX Considerations
|
|
|
|
### Search Modal
|
|
- **Trigger**: Cmd/Ctrl+K keyboard shortcut (standard pattern)
|
|
- **Position**: Centered modal with overlay
|
|
- **Features**:
|
|
- Instant search as you type (debounced 150ms)
|
|
- Filter tabs: All | Diaries | Events
|
|
- Date range picker (quick presets: Today, This Week, This Month, This Year)
|
|
- Results show date, type, preview with highlighted matches
|
|
|
|
### Sidebar (Alternative)
|
|
- Persistent search box in navigation
|
|
- Results in scrollable list below
|
|
- Less intrusive, always visible
|
|
|
|
### Result Cards
|
|
```
|
|
┌─────────────────────────────────────────┐
|
|
│ 📅 2024-03-15 [Diary] │
|
|
│ Meeting with Sarah about project... │
|
|
│ ─────────────────────────────────────── │
|
|
│ ...discussed timeline and <mark>budget</mark>... │
|
|
└─────────────────────────────────────────┘
|
|
```
|
|
|
|
### UX Details
|
|
- **Empty state**: Show recent diaries/events when no query
|
|
- **No results**: Friendly message + suggestions
|
|
- **Loading**: Subtle spinner (search should be <100ms)
|
|
- **Keyboard**: Arrow keys to navigate results, Enter to open
|
|
|
|
### Mobile Considerations
|
|
- Tap search icon in header → full-screen search
|
|
- Larger touch targets for filters
|
|
|
|
---
|
|
|
|
## 7. Performance Considerations
|
|
|
|
### Query Performance
|
|
- FTS5 BM25 queries: ~50-100ms for 10k records
|
|
- Add LIMIT to prevent unbounded results
|
|
- Use connection pooling if many concurrent searches
|
|
|
|
### Write Performance
|
|
- Triggers add ~5-10ms per insert/update
|
|
- Batch backfill for existing data (1000 rows/batch)
|
|
|
|
### Caching Strategy
|
|
- Cache recent searches (Redis optional, or in-memory)
|
|
- Cache FTS index in memory (SQLite mmap)
|
|
|
|
### Scaling Thresholds
|
|
- < 10k entries: No optimization needed
|
|
- 10k-100k: Consider FTS5 optimization (tokenizer, prefix search)
|
|
- > 100k: Consider external search
|
|
|
|
---
|
|
|
|
## 8. Implementation Complexity
|
|
|
|
### Complexity Assessment: **MEDIUM**
|
|
|
|
| Component | Complexity | Notes |
|
|
|-----------|------------|-------|
|
|
| FTS5 setup | Low | Raw SQL, one-time |
|
|
| Triggers | Low | Auto-sync, minimal code |
|
|
| API endpoint | Low | Standard CRUD pattern |
|
|
| Frontend modal | Medium | Keyboard shortcuts, state |
|
|
| Filters/Date | Medium | Multiple filter combinations |
|
|
| Backfill | Low | One-time script |
|
|
|
|
### Phased Implementation
|
|
|
|
**Phase 1 (MVP - 2-3 days)**
|
|
- FTS5 tables + triggers
|
|
- Basic search API
|
|
- Simple modal UI with text input
|
|
|
|
**Phase 2 (Enhancements - 1-2 days)**
|
|
- Date filtering
|
|
- Type filtering (diary/event)
|
|
- Result highlighting
|
|
|
|
**Phase 3 (Polish - 1 day)**
|
|
- Search history
|
|
- Keyboard navigation
|
|
- Mobile responsive
|
|
|
|
---
|
|
|
|
## 9. Priority Recommendation
|
|
|
|
### Recommended Priority: **MEDIUM-HIGH**
|
|
|
|
**Rationale:**
|
|
- Search is a core journaling feature (user wants to find past entries)
|
|
- Competitor apps (Day One, Journey) have robust search
|
|
- Implementation complexity is manageable (medium)
|
|
- Zero external dependencies (SQLite FTS5)
|
|
|
|
### Factors Supporting High Priority
|
|
1. **User Value**: High - helps users find meaningful memories
|
|
2. **Implementation Cost**: Medium - achievable in 1 week
|
|
3. **Dependency Risk**: Low - no external services needed
|
|
4. **Future-proofing**: FTS5 is mature, well-supported
|
|
|
|
### Factors Against Very High Priority
|
|
- Current core features (capture, generate) are stable
|
|
- Small dataset users may not notice missing search
|
|
- Can be added post-MVP without breaking changes
|
|
|
|
---
|
|
|
|
## 10. Open Questions / Further Research
|
|
|
|
1. **Typo tolerance**: Is exact match sufficient, or do users expect fuzzy search?
|
|
2. **Search ranking**: Should recent results be boosted higher?
|
|
3. **Multi-language**: Support languages other than English (tokenizer considerations)
|
|
4. **Export/Import**: Should search index be rebuilt on data import?
|
|
5. **Shared access**: Multi-user search (future consideration)
|
|
|
|
---
|
|
|
|
## 11. Summary
|
|
|
|
| Aspect | Recommendation |
|
|
|--------|----------------|
|
|
| **Search Engine** | SQLite FTS5 (built-in) |
|
|
| **UI Pattern** | Cmd/Ctrl+K modal |
|
|
| **Features** | Instant search, date filter, type filter, relevance sort |
|
|
| **Complexity** | Medium (3-5 days) |
|
|
| **Priority** | Medium-High |
|
|
| **Schema Changes** | FTS5 via raw SQL + optional SearchHistory model |
|
|
| **API Changes** | New `/search` endpoint with query params | |