# Data Export Feature - DearDiary

Comprehensive research document for implementing a data export feature.

---

## 1. Feature Overview

Allow users to export their diary data in multiple formats with flexible scope and options. This feature enables users to:

- Back up their data locally
- Migrate to other journaling platforms
- Create offline archives
- Share selected entries

---

## 2. Export Formats

### 2.1 Markdown (.md)

**Description**: Human-readable plain text format with frontmatter metadata.

**Technical Approach**:

- Output: one `.md` file per day, or a single combined file
- Use YAML frontmatter for metadata (date, title, event count, generation time)
- Structure:

```markdown
---
date: 2024-01-15
title: A Quiet Morning
event_count: 5
generated_at: 2024-01-15T20:30:00Z
---

# January 15, 2024

## Events

[08:30] Had coffee and read news
[12:00] Team meeting about Q1 goals

## Diary Page

The morning started quietly...
```

**Complexity**: Low - straightforward string generation

**Priority**: High - most versatile, easy to implement

---

### 2.2 JSON (.json)

**Description**: Machine-readable structured format for programmatic use.

**Technical Approach**:

```json
{
  "exported_at": "2024-01-15T20:30:00Z",
  "user_id": "user-uuid",
  "format_version": "1.0",
  "entries": [
    {
      "date": "2024-01-15",
      "journal": {
        "title": "A Quiet Morning",
        "content": "The morning started quietly...",
        "generated_at": "2024-01-15T20:30:00Z"
      },
      "events": [
        {
          "id": "event-uuid",
          "type": "text",
          "content": "Had coffee and read news",
          "created_at": "2024-01-15T08:30:00Z",
          "metadata": {}
        }
      ]
    }
  ]
}
```

**Complexity**: Low - Prisma query results are plain objects that serialize directly with `JSON.stringify`

**Priority**: High - essential for backups/migrations

---

### 2.3 PDF (.pdf)

**Description**: Print-ready formatted document.
**Technical Approach**:

- Use `pdfkit` or `puppeteer` (headless Chrome) for generation
- Puppeteer recommended for complex layouts/CSS support
- Template options:
  - Simple: Title + content (minimal styling)
  - Full: Events listed with diary page formatted
- Page breaks handled for multi-day exports

**Complexity**: Medium - requires additional dependency

**Priority**: Medium - high user demand for print/export

---

### 2.4 HTML (.html)

**Description**: Web-viewable static pages.

**Technical Approach**:

- Single HTML file with embedded CSS
- Include basic navigation for multi-day exports
- Responsive design with print media queries
- Structure:

```html
<!-- Illustrative structure; the exact tags are an assumption -->
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>DearDiary Export</title>
</head>
<body>
  <h1>January 2024</h1>
  <article>
    <h2>January 15, 2024</h2>
    <p>5 events</p>
    <p>Diary content...</p>
  </article>
</body>
</html>
```

**Complexity**: Low-Medium - string generation with CSS

**Priority**: Medium - good for web publishing

---

### 2.5 ePub (.epub)

**Description**: Ebook format for e-readers.

**Technical Approach**:

- Use `epub-gen` or similar library
- Structure: One chapter per day or per month
- Include cover image with app branding
- Metadata: Title, author, generated date

**Complexity**: High - requires ebook-specific libraries

**Priority**: Low - niche use case, can be deprioritized

---

## 3. Export Scope

### 3.1 Single Diary

- Export one day's journal + events
- API: `GET /api/v1/export?date=2024-01-15`
- Returns single entry with all related data

### 3.2 Date Range

- Export journals and events between start and end dates (inclusive)
- API: `GET /api/v1/export?start=2024-01-01&end=2024-01-31`
- Batch query: Prisma `where: { date: { gte: start, lte: end } }`

### 3.3 All Data

- Export entire user dataset
- Include settings, metadata
- Requires pagination for large datasets

---

## 4. Include/Exclude Options

### 4.1 Content Filters

| Option | Description | Implementation |
|--------|-------------|----------------|
| `events_only` | Raw events without AI-generated diaries | Filter journals from response |
| `diaries_only` | Only generated diary pages | Filter events from response |
| `with_media` | Include media file references | Include `mediaPath` field |
| `without_media` | Exclude media references | Omit `mediaPath` field |

### 4.2 Data Structure Options

```typescript
interface ExportOptions {
  format: 'md' | 'json' | 'pdf' | 'html' | 'epub';
  scope: 'single' | 'range' | 'all';
  date?: string;
  startDate?: string;
  endDate?: string;
  include: {
    events: boolean;
    journals: boolean;
    media: boolean;
    settings: boolean;
  };
  organization: 'single_file' | 'folder';
  compress: boolean;
}
```
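The content filters in section 4.1 come down to shaping each entry according to the `include` flags. The following is a minimal sketch, assuming entries follow the JSON layout from section 2.2 with an optional `mediaPath` on events; the `ExportEntry`, `IncludeFlags` and `applyIncludeFilters` names are hypothetical and not part of the existing codebase.

```typescript
// Hypothetical entry shape, modelled on the JSON example in section 2.2.
interface ExportEvent {
  id: string;
  type: string;
  content: string;
  created_at: string;
  mediaPath?: string; // assumed field name, taken from the filter table
}

interface ExportEntry {
  date: string;
  journal?: { title: string; content: string; generated_at: string };
  events?: ExportEvent[];
}

interface IncludeFlags {
  events: boolean;
  journals: boolean;
  media: boolean;
}

// Returns a filtered copy of the entry; the input is never mutated.
function applyIncludeFilters(entry: ExportEntry, include: IncludeFlags): ExportEntry {
  const result: ExportEntry = { date: entry.date };
  if (include.journals && entry.journal) {
    result.journal = { ...entry.journal };
  }
  if (include.events && entry.events) {
    result.events = entry.events.map((event) => {
      if (include.media) return { ...event };
      // Drop the media reference but keep the rest of the event.
      const { mediaPath: _omitted, ...rest } = event;
      return rest;
    });
  }
  return result;
}
```

Under this mapping, `events_only` corresponds to `journals: false`, `diaries_only` to `events: false`, and `without_media` to `media: false`.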
---

## 5. File Organization

### 5.1 Single File

- All content in one file (`.md`, `.json`, `.html`)
- Best for: small exports, JSON backups
- Simple to implement

### 5.2 Folder Structure

```
export-2024-01-15/
├── index.html          # Main navigation
├── 2024-01-15/
│   ├── journal.md      # Diary page
│   ├── events.md       # Raw events
│   └── media/          # Photos, voice memos
├── 2024-01-14/
│   └── ...
└── manifest.json       # Export metadata
```

- Best for: large exports with media
- Use ZIP compression for download

---

## 6. Compression Options

### 6.1 ZIP Archive

- Default for folder exports > 10MB
- Use the `archiver` package; `Bun.zip()` does not appear in Bun's documented API, so confirm any built-in alternative before relying on it
- Include manifest with export details

**Implementation**:

```typescript
// Example: ZIP export flow.
// createTempDir, generateFiles, zip and serveFile are placeholder helpers
// that still need to be implemented.
async function exportZip(options: ExportOptions) {
  const tempDir = await createTempDir();
  // Write every export file (entries, media, manifest) into the temp directory
  await generateFiles(tempDir, options);
  const zipPath = `${tempDir}.zip`;
  await zip(tempDir, zipPath);
  // tempDir and zipPath should be deleted once the download has completed
  return serveFile(zipPath);
}
```

---

## 7. Streaming Large Exports

### 7.1 Problem

- Large exports (years of data) can exceed memory
- Need progressive loading and streaming response

### 7.2 Solution: Server-Sent Events (SSE)

**API Design**:

```
POST /api/v1/export
Content-Type: application/json

{
  "format": "json",
  "startDate": "2020-01-01",
  "endDate": "2024-01-15"
}
```

**Response** (chunked, `Content-Type: text/event-stream`):

```
event: progress
data: {"percent": 10, "stage": "loading_events"}

event: data
data: {"date": "2020-01-01", ...}

event: progress
data: {"percent": 20, "stage": "loading_journals"}

event: data
data: {"date": "2020-01-02", ...}

event: complete
data: {"total_entries": 1000, "export_size": "5MB"}
```

Because the request is a `POST`, the browser `EventSource` API (GET only) cannot consume this stream; the client has to read it with `fetch` and a stream reader. SSE carries text only, so this design fits JSON, Markdown and HTML output, while binary formats (PDF, ZIP) need a regular file download.

### 7.3 Implementation Notes

- Use Prisma cursor-based pagination for memory efficiency
- Stream directly to response without buffering
- Provide progress updates every N records
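The notes in section 7.3 can be sketched as an async generator that pulls one page at a time and emits SSE frames in the format shown in section 7.2. This is a sketch under stated assumptions, not a definitive implementation: `FetchPage`, `sseFrame` and `streamExport` are hypothetical names, and the data-access callback stands in for a Prisma `findMany` call using `cursor`, `skip: 1` and `take`.

```typescript
// One page of results plus the cursor to resume from (null when exhausted).
interface Page<T> {
  items: T[];
  nextCursor: string | null;
}

// Hypothetical data-access callback; with Prisma this would wrap a
// findMany call that uses `cursor`, `skip: 1` and `take`.
type FetchPage<T> = (cursor: string | null, take: number) => Promise<Page<T>>;

// Formats a single SSE frame: event name, JSON payload, blank-line terminator.
function sseFrame(event: string, payload: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Yields SSE frames page by page so only one page is held in memory.
// A progress frame is emitted after each page, i.e. every `pageSize` records.
async function* streamExport<T>(
  fetchPage: FetchPage<T>,
  total: number,
  pageSize = 100,
): AsyncGenerator<string> {
  let cursor: string | null = null;
  let sent = 0;
  do {
    const page: Page<T> = await fetchPage(cursor, pageSize);
    for (const item of page.items) {
      yield sseFrame("data", item);
      sent += 1;
    }
    const percent = total > 0 ? Math.min(100, Math.round((sent / total) * 100)) : 100;
    yield sseFrame("progress", { percent });
    cursor = page.nextCursor;
  } while (cursor !== null);
  yield sseFrame("complete", { total_entries: sent });
}
```

Since the generator yields plain strings, the HTTP layer only needs to write each frame to the response as it arrives, which keeps memory use bounded by the page size rather than the export size.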
---

## 8. Privacy & Security

### 8.1 Authentication

- Require valid API key for all export endpoints
- User can only export their own data

### 8.2 Sensitive Data Handling

- **Option**: Password-protect exports
  - Use AES-256 encryption for ZIP (needs a library with ZIP encryption support; the `archiver` package does not encrypt on its own)
  - Prompt for password in UI
- **Option**: Redact sensitive entries
  - Tag certain events as "private"
  - Exclude from export by default

### 8.3 Media Files

- Generate signed URLs for media export
- Set expiration (24h default)
- Don't include raw API keys in export

### 8.4 Audit Logging

- Log export requests (who, when, scope)
- Store in new `ExportLog` model

---

## 9. Database Schema Changes

### 9.1 New Models

```prisma
// Both models relate to User, so the User model also needs the matching
// list fields (e.g. `exportLogs ExportLog[]`, `scheduledExports ScheduledExport[]`).
model ExportLog {
  id          String    @id @default(uuid())
  userId      String
  format      String
  scope       String
  startDate   String?
  endDate     String?
  recordCount Int
  sizeBytes   Int?
  status      String    @default("pending")
  createdAt   DateTime  @default(now())
  completedAt DateTime?
  user        User      @relation(fields: [userId], references: [id], onDelete: Cascade)
}

model ScheduledExport {
  id          String    @id @default(uuid())
  userId      String
  name        String
  format      String
  scope       String    @default("all")
  frequency   String    @default("weekly")
  includeJson Json?
  enabled     Boolean   @default(true)
  lastRunAt   DateTime?
  createdAt   DateTime  @default(now())
  updatedAt   DateTime  @updatedAt
  user        User      @relation(fields: [userId], references: [id], onDelete: Cascade)
}
```
---

## 10. API Changes

### 10.1 New Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/export` | Create export job |
| GET | `/api/v1/export/:id` | Get export status |
| GET | `/api/v1/export/:id/download` | Download export file |
| GET | `/api/v1/exports` | List export history |
| DELETE | `/api/v1/export/:id` | Delete export |
| GET | `/api/v1/scheduled-exports` | List scheduled exports |
| POST | `/api/v1/scheduled-exports` | Create schedule |
| PUT | `/api/v1/scheduled-exports/:id` | Update schedule |
| DELETE | `/api/v1/scheduled-exports/:id` | Delete schedule |

### 10.2 Request/Response Examples

**Create Export**:

```typescript
// POST /api/v1/export
interface CreateExportRequest {
  format: 'md' | 'json' | 'pdf' | 'html' | 'epub';
  date?: string;       // single day
  startDate?: string;  // range start
  endDate?: string;    // range end
  include: {
    events: boolean;
    journals: boolean;
    media: boolean;
    settings: boolean;
  };
  organization: 'single_file' | 'folder';
  compress: boolean;
  password?: string;   // optional ZIP password
}

interface ExportResponse {
  id: string;
  status: 'pending' | 'processing' | 'completed' | 'failed';
  progress: number;
  downloadUrl?: string;
  expiresAt?: string;
}
```
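`CreateExportRequest` carries no explicit `scope` field, so the server presumably has to derive the scope from which date fields are present. The sketch below shows one way to do that while rejecting malformed or contradictory input; `ExportScope`, `isIsoDate` and `resolveScope` are hypothetical names, and the rule that `date` and `startDate`/`endDate` are mutually exclusive is an assumption.

```typescript
type ExportScope =
  | { kind: "single"; date: string }
  | { kind: "range"; startDate: string; endDate: string }
  | { kind: "all" };

// Accepts only real calendar dates in YYYY-MM-DD form.
function isIsoDate(value: string): boolean {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(value)) return false;
  const parsed = new Date(`${value}T00:00:00Z`);
  // Reject both unparseable dates and dates that rolled over (e.g. Feb 30).
  return !Number.isNaN(parsed.getTime()) && parsed.toISOString().slice(0, 10) === value;
}

// Derives the export scope from the optional date fields of a request.
// Throws on malformed dates or contradictory combinations.
function resolveScope(req: { date?: string; startDate?: string; endDate?: string }): ExportScope {
  const { date, startDate, endDate } = req;
  if (date !== undefined) {
    if (startDate !== undefined || endDate !== undefined) {
      throw new Error("Use either date or startDate/endDate, not both");
    }
    if (!isIsoDate(date)) throw new Error(`Invalid date: ${date}`);
    return { kind: "single", date };
  }
  if (startDate !== undefined || endDate !== undefined) {
    if (startDate === undefined || endDate === undefined) {
      throw new Error("startDate and endDate must be provided together");
    }
    if (!isIsoDate(startDate) || !isIsoDate(endDate)) {
      throw new Error("Invalid startDate or endDate");
    }
    // ISO dates compare correctly as plain strings.
    if (startDate > endDate) throw new Error("startDate must not be after endDate");
    return { kind: "range", startDate, endDate };
  }
  return { kind: "all" };
}
```

Validating up front like this also covers the inverted-range case before any query runs, and the resolved `kind` can be stored directly in the `scope` column of `ExportLog`.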
---

## 11. UI/UX Considerations

### 11.1 Export Page Location

- Add to Settings page as "Export Data" section
- Or create dedicated `/export` route

### 11.2 Export Modal

```
┌─────────────────────────────────────────┐
│  Export Your Data                       │
├─────────────────────────────────────────┤
│                                         │
│  Format:   [Markdown ▼]                 │
│            ○ Markdown                   │
│            ○ JSON                       │
│            ○ PDF                        │
│            ○ HTML                       │
│            ○ ePub                       │
│                                         │
│  Scope:    ○ This month                 │
│            ○ This year                  │
│            ○ All time                   │
│            ○ Custom range [____]        │
│                                         │
│  Include:  ☑ Generated diaries          │
│            ☑ Raw events                 │
│            ☐ Media files                │
│            ☐ Settings                   │
│                                         │
│  Options:  ○ Single file                │
│            ○ Folder (with ZIP)          │
│                                         │
│  ☐ Password protect                     │
│    [________]                           │
│                                         │
│          [Cancel]  [Export]             │
└─────────────────────────────────────────┘
```

### 11.3 Progress View

- Show progress bar during export
- Estimated time remaining
- Cancel button for large exports
- Email notification option (future)

### 11.4 Export History

- List of past exports with:
  - Date, format, scope
  - Size, record count
  - Download link (with expiration)
  - Delete button

---

## 12. Scheduled Exports

### 12.1 Configuration Options

| Frequency | Description |
|-----------|-------------|
| `daily` | Every day at configured time |
| `weekly` | Every Sunday |
| `monthly` | First day of month |
| `quarterly` | Every 3 months |

### 12.2 Implementation

- Use cron-style scheduling
- Run as background job (`setInterval` timer or dedicated worker)
- Store exports in cloud storage (S3-compatible) or local
- Send notification when ready

### 12.3 Use Cases

- Automated weekly backups
- Monthly archive generation
- Quarterly review compilation
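Whichever scheduler runs the job, it needs to know when each `ScheduledExport` is next due. The sketch below computes that for the four frequencies in section 12.1. It rests on assumptions the document does not state: all times are UTC, `hourUtc` stands for the configured time of day, and quarters start in January, April, July and October. `nextRunAt` is a hypothetical helper name.

```typescript
type Frequency = "daily" | "weekly" | "monthly" | "quarterly";

// Returns the first scheduled run strictly after `after`, in UTC.
// `hourUtc` is the configured time of day (an assumed setting).
function nextRunAt(frequency: Frequency, after: Date, hourUtc = 0): Date {
  const y = after.getUTCFullYear();
  const m = after.getUTCMonth();
  const d = after.getUTCDate();
  // Date.UTC normalises overflowing or negative days and months.
  const at = (year: number, month: number, day: number): Date =>
    new Date(Date.UTC(year, month, day, hourUtc));
  const isLater = (candidate: Date): boolean => candidate.getTime() > after.getTime();

  switch (frequency) {
    case "daily": {
      const today = at(y, m, d);
      return isLater(today) ? today : at(y, m, d + 1);
    }
    case "weekly": {
      // getUTCDay() is 0 on Sunday, so this is the Sunday of the current week.
      const sundayDay = d - after.getUTCDay();
      const sunday = at(y, m, sundayDay);
      return isLater(sunday) ? sunday : at(y, m, sundayDay + 7);
    }
    case "monthly": {
      const first = at(y, m, 1);
      return isLater(first) ? first : at(y, m + 1, 1);
    }
    case "quarterly": {
      // Quarters are assumed to start in January, April, July and October.
      const quarterMonth = m - (m % 3);
      const quarterStart = at(y, quarterMonth, 1);
      return isLater(quarterStart) ? quarterStart : at(y, quarterMonth + 3, 1);
    }
  }
}
```

A worker could compare the result of `nextRunAt(schedule.frequency, schedule.lastRunAt)` against the current time on each tick and trigger the export once it has passed.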
---

## 13. Implementation Roadmap

### Phase 1: Core Export (Week 1-2)

- [ ] Add `ExportLog` model to schema
- [ ] Implement JSON export endpoint
- [ ] Implement Markdown export endpoint
- [ ] Add single date/range query support
- [ ] Basic export UI in Settings

**Complexity**: 3/5

**Priority**: High

### Phase 2: Advanced Formats (Week 3)

- [ ] HTML export
- [ ] PDF export (using puppeteer)
- [ ] ePub export (optional)

**Complexity**: 4/5

**Priority**: Medium

### Phase 3: Large Exports (Week 4)

- [ ] Streaming with SSE
- [ ] ZIP compression
- [ ] Progress reporting

**Complexity**: 5/5

**Priority**: Medium

### Phase 4: Automation (Week 5)

- [ ] Scheduled exports model
- [ ] Background job scheduler
- [ ] Scheduled exports UI

**Complexity**: 4/5

**Priority**: Low

### Phase 5: Security & Polish (Week 6)

- [ ] Password-protected ZIPs
- [ ] Export audit logging
- [ ] Media file handling
- [ ] Edge cases and testing

**Complexity**: 3/5

**Priority**: Medium

---

## 14. Dependencies Required

| Package | Purpose | Version |
|---------|---------|---------|
| `pdfkit` | PDF generation | ^0.14.0 |
| `puppeteer` | HTML to PDF | ^21.0.0 |
| `archiver` | ZIP creation | ^6.0.0 |
| `epub-gen` | ePub creation | ^0.1.0 |
| `jszip` | Client-side ZIP | ^3.10.0 |

---

## 15. Testing Considerations

### 15.1 Unit Tests

- Export formatters (MD, JSON, HTML)
- Date range filtering
- Include/exclude logic

### 15.2 Integration Tests

- Full export workflow
- Large dataset performance
- Streaming response handling

### 15.3 Edge Cases

- Empty date range
- Missing media files
- Export during active generation
- Concurrent export requests
---

## 16. Priority Recommendation

| Feature | Priority | Rationale |
|---------|----------|-----------|
| JSON/Markdown export | P0 | Core requirement for backups |
| Single/range export | P0 | Essential scope control |
| Export UI | P0 | User-facing feature |
| PDF export | P1 | High user demand |
| HTML export | P1 | Good alternative to PDF |
| Streaming exports | P2 | Performance for large data |
| ZIP compression | P2 | Usability for folder exports |
| ePub export | P3 | Niche, can skip |
| Scheduled exports | P3 | Automation, lower urgency |
| Password protection | P4 | Advanced, security theater |

---

## 17. Open Questions

1. **Storage**: Should exports be stored temporarily or generated on-demand?
2. **Retention**: How long to keep export downloads available?
3. **Media handling**: Include actual files or just references?
4. **Third-party sync**: Export to Google Drive, Dropbox?
5. **Incremental exports**: Only export new data since last export?

---

## 18. Summary

This feature set provides comprehensive data export capabilities while maintaining security and user privacy. Starting with JSON/Markdown exports covers the most common use cases (backups, migration). PDF and HTML add print/web options. Streaming and compression enable handling of large datasets. Scheduled exports provide automation for power users.

We recommend implementing Phase 1 first to establish core functionality, then iterating based on user feedback.