Data Export Feature - DearDiary
Comprehensive research document for implementing a data export feature.
1. Feature Overview
Allow users to export their diary data in multiple formats with flexible scope and options. This feature enables users to:
- Backup their data locally
- Migrate to other journaling platforms
- Create offline archives
- Share selected entries
2. Export Formats
2.1 Markdown (.md)
Description: Human-readable plain text format with frontmatter metadata.
Technical Approach:
- Single file: One `.md` file per day or combined
- Use YAML frontmatter for metadata (date, title, word count)
- Structure:
---
date: 2024-01-15
title: A Quiet Morning
event_count: 5
generated_at: 2024-01-15T20:30:00Z
---

# January 15, 2024

## Events

[08:30] Had coffee and read news
[12:00] Team meeting about Q1 goals

## Diary Page

The morning started quietly...
Complexity: Low - straightforward string generation
Priority: High - most versatile, easy to implement
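The string generation described above could look roughly like the following. This is a minimal sketch, not the project's implementation: the `DayEntry` and `DiaryEvent` shapes and the `renderDayMarkdown` name are assumptions made for illustration, times are rendered in UTC, and the title is quoted (unlike the sample above) so that titles containing `:` or `#` remain valid YAML.

```typescript
// Hypothetical shapes for illustration; the real Prisma models may differ.
interface DiaryEvent {
  createdAt: string; // ISO 8601 timestamp, e.g. "2024-01-15T08:30:00Z"
  content: string;
}

interface DayEntry {
  date: string; // "YYYY-MM-DD"
  title: string;
  content: string;
  generatedAt: string; // ISO 8601 timestamp
  events: DiaryEvent[];
}

// A JSON string literal is also a valid YAML double-quoted scalar.
function yamlString(value: string): string {
  return JSON.stringify(value);
}

// Extract "HH:MM" (UTC) from an ISO timestamp.
function timeOfDay(iso: string): string {
  return new Date(iso).toISOString().slice(11, 16);
}

function renderDayMarkdown(entry: DayEntry): string {
  const heading = new Date(`${entry.date}T00:00:00Z`).toLocaleDateString("en-US", {
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC",
  });
  const eventLines = entry.events.map(
    (event) => `[${timeOfDay(event.createdAt)}] ${event.content}`,
  );
  return [
    "---",
    `date: ${entry.date}`,
    `title: ${yamlString(entry.title)}`,
    `event_count: ${entry.events.length}`,
    `generated_at: ${entry.generatedAt}`,
    "---",
    "",
    `# ${heading}`,
    "",
    "## Events",
    "",
    ...eventLines,
    "",
    "## Diary Page",
    "",
    entry.content,
    "",
  ].join("\n");
}
```

Keeping the formatter a pure function of one day's data makes it easy to unit test and to reuse for both the single-file and folder layouts.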
2.2 JSON (.json)
Description: Machine-readable structured format for programmatic use.
Technical Approach:
{
  "exported_at": "2024-01-15T20:30:00Z",
  "user_id": "user-uuid",
  "format_version": "1.0",
  "entries": [
    {
      "date": "2024-01-15",
      "journal": {
        "title": "A Quiet Morning",
        "content": "The morning started quietly...",
        "generated_at": "2024-01-15T20:30:00Z"
      },
      "events": [
        {
          "id": "event-uuid",
          "type": "text",
          "content": "Had coffee and read news",
          "created_at": "2024-01-15T08:30:00Z",
          "metadata": {}
        }
      ]
    }
  ]
}
Complexity: Low - Prisma query results are plain objects that `JSON.stringify` can serialize
Priority: High - essential for backups/migrations
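One way to assemble the envelope shown above is sketched here. The `JournalRecord` and `EventRecord` shapes and the `buildJsonExport` function are assumptions for illustration (the actual Prisma models may use different field names), and events are grouped by the UTC date of their timestamp, which may not match how the app assigns events to days.

```typescript
// Hypothetical record shapes; adjust to the real Prisma models.
interface JournalRecord {
  date: string; // "YYYY-MM-DD"
  title: string;
  content: string;
  generatedAt: Date;
}

interface EventRecord {
  id: string;
  type: string;
  content: string;
  createdAt: Date;
  metadata: Record<string, unknown> | null;
}

interface DayBucket {
  journal: JournalRecord | null;
  events: EventRecord[];
}

function buildJsonExport(
  userId: string,
  journals: JournalRecord[],
  events: EventRecord[],
  now: Date = new Date(),
): string {
  const byDate = new Map<string, DayBucket>();
  const bucket = (date: string): DayBucket => {
    let existing = byDate.get(date);
    if (!existing) {
      existing = { journal: null, events: [] };
      byDate.set(date, existing);
    }
    return existing;
  };

  for (const journal of journals) bucket(journal.date).journal = journal;
  // Assumption: an event belongs to the UTC calendar day of its timestamp.
  for (const event of events) bucket(event.createdAt.toISOString().slice(0, 10)).events.push(event);

  const entries = Array.from(byDate.keys())
    .sort()
    .map((date) => {
      const day = bucket(date);
      return {
        date,
        journal: day.journal && {
          title: day.journal.title,
          content: day.journal.content,
          generated_at: day.journal.generatedAt.toISOString(),
        },
        events: day.events.map((event) => ({
          id: event.id,
          type: event.type,
          content: event.content,
          created_at: event.createdAt.toISOString(),
          metadata: event.metadata ?? {},
        })),
      };
    });

  return JSON.stringify(
    { exported_at: now.toISOString(), user_id: userId, format_version: "1.0", entries },
    null,
    2,
  );
}
```

Bumping `format_version` whenever the envelope changes lets a future importer decide how to read older backups.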
2.3 PDF (.pdf)
Description: Print-ready formatted document.
Technical Approach:
- Use `pdfkit` or `puppeteer` (headless Chrome) for generation
- Puppeteer recommended for complex layouts/CSS support
- Template options:
  - Simple: Title + content (minimal styling)
  - Full: Events listed with diary page formatted
- Page breaks handled for multi-day exports
Complexity: Medium - requires additional dependency
Priority: Medium - high user demand for print/export
2.4 HTML (.html)
Description: Web-viewable static pages.
Technical Approach:
- Single HTML file with embedded CSS
- Include basic navigation for multi-day exports
- Responsive design with print media queries
- Structure:
<!DOCTYPE html>
<html>
<head>
  <title>DearDiary Export</title>
  <style>
    body { font-family: system-ui; max-width: 800px; margin: 0 auto; padding: 2rem; }
    .entry { margin-bottom: 2rem; }
    .meta { color: #666; font-size: 0.9rem; }
  </style>
</head>
<body>
  <h1>January 2024</h1>
  <div class="entry">
    <h2>January 15, 2024</h2>
    <div class="meta">5 events</div>
    <p>Diary content...</p>
  </div>
</body>
</html>
Complexity: Low-Medium - string generation with CSS
Priority: Medium - good for web publishing
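Because diary text is user-written, generating HTML by string concatenation needs escaping so that characters such as `<` or `&` cannot break the page. A minimal sketch of one entry block follows; `escapeHtml` and `renderEntryHtml` are hypothetical helper names, and splitting paragraphs on blank lines is an assumption about how diary content is stored.

```typescript
// Escape the five characters that are significant in HTML text and attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

interface HtmlEntry {
  heading: string; // e.g. "January 15, 2024"
  eventCount: number;
  content: string; // plain text; blank lines separate paragraphs (assumption)
}

// Render one <div class="entry"> block matching the structure shown above.
function renderEntryHtml(entry: HtmlEntry): string {
  const paragraphs = entry.content
    .split(/\n{2,}/)
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0)
    .map((paragraph) => `  <p>${escapeHtml(paragraph)}</p>`);
  return [
    '<div class="entry">',
    `  <h2>${escapeHtml(entry.heading)}</h2>`,
    `  <div class="meta">${entry.eventCount} events</div>`,
    ...paragraphs,
    "</div>",
  ].join("\n");
}
```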
2.5 ePub (.epub)
Description: Ebook format for e-readers.
Technical Approach:
- Use `epub-gen` or similar library
- Structure: One chapter per day or per month
- Include cover image with app branding
- Metadata: Title, author, generated date
Complexity: High - requires ebook-specific libraries
Priority: Low - niche use case, can be deprioritized
3. Export Scope
3.1 Single Diary
- Export one day's journal + events
- API: `GET /api/v1/export?date=2024-01-15`
- Returns single entry with all related data
3.2 Date Range
- Export journals and events between start and end dates (inclusive)
- API: `GET /api/v1/export?start=2024-01-01&end=2024-01-31`
- Batch query: Prisma `where: { date: { gte: start, lte: end } }`
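Before the range reaches the database it is worth validating both bounds. The sketch below builds the filter object for the Prisma `where` clause quoted above; it assumes `date` is stored as a `YYYY-MM-DD` string (so lexicographic comparison matches chronological order) and that rows carry a `userId` column. `isValidIsoDate` and `buildDateRangeWhere` are illustrative names, not existing functions.

```typescript
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

// Accept only real calendar dates in YYYY-MM-DD form (rejects 2024-02-30).
function isValidIsoDate(value: string): boolean {
  if (!ISO_DATE.test(value)) return false;
  const parsed = new Date(`${value}T00:00:00Z`);
  return !Number.isNaN(parsed.getTime()) && parsed.toISOString().slice(0, 10) === value;
}

interface DateRangeWhere {
  userId: string;
  date: { gte: string; lte: string };
}

// Build the filter for an inclusive date range, scoped to one user.
function buildDateRangeWhere(userId: string, start: string, end: string): DateRangeWhere {
  if (!isValidIsoDate(start) || !isValidIsoDate(end)) {
    throw new Error("start and end must be valid YYYY-MM-DD dates");
  }
  if (start > end) {
    throw new Error("start must not be after end");
  }
  return { userId, date: { gte: start, lte: end } };
}
```

The returned object can be passed as the `where` argument of a `findMany` call; scoping by `userId` in the same place keeps the "users can only export their own data" rule close to the query.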
3.3 All Data
- Export entire user dataset
- Include settings, metadata
- Requires pagination for large datasets
4. Include/Exclude Options
4.1 Content Filters
| Option | Description | Implementation |
|---|---|---|
| `events_only` | Raw events without AI-generated diaries | Filter journals from response |
| `diaries_only` | Only generated diary pages | Filter events from response |
| `with_media` | Include media file references | Include `mediaPath` field |
| `without_media` | Exclude media references | Omit `mediaPath` field |
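The four filters can be applied to each entry after it is loaded. The following is a sketch under stated assumptions: the `ExportEntry` shape is hypothetical, media references are left out unless `with_media` is requested (the document does not specify a default), and contradictory combinations are rejected rather than silently resolved.

```typescript
type ContentFilter = "events_only" | "diaries_only" | "with_media" | "without_media";

// Hypothetical entry shape for illustration.
interface ExportEvent {
  id: string;
  content: string;
  mediaPath?: string | null;
}

interface ExportEntry {
  date: string;
  journal: { title: string; content: string } | null;
  events: ExportEvent[];
}

function applyContentFilters(entry: ExportEntry, filters: ReadonlySet<ContentFilter>): ExportEntry {
  if (filters.has("events_only") && filters.has("diaries_only")) {
    throw new Error("events_only and diaries_only are mutually exclusive");
  }
  if (filters.has("with_media") && filters.has("without_media")) {
    throw new Error("with_media and without_media are mutually exclusive");
  }

  // Assumption: media references are excluded unless explicitly requested.
  const keepMedia = filters.has("with_media");

  const events: ExportEvent[] = filters.has("diaries_only")
    ? []
    : entry.events.map((event) => {
        if (keepMedia) return { ...event };
        const { mediaPath, ...withoutMedia } = event; // drop the mediaPath field
        return withoutMedia;
      });

  return {
    date: entry.date,
    journal: filters.has("events_only") ? null : entry.journal,
    events,
  };
}
```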
4.2 Data Structure Options
interface ExportOptions {
  format: 'md' | 'json' | 'pdf' | 'html' | 'epub';
  scope: 'single' | 'range' | 'all';
  date?: string;
  startDate?: string;
  endDate?: string;
  include: {
    events: boolean;
    journals: boolean;
    media: boolean;
    settings: boolean;
  };
  organization: 'single_file' | 'folder';
  compress: boolean;
}
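Since several fields of `ExportOptions` are only meaningful for a particular `scope`, a small validation step can catch inconsistent requests early. This is a sketch only: `validateExportOptions` is a hypothetical helper, the interface is repeated so the block is self-contained, and the specific rules (for example, requiring at least one of events, journals, or settings) are assumptions rather than decided behavior.

```typescript
interface ExportOptions {
  format: "md" | "json" | "pdf" | "html" | "epub";
  scope: "single" | "range" | "all";
  date?: string;
  startDate?: string;
  endDate?: string;
  include: {
    events: boolean;
    journals: boolean;
    media: boolean;
    settings: boolean;
  };
  organization: "single_file" | "folder";
  compress: boolean;
}

// Returns a list of human-readable problems; an empty list means the options are usable.
function validateExportOptions(options: ExportOptions): string[] {
  const errors: string[] = [];
  const { scope, date, startDate, endDate, include } = options;

  if (scope === "single" && !date) {
    errors.push("scope 'single' requires date");
  }
  if (scope === "range") {
    if (!startDate || !endDate) {
      errors.push("scope 'range' requires startDate and endDate");
    } else if (startDate > endDate) {
      // YYYY-MM-DD strings compare correctly as plain strings.
      errors.push("startDate must not be after endDate");
    }
  }
  if (!include.events && !include.journals && !include.settings) {
    errors.push("include must select events, journals, or settings");
  }
  return errors;
}
```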
5. File Organization
5.1 Single File
- All content in one file (`.md`, `.json`, `.html`)
- Best for: small exports, JSON backups
- Simple to implement
5.2 Folder Structure
export-2024-01-15/
├── index.html # Main navigation
├── 2024-01-15/
│ ├── journal.md # Diary page
│ ├── events.md # Raw events
│ └── media/ # Photos, voice memos
├── 2024-01-14/
│ └── ...
└── manifest.json # Export metadata
- Best for: large exports with media
- Use ZIP compression for download
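The folder layout above can be planned as a list of relative paths before any file is written, which is also a convenient input for `manifest.json`. A minimal sketch follows; `planExportFiles` is a hypothetical helper and the newest-first ordering simply mirrors the tree shown above.

```typescript
// Compute the relative paths of a folder export, newest day first.
function planExportFiles(exportDate: string, days: string[], includeMedia: boolean): string[] {
  const root = `export-${exportDate}`;
  const files: string[] = [`${root}/index.html`];

  // YYYY-MM-DD strings sort chronologically; reverse for newest first.
  const newestFirst = [...days].sort().reverse();
  for (const day of newestFirst) {
    files.push(`${root}/${day}/journal.md`, `${root}/${day}/events.md`);
    if (includeMedia) {
      files.push(`${root}/${day}/media/`); // directory placeholder for photos and voice memos
    }
  }

  files.push(`${root}/manifest.json`);
  return files;
}
```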
6. Compression Options
6.1 ZIP Archive
- Default for folder exports > 10MB
- Use the `archiver` package, or `Bun.zip()` if the target Bun version provides it (verify against the Bun API reference)
- Include manifest with export details
Implementation:
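// Example: ZIP export flow.
// createTempDir, generateFiles, zip, and serveFile are placeholder helpers still to be written.
async function exportZip(options: ExportOptions) {
  const tempDir = await createTempDir();
  await generateFiles(tempDir, options); // write the folder structure from 5.2
  const zipPath = `${tempDir}.zip`;
  await zip(tempDir, zipPath);
  return serveFile(zipPath); // temp files should be cleaned up once the download finishes
}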
7. Streaming Large Exports
7.1 Problem
- Large exports (years of data) can exceed memory
- Need progressive loading and streaming response
7.2 Solution: Server-Sent Events (SSE)
API Design:
POST /api/v1/export
Content-Type: application/json
{
  "format": "json",
  "startDate": "2020-01-01",
  "endDate": "2024-01-15"
}
Response (chunked, `Content-Type: text/event-stream`):
event: progress
data: {"percent": 10, "stage": "loading_events"}
event: data
data: {"date": "2020-01-01", ...}
event: progress
data: {"percent": 20, "stage": "loading_journals"}
event: data
data: {"date": "2020-01-02", ...}
event: complete
data: {"total_entries": 1000, "export_size": "5MB"}
7.3 Implementation Notes
- Use Prisma cursor-based pagination for memory efficiency
- Stream directly to response without buffering
- Provide progress updates every N records
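The notes above can be sketched as an async generator that pulls one page at a time and yields SSE frames, so only a single page is held in memory. Everything named here is an assumption for illustration: `FetchPage` stands in for a Prisma cursor query (roughly `findMany({ take, skip: 1, cursor: { id }, orderBy: { id: "asc" } })` after the first page), and `streamExport` emits progress once per page rather than every N records. Note also that the browser `EventSource` API only issues GET requests, so a client for the POST endpoint above would need to read the stream with `fetch` instead.

```typescript
interface StreamEntry {
  id: string;
  date: string;
}

// Stand-in for a cursor-paginated database query (hypothetical signature).
type FetchPage = (cursor: string | undefined, take: number) => Promise<StreamEntry[]>;

// Format one Server-Sent Events frame: event name, JSON payload, blank line.
function sseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

async function* streamExport(
  fetchPage: FetchPage,
  total: number,
  pageSize = 100,
): AsyncGenerator<string> {
  let cursor: string | undefined = undefined;
  let sent = 0;

  while (true) {
    const page = await fetchPage(cursor, pageSize);
    if (page.length === 0) break;

    for (const entry of page) {
      yield sseFrame("data", entry);
      sent += 1;
    }
    cursor = page[page.length - 1].id;

    const percent = total > 0 ? Math.min(100, Math.round((sent / total) * 100)) : 100;
    yield sseFrame("progress", { percent, stage: "loading_entries" });

    // A short page means the data set is exhausted.
    if (page.length < pageSize) break;
  }

  yield sseFrame("complete", { total_entries: sent });
}
```

Each yielded string can be written straight to the response body; because the generator only advances when the consumer asks for the next frame, a slow client naturally throttles the database reads.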
8. Privacy & Security
8.1 Authentication
- Require valid API key for all export endpoints
- User can only export their own data
8.2 Sensitive Data Handling
- Option: Password-protect exports
  - Use AES-256 encryption for ZIP
  - Prompt for password in UI
- Option: Redact sensitive entries
  - Tag certain events as "private"
  - Exclude from export by default
8.3 Media Files
- Generate signed URLs for media export
- Set expiration (24h default)
- Don't include raw API keys in export
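One common way to implement expiring media links is an HMAC over the path and expiry time, sketched below with Node's built-in `crypto` module (also available in Bun). The URL layout (`?expires=...&sig=...`), the helper names, and the use of a single server-side secret are assumptions for illustration, not a description of existing DearDiary code.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

const DEFAULT_TTL_SECONDS = 24 * 60 * 60; // 24h default, as noted above

// HMAC-SHA256 of the payload, hex encoded.
function signPayload(secret: string, payload: string): string {
  return createHmac("sha256", secret).update(payload).digest("hex");
}

// Produce a URL that is valid until `nowMs + ttlSeconds`.
function signMediaUrl(
  path: string,
  secret: string,
  nowMs: number,
  ttlSeconds: number = DEFAULT_TTL_SECONDS,
): string {
  const expires = Math.floor(nowMs / 1000) + ttlSeconds;
  const sig = signPayload(secret, `${path}:${expires}`);
  return `${path}?expires=${expires}&sig=${sig}`;
}

// Check expiry first, then compare signatures in constant time.
function verifyMediaUrl(
  path: string,
  expires: number,
  sig: string,
  secret: string,
  nowMs: number,
): boolean {
  if (Math.floor(nowMs / 1000) > expires) return false;
  const expected = Buffer.from(signPayload(secret, `${path}:${expires}`), "hex");
  const given = Buffer.from(sig, "hex");
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

Because the signature covers both the path and the expiry, a link cannot be extended or pointed at another file without the server secret, and no API key needs to appear in the exported files.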
8.4 Audit Logging
- Log export requests (who, when, scope)
- Store in new `ExportLog` model
9. Database Schema Changes
9.1 New Models
model ExportLog {
  id          String    @id @default(uuid())
  userId      String
  format      String
  scope       String
  startDate   String?
  endDate     String?
  recordCount Int
  sizeBytes   Int?
  status      String    @default("pending")
  createdAt   DateTime  @default(now())
  completedAt DateTime?
  user        User      @relation(fields: [userId], references: [id], onDelete: Cascade)
}

model ScheduledExport {
  id          String    @id @default(uuid())
  userId      String
  name        String
  format      String
  scope       String    @default("all")
  frequency   String    @default("weekly")
  includeJson Json?
  enabled     Boolean   @default(true)
  lastRunAt   DateTime?
  createdAt   DateTime  @default(now())
  updatedAt   DateTime  @updatedAt
  user        User      @relation(fields: [userId], references: [id], onDelete: Cascade)
}
10. API Changes
10.1 New Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/v1/export` | Create export job |
| GET | `/api/v1/export/:id` | Get export status |
| GET | `/api/v1/export/:id/download` | Download export file |
| GET | `/api/v1/exports` | List export history |
| DELETE | `/api/v1/export/:id` | Delete export |
| GET | `/api/v1/scheduled-exports` | List scheduled exports |
| POST | `/api/v1/scheduled-exports` | Create schedule |
| PUT | `/api/v1/scheduled-exports/:id` | Update schedule |
| DELETE | `/api/v1/scheduled-exports/:id` | Delete schedule |
10.2 Request/Response Examples
Create Export:
// POST /api/v1/export
interface CreateExportRequest {
  format: 'md' | 'json' | 'pdf' | 'html' | 'epub';
  date?: string;      // single day
  startDate?: string; // range start
  endDate?: string;   // range end
  include: {
    events: boolean;
    journals: boolean;
    media: boolean;
    settings: boolean;
  };
  organization: 'single_file' | 'folder';
  compress: boolean;
  password?: string;  // optional ZIP password
}

interface ExportResponse {
  id: string;
  status: 'pending' | 'processing' | 'completed' | 'failed';
  progress: number;
  downloadUrl?: string;
  expiresAt?: string;
}
11. UI/UX Considerations
11.1 Export Page Location
- Add to Settings page as "Export Data" section
- Or create dedicated `/export` route
11.2 Export Modal
┌─────────────────────────────────────────┐
│ Export Your Data │
├─────────────────────────────────────────┤
│ │
│ Format: [Markdown ▼] │
│ ○ Markdown │
│ ○ JSON │
│ ○ PDF │
│ ○ HTML │
│ ○ ePub │
│ │
│ Scope: ○ This month │
│ ○ This year │
│ ○ All time │
│ ○ Custom range [____] │
│ │
│ Include: ☑ Generated diaries │
│ ☑ Raw events │
│ ☐ Media files │
│ ☐ Settings │
│ │
│ Options: ○ Single file │
│ ○ Folder (with ZIP) │
│ │
│ ☐ Password protect │
│ [________] │
│ │
│ [Cancel] [Export] │
└─────────────────────────────────────────┘
11.3 Progress View
- Show progress bar during export
- Estimated time remaining
- Cancel button for large exports
- Email notification option (future)
11.4 Export History
- List of past exports with:
  - Date, format, scope
  - Size, record count
  - Download link (with expiration)
  - Delete button
12. Scheduled Exports
12.1 Configuration Options
| Frequency | Description |
|---|---|
| `daily` | Every day at configured time |
| `weekly` | Every Sunday |
| `monthly` | First day of month |
| `quarterly` | Every 3 months |
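A scheduler needs to turn each frequency into a concrete next run time. The sketch below does this in UTC; `nextRunAt` is a hypothetical helper, and two details are assumptions not fixed by the table: the configured time is expressed as an hour in UTC, and `quarterly` is interpreted as the first day of each calendar quarter (January, April, July, October).

```typescript
type Frequency = "daily" | "weekly" | "monthly" | "quarterly";

// Return the first scheduled time strictly after `after`, computed in UTC.
function nextRunAt(frequency: Frequency, after: Date, hourUtc = 0): Date {
  const year = after.getUTCFullYear();
  const month = after.getUTCMonth();
  const day = after.getUTCDate();

  // Date.UTC normalizes overflow, e.g. day 32 or month 12 roll into the next period.
  const at = (y: number, m: number, d: number): Date => new Date(Date.UTC(y, m, d, hourUtc));
  const isFuture = (candidate: Date): boolean => candidate.getTime() > after.getTime();

  switch (frequency) {
    case "daily": {
      const today = at(year, month, day);
      return isFuture(today) ? today : at(year, month, day + 1);
    }
    case "weekly": {
      const daysUntilSunday = (7 - after.getUTCDay()) % 7; // getUTCDay(): 0 is Sunday
      const sunday = at(year, month, day + daysUntilSunday);
      return isFuture(sunday) ? sunday : at(year, month, day + daysUntilSunday + 7);
    }
    case "monthly": {
      const firstOfMonth = at(year, month, 1);
      return isFuture(firstOfMonth) ? firstOfMonth : at(year, month + 1, 1);
    }
    case "quarterly": {
      const quarterStartMonth = month - (month % 3);
      const firstOfQuarter = at(year, quarterStartMonth, 1);
      return isFuture(firstOfQuarter) ? firstOfQuarter : at(year, quarterStartMonth + 3, 1);
    }
  }
}
```

Storing the computed time alongside `lastRunAt` would let a periodic job select due schedules with a simple comparison instead of re-evaluating every schedule on each tick.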
12.2 Implementation
- Use cron-style scheduling
- Run as background job (`setInterval` timer or dedicated worker)
- Store exports in cloud storage (S3-compatible) or local
- Send notification when ready
12.3 Use Cases
- Automated weekly backups
- Monthly archive generation
- Quarterly review compilation
13. Implementation Roadmap
Phase 1: Core Export (Week 1-2)
- Add `ExportLog` model to schema
- Implement JSON export endpoint
- Implement Markdown export endpoint
- Add single date/range query support
- Basic export UI in Settings
Complexity: 3/5
Priority: High
Phase 2: Advanced Formats (Week 3)
- HTML export
- PDF export (using puppeteer)
- ePub export (optional)
Complexity: 4/5
Priority: Medium
Phase 3: Large Exports (Week 4)
- Streaming with SSE
- ZIP compression
- Progress reporting
Complexity: 5/5
Priority: Medium
Phase 4: Automation (Week 5)
- Scheduled exports model
- Background job scheduler
- Scheduled exports UI
Complexity: 4/5
Priority: Low
Phase 5: Security & Polish (Week 6)
- Password-protected ZIPs
- Export audit logging
- Media file handling
- Edge cases and testing
Complexity: 3/5
Priority: Medium
14. Dependencies Required
| Package | Purpose | Version |
|---|---|---|
| `pdfkit` | PDF generation | ^0.14.0 |
| `puppeteer` | HTML to PDF | ^21.0.0 |
| `archiver` | ZIP creation | ^6.0.0 |
| `epub-gen` | ePub creation | ^0.1.0 |
| `jszip` | Client-side ZIP | ^3.10.0 |
15. Testing Considerations
15.1 Unit Tests
- Export formatters (MD, JSON, HTML)
- Date range filtering
- Include/exclude logic
15.2 Integration Tests
- Full export workflow
- Large dataset performance
- Streaming response handling
15.3 Edge Cases
- Empty date range
- Missing media files
- Export during active generation
- Concurrent export requests
16. Priority Recommendation
| Feature | Priority | Rationale |
|---|---|---|
| JSON/Markdown export | P0 | Core requirement for backups |
| Single/range export | P0 | Essential scope control |
| Export UI | P0 | User-facing feature |
| PDF export | P1 | High user demand |
| HTML export | P1 | Good alternative to PDF |
| Streaming exports | P2 | Performance for large data |
| ZIP compression | P2 | Usability for folder exports |
| ePub export | P3 | Niche, can skip |
| Scheduled exports | P3 | Automation, lower urgency |
| Password protection | P4 | Advanced, security theater |
17. Open Questions
- Storage: Should exports be stored temporarily or generated on-demand?
- Retention: How long to keep export downloads available?
- Media handling: Include actual files or just references?
- Third-party sync: Export to Google Drive, Dropbox?
- Incremental exports: Only export new data since last export?
18. Summary
This feature set provides comprehensive data export capabilities while maintaining security and user privacy. Starting with JSON/Markdown exports covers 80% of use cases (backups, migration). PDF and HTML add print/web options. Streaming and compression enable handling of large datasets. Scheduled exports provide automation for power users.
Recommend implementing Phase 1 first to establish core functionality, then iterate based on user feedback.