deardiary/todo/export.md
lotherk 0bdd71a4ed feat: v0.1.0 - geolocation capture, calendar, search, Starlight docs site
2026-03-27 02:27:55 +00:00

Data Export Feature - DearDiary

Comprehensive research document for implementing a data export feature.


1. Feature Overview

Allow users to export their diary data in multiple formats with flexible scope and options. This feature enables users to:

  • Backup their data locally
  • Migrate to other journaling platforms
  • Create offline archives
  • Share selected entries

2. Export Formats

2.1 Markdown (.md)

Description: Human-readable plain text format with frontmatter metadata.

Technical Approach:

  • File layout: one .md file per day, or a single combined file
  • Use YAML frontmatter for metadata (date, title, event count, generation timestamp)
  • Structure:
    ---
    date: 2024-01-15
    title: A Quiet Morning
    event_count: 5
    generated_at: 2024-01-15T20:30:00Z
    ---
    
    # January 15, 2024
    
    ## Events
    [08:30] Had coffee and read news
    [12:00] Team meeting about Q1 goals
    
    ## Diary Page
    
    The morning started quietly...
    

Complexity: Low - straightforward string generation
Priority: High - most versatile, easy to implement
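The structure above can be sketched as a single string-building function. This is a minimal illustration, not the project's implementation: the `DiaryDay` and `DiaryEvent` shapes and the name `renderDayMarkdown` are assumptions, event times are taken from the UTC timestamp as-is, and events are written as list items so each one renders on its own line.

```typescript
// Hypothetical shapes for illustration; the real Prisma models may differ.
interface DiaryEvent {
  createdAt: string; // ISO 8601 timestamp, e.g. 2024-01-15T08:30:00Z
  content: string;
}

interface DiaryDay {
  date: string; // YYYY-MM-DD
  title: string;
  generatedAt: string; // ISO 8601 timestamp
  content: string; // generated diary page
  events: DiaryEvent[];
}

const MONTHS = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

function renderDayMarkdown(day: DiaryDay): string {
  const [year, month, dayOfMonth] = day.date.split("-").map(Number);
  const heading = `${MONTHS[month - 1]} ${dayOfMonth}, ${year}`;
  // Characters 11..15 of an ISO timestamp are the HH:MM part.
  const eventLines = day.events.map(
    (e) => `- [${e.createdAt.slice(11, 16)}] ${e.content}`,
  );
  return [
    "---",
    `date: ${day.date}`,
    // JSON string syntax is also a valid YAML double-quoted scalar,
    // so titles containing ':' or '#' stay parseable.
    `title: ${JSON.stringify(day.title)}`,
    `event_count: ${day.events.length}`,
    `generated_at: ${day.generatedAt}`,
    "---",
    "",
    `# ${heading}`,
    "",
    "## Events",
    ...eventLines,
    "",
    "## Diary Page",
    "",
    day.content,
    "",
  ].join("\n");
}
```

Quoting the title is a deliberate choice: an unquoted title such as `Work: day one` would otherwise break the frontmatter.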


2.2 JSON (.json)

Description: Machine-readable structured format for programmatic use.

Technical Approach:

{
  "exported_at": "2024-01-15T20:30:00Z",
  "user_id": "user-uuid",
  "format_version": "1.0",
  "entries": [
    {
      "date": "2024-01-15",
      "journal": {
        "title": "A Quiet Morning",
        "content": "The morning started quietly...",
        "generated_at": "2024-01-15T20:30:00Z"
      },
      "events": [
        {
          "id": "event-uuid",
          "type": "text",
          "content": "Had coffee and read news",
          "created_at": "2024-01-15T08:30:00Z",
          "metadata": {}
        }
      ]
    }
  ]
}

Complexity: Low - Prisma query results are plain objects that serialize with JSON.stringify
Priority: High - essential for backups/migrations
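One way to assemble that document is to group flat journal and event rows by date. The sketch below assumes hypothetical row shapes (`JournalRow`, `EventRow`) whose field names mirror the sample JSON. It is not taken from the actual schema, and `metadata` is left empty for brevity.

```typescript
// Hypothetical input rows; the real models may use different field names.
interface JournalRow { date: string; title: string; content: string; generatedAt: string; }
interface EventRow { id: string; date: string; type: string; content: string; createdAt: string; }

interface ExportEntry {
  date: string;
  journal: { title: string; content: string; generated_at: string } | null;
  events: { id: string; type: string; content: string; created_at: string; metadata: Record<string, unknown> }[];
}

interface ExportDocument {
  exported_at: string;
  user_id: string;
  format_version: "1.0";
  entries: ExportEntry[];
}

function buildExportDocument(
  userId: string,
  journals: JournalRow[],
  events: EventRow[],
  now: Date = new Date(),
): ExportDocument {
  const byDate = new Map<string, ExportEntry>();
  // Return the entry for a date, creating an empty one on first use.
  const entryFor = (date: string): ExportEntry => {
    let entry = byDate.get(date);
    if (!entry) {
      entry = { date, journal: null, events: [] };
      byDate.set(date, entry);
    }
    return entry;
  };
  for (const j of journals) {
    entryFor(j.date).journal = { title: j.title, content: j.content, generated_at: j.generatedAt };
  }
  for (const e of events) {
    entryFor(e.date).events.push({
      id: e.id, type: e.type, content: e.content, created_at: e.createdAt, metadata: {},
    });
  }
  // YYYY-MM-DD strings sort chronologically when compared lexically.
  const entries = Array.from(byDate.values()).sort((a, b) => a.date.localeCompare(b.date));
  return { exported_at: now.toISOString(), user_id: userId, format_version: "1.0", entries };
}
```

Days that have events but no generated journal are kept with `journal: null`, so a backup does not silently drop raw events.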


2.3 PDF (.pdf)

Description: Print-ready formatted document.

Technical Approach:

  • Use pdfkit or puppeteer (headless Chrome) for generation
  • Puppeteer recommended for complex layouts/CSS support
  • Template options:
    • Simple: Title + content (minimal styling)
    • Full: Events listed with diary page formatted
  • Page breaks handled for multi-day exports

Complexity: Medium - requires additional dependency
Priority: Medium - high user demand for print/export


2.4 HTML (.html)

Description: Web-viewable static pages.

Technical Approach:

  • Single HTML file with embedded CSS
  • Include basic navigation for multi-day exports
  • Responsive design with print media queries
  • Structure:
    <!DOCTYPE html>
    <html>
    <head>
      <title>DearDiary Export</title>
      <style>
        body { font-family: system-ui; max-width: 800px; margin: 0 auto; padding: 2rem; }
        .entry { margin-bottom: 2rem; }
        .meta { color: #666; font-size: 0.9rem; }
      </style>
    </head>
    <body>
      <h1>January 2024</h1>
      <div class="entry">
        <h2>January 15, 2024</h2>
        <div class="meta">5 events</div>
        <p>Diary content...</p>
      </div>
    </body>
    </html>
    

Complexity: Low-Medium - string generation with CSS
Priority: Medium - good for web publishing
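Because diary text is user-authored, string-generated HTML needs escaping before it is placed into the template. A minimal sketch of one entry block follows; `HtmlEntry`, `escapeHtml` and `renderEntryHtml` are illustrative names, not existing project code.

```typescript
// Escape the five characters that are significant in HTML text and attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical input shape for one day of the export.
interface HtmlEntry {
  heading: string; // e.g. "January 15, 2024"
  eventCount: number;
  content: string; // diary page text, paragraphs separated by blank lines
}

function renderEntryHtml(entry: HtmlEntry): string {
  const paragraphs = entry.content
    .split(/\n{2,}/)
    .map((p) => `    <p>${escapeHtml(p.trim())}</p>`)
    .join("\n");
  return [
    `  <div class="entry">`,
    `    <h2>${escapeHtml(entry.heading)}</h2>`,
    `    <div class="meta">${entry.eventCount} events</div>`,
    paragraphs,
    `  </div>`,
  ].join("\n");
}
```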


2.5 ePub (.epub)

Description: Ebook format for e-readers.

Technical Approach:

  • Use epub-gen or similar library
  • Structure: One chapter per day or per month
  • Include cover image with app branding
  • Metadata: Title, author, generated date

Complexity: High - requires ebook-specific libraries
Priority: Low - niche use case, can be deprioritized


3. Export Scope

3.1 Single Diary

  • Export one day's journal + events
  • API: GET /api/v1/export?date=2024-01-15
  • Returns single entry with all related data

3.2 Date Range

  • Export journals and events between start and end dates (inclusive)
  • API: GET /api/v1/export?start=2024-01-01&end=2024-01-31
  • Batch query: Prisma where: { date: { gte: start, lte: end } }
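The range filter can be built and validated before it reaches Prisma. The sketch below assumes `date` is stored as a `YYYY-MM-DD` string (as in the sample JSON), so lexical comparison matches chronological order. The `userId` scoping and the function names are likewise assumptions.

```typescript
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

function isValidIsoDate(value: string): boolean {
  if (!ISO_DATE.test(value)) return false;
  const parsed = new Date(`${value}T00:00:00Z`);
  // Reject impossible dates such as 2024-02-30, which Date either
  // rolls over to March or reports as invalid depending on the engine.
  return !Number.isNaN(parsed.getTime()) && parsed.toISOString().slice(0, 10) === value;
}

interface DateRangeWhere {
  userId: string;
  date: { gte: string; lte: string };
}

// Builds the object passed as `where` to a Prisma findMany call.
function buildDateRangeWhere(userId: string, start: string, end: string): DateRangeWhere {
  if (!isValidIsoDate(start) || !isValidIsoDate(end)) {
    throw new RangeError("start and end must be valid YYYY-MM-DD dates");
  }
  if (start > end) {
    throw new RangeError("start must not be after end");
  }
  return { userId, date: { gte: start, lte: end } };
}
```

Validating up front turns a reversed or malformed range into a clear client error instead of an empty export.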

3.3 All Data

  • Export entire user dataset
  • Include settings, metadata
  • Requires pagination for large datasets

4. Include/Exclude Options

4.1 Content Filters

| Option | Description | Implementation |
|---|---|---|
| events_only | Raw events without AI-generated diaries | Filter journals from response |
| diaries_only | Only generated diary pages | Filter events from response |
| with_media | Include media file references | Include mediaPath field |
| without_media | Exclude media references | Omit mediaPath field |
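The four filters listed above can be applied as a pure transformation over already-loaded entries. In this sketch the entry shapes are hypothetical, contradictory combinations are rejected, and `mediaPath` is dropped unless `with_media` is set. The source does not state the default, so that last rule is an assumption.

```typescript
type ContentFilter = "events_only" | "diaries_only" | "with_media" | "without_media";

// Hypothetical shapes; only mediaPath is named in the filter table.
interface FilterableEvent { id: string; content: string; mediaPath?: string; }
interface FilterableEntry {
  date: string;
  journal: { title: string; content: string } | null;
  events: FilterableEvent[];
}

function applyContentFilters(
  entries: FilterableEntry[],
  filters: ReadonlySet<ContentFilter>,
): FilterableEntry[] {
  if (filters.has("events_only") && filters.has("diaries_only")) {
    throw new Error("events_only and diaries_only are mutually exclusive");
  }
  if (filters.has("with_media") && filters.has("without_media")) {
    throw new Error("with_media and without_media are mutually exclusive");
  }
  const keepMedia = filters.has("with_media");
  return entries.map((entry) => ({
    date: entry.date,
    // events_only: drop the generated journal.
    journal: filters.has("events_only") ? null : entry.journal,
    // diaries_only: drop raw events; otherwise copy them, omitting mediaPath if needed.
    events: filters.has("diaries_only")
      ? []
      : entry.events.map((e) => (keepMedia ? { ...e } : { id: e.id, content: e.content })),
  }));
}
```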

4.2 Data Structure Options

interface ExportOptions {
  format: 'md' | 'json' | 'pdf' | 'html' | 'epub';
  scope: 'single' | 'range' | 'all';
  date?: string;
  startDate?: string;
  endDate?: string;
  include: {
    events: boolean;
    journals: boolean;
    media: boolean;
    settings: boolean;
  };
  organization: 'single_file' | 'folder';
  compress: boolean;
}
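Since `scope` determines which of the optional date fields must be present, a small validator can catch inconsistent requests early. The rules below are assumptions inferred from the field names, not documented requirements; the interface is repeated only so the sketch is self-contained.

```typescript
interface ExportOptions {
  format: 'md' | 'json' | 'pdf' | 'html' | 'epub';
  scope: 'single' | 'range' | 'all';
  date?: string;
  startDate?: string;
  endDate?: string;
  include: { events: boolean; journals: boolean; media: boolean; settings: boolean };
  organization: 'single_file' | 'folder';
  compress: boolean;
}

// Returns a list of human-readable problems; an empty list means the options are usable.
function validateExportOptions(options: ExportOptions): string[] {
  const errors: string[] = [];
  if (options.scope === 'single' && !options.date) {
    errors.push("scope 'single' requires date");
  }
  if (options.scope === 'range') {
    if (!options.startDate || !options.endDate) {
      errors.push("scope 'range' requires startDate and endDate");
    } else if (options.startDate > options.endDate) {
      // Lexical comparison is valid for YYYY-MM-DD strings.
      errors.push('startDate must not be after endDate');
    }
  }
  const { events, journals, media, settings } = options.include;
  if (!events && !journals && !media && !settings) {
    errors.push('at least one include flag must be true');
  }
  return errors;
}
```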

5. File Organization

5.1 Single File

  • All content in one file (.md, .json, .html)
  • Best for: small exports, JSON backups
  • Simple to implement

5.2 Folder Structure

export-2024-01-15/
├── index.html          # Main navigation
├── 2024-01-15/
│   ├── journal.md      # Diary page
│   ├── events.md       # Raw events
│   └── media/          # Photos, voice memos
├── 2024-01-14/
│   └── ...
└── manifest.json       # Export metadata
  • Best for: large exports with media
  • Use ZIP compression for download
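The tree above can be planned as a list of relative paths before any file is written, which keeps the layout testable without touching the disk. `planExportFiles` and `PlannedFile` are hypothetical names; media files are omitted because their naming is not specified here.

```typescript
interface PlannedFile {
  path: string;        // path relative to the download root
  description: string; // role of the file, as annotated in the tree
}

function planExportFiles(exportDate: string, days: readonly string[]): PlannedFile[] {
  const root = `export-${exportDate}`;
  // Newest day first, matching the order shown in the tree.
  const newestFirst = days.slice().sort().reverse();
  const files: PlannedFile[] = [
    { path: `${root}/index.html`, description: "Main navigation" },
  ];
  for (const day of newestFirst) {
    files.push({ path: `${root}/${day}/journal.md`, description: "Diary page" });
    files.push({ path: `${root}/${day}/events.md`, description: "Raw events" });
  }
  files.push({ path: `${root}/manifest.json`, description: "Export metadata" });
  return files;
}
```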

6. Compression Options

6.1 ZIP Archive

  • Default for folder exports > 10MB
  • Use the archiver package; Bun.zip() does not appear in Bun's documented API, so confirm it exists before relying on it
  • Include manifest with export details

Implementation:

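// Example: ZIP export flow.
// createTempDir, generateFiles, zip, serveFile and removeDir are placeholder
// helpers that still need to be written; they are not library functions.
async function exportZip(options: ExportOptions) {
  const tempDir = await createTempDir();
  try {
    await generateFiles(tempDir, options); // write journal/event files
    const zipPath = `${tempDir}.zip`;      // archive lives next to the staging dir
    await zip(tempDir, zipPath);           // e.g. via the archiver package
    return serveFile(zipPath);             // zipPath itself is removed after download
  } finally {
    await removeDir(tempDir);              // always clean up the staging directory
  }
}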

7. Streaming Large Exports

7.1 Problem

  • Large exports (years of data) can exceed memory
  • Need progressive loading and streaming response

7.2 Solution: Server-Sent Events (SSE)

API Design:

POST /api/v1/export
Content-Type: application/json

{
  "format": "json",
  "startDate": "2020-01-01",
  "endDate": "2024-01-15"
}

Response (chunked):

event: progress
data: {"percent": 10, "stage": "loading_events"}

event: data
data: {"date": "2020-01-01", ...}

event: progress
data: {"percent": 20, "stage": "loading_journals"}

event: data
data: {"date": "2020-01-02", ...}

event: complete
data: {"total_entries": 1000, "export_size": "5MB"}
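Each message in the response above is one SSE frame: an `event:` line, a `data:` line and a blank line. A minimal formatter might look like the sketch below. The `ExportSseEvent` type is an assumption derived from the three event names shown. Note that the browser `EventSource` API only issues GET requests, so a client sending the POST above would need to read the stream with `fetch` instead.

```typescript
// Event shapes inferred from the sample response; not a confirmed contract.
type ExportSseEvent =
  | { event: "progress"; data: { percent: number; stage: string } }
  | { event: "data"; data: Record<string, unknown> }
  | { event: "complete"; data: { total_entries: number; export_size: string } };

function formatSseFrame(message: ExportSseEvent): string {
  // JSON.stringify without indentation never emits raw newlines,
  // so the payload always fits on a single data: line.
  return `event: ${message.event}\ndata: ${JSON.stringify(message.data)}\n\n`;
}
```

The server would write these strings to a response sent with `Content-Type: text/event-stream`.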

7.3 Implementation Notes

  • Use Prisma cursor-based pagination for memory efficiency
  • Stream directly to response without buffering
  • Provide progress updates every N records
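The cursor-based approach in the notes above can be wrapped in an async generator so that only one page is held in memory at a time. In this sketch the page loader is injected as `fetchPage`, which keeps the generator testable without a database. The names, the `id` cursor field and the page size are assumptions.

```typescript
// Loads up to `take` rows that come after `cursor` (or from the start when undefined).
// With Prisma this could be, for a hypothetical `event` model:
//   prisma.event.findMany({
//     take,
//     skip: cursor ? 1 : 0,                      // skip the cursor row itself
//     cursor: cursor ? { id: cursor } : undefined,
//     orderBy: { id: "asc" },
//   })
type FetchPage<T> = (cursor: string | undefined, take: number) => Promise<T[]>;

async function* iterateByCursor<T extends { id: string }>(
  fetchPage: FetchPage<T>,
  pageSize = 500,
): AsyncGenerator<T> {
  let cursor: string | undefined;
  while (true) {
    const page = await fetchPage(cursor, pageSize);
    for (const row of page) {
      yield row;
    }
    // A short page means the table is exhausted.
    if (page.length < pageSize) return;
    cursor = page[page.length - 1].id;
  }
}
```

A streaming handler can consume this with `for await (const row of iterateByCursor(...))` and write each row to the response as it arrives.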

8. Privacy & Security

8.1 Authentication

  • Require valid API key for all export endpoints
  • User can only export their own data

8.2 Sensitive Data Handling

  • Option: Password-protect exports
    • Use AES-256 encryption for ZIP
    • Prompt for password in UI
  • Option: redact sensitive entries
    • Tag certain events as "private"
    • Exclude from export by default
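As a point of comparison for the password option, the sketch below encrypts the finished archive bytes as a whole with AES-256-GCM from `node:crypto`, using a key derived from the password with scrypt. This is a swapped-in technique, not ZIP-native AES: the result cannot be opened by standard ZIP tools and needs the matching `decryptArchive`. The function names and byte layout are assumptions.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

const SALT_LEN = 16; // scrypt salt
const IV_LEN = 12;   // recommended nonce size for GCM
const TAG_LEN = 16;  // GCM authentication tag

// Output layout (an assumption of this sketch): salt | iv | tag | ciphertext
function encryptArchive(plain: Buffer, password: string): Buffer {
  const salt = randomBytes(SALT_LEN);
  const iv = randomBytes(IV_LEN);
  const key = scryptSync(password, salt, 32); // 32 bytes = AES-256 key
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plain), cipher.final()]);
  return Buffer.concat([salt, iv, cipher.getAuthTag(), body]);
}

function decryptArchive(blob: Buffer, password: string): Buffer {
  const salt = blob.subarray(0, SALT_LEN);
  const iv = blob.subarray(SALT_LEN, SALT_LEN + IV_LEN);
  const tag = blob.subarray(SALT_LEN + IV_LEN, SALT_LEN + IV_LEN + TAG_LEN);
  const body = blob.subarray(SALT_LEN + IV_LEN + TAG_LEN);
  const key = scryptSync(password, salt, 32);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the password is wrong or the data was modified.
  return Buffer.concat([decipher.update(body), decipher.final()]);
}
```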

8.3 Media Files

  • Generate signed URLs for media export
  • Set expiration (24h default)
  • Don't include raw API keys in export
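One common way to implement expiring media links is an HMAC signature over the path and expiry time. The sketch below uses `node:crypto` and assumes a server-side secret plus hypothetical `expires` and `sig` query parameters. The 24h default follows the bullet above.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const DEFAULT_TTL_SECONDS = 24 * 60 * 60; // 24h default

function signature(path: string, expires: number, secret: string): string {
  return createHmac("sha256", secret).update(`${path}\n${expires}`).digest("hex");
}

function signMediaUrl(
  path: string,
  secret: string,
  nowMs: number,
  ttlSeconds: number = DEFAULT_TTL_SECONDS,
): string {
  const expires = Math.floor(nowMs / 1000) + ttlSeconds;
  return `${path}?expires=${expires}&sig=${signature(path, expires, secret)}`;
}

function verifyMediaUrl(url: string, secret: string, nowMs: number): boolean {
  const [path, query = ""] = url.split("?");
  const params = new URLSearchParams(query);
  const expires = Number(params.get("expires"));
  // Missing or non-numeric expiry, or a link past its expiry, is rejected.
  if (!Number.isInteger(expires) || expires * 1000 < nowMs) return false;
  const given = Buffer.from(params.get("sig") ?? "", "hex");
  const expected = Buffer.from(signature(path, expires, secret), "hex");
  // Constant-time comparison avoids leaking how many bytes matched.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

The secret stays on the server; only the derived signature appears in the exported URL, which is consistent with keeping raw API keys out of exports.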

8.4 Audit Logging

  • Log export requests (who, when, scope)
  • Store in new ExportLog model

9. Database Schema Changes

9.1 New Models

model ExportLog {
  id          String   @id @default(uuid())
  userId      String
  format      String
  scope       String
  startDate   String?
  endDate     String?
  recordCount Int
  sizeBytes   Int?
  status      String   @default("pending")
  createdAt   DateTime @default(now())
  completedAt DateTime?

  user User @relation(fields: [userId], references: [id], onDelete: Cascade)
}

model ScheduledExport {
  id          String   @id @default(uuid())
  userId      String
  name        String
  format      String
  scope       String   @default("all")
  frequency   String   @default("weekly")
  includeJson Json?
  enabled     Boolean  @default(true)
  lastRunAt   DateTime?
  createdAt   DateTime @default(now())
  updatedAt   DateTime @updatedAt

  user User @relation(fields: [userId], references: [id], onDelete: Cascade)
}

10. API Changes

10.1 New Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/export | Create export job |
| GET | /api/v1/export/:id | Get export status |
| GET | /api/v1/export/:id/download | Download export file |
| GET | /api/v1/exports | List export history |
| DELETE | /api/v1/export/:id | Delete export |
| GET | /api/v1/scheduled-exports | List scheduled exports |
| POST | /api/v1/scheduled-exports | Create schedule |
| PUT | /api/v1/scheduled-exports/:id | Update schedule |
| DELETE | /api/v1/scheduled-exports/:id | Delete schedule |

10.2 Request/Response Examples

Create Export:

// POST /api/v1/export
interface CreateExportRequest {
  format: 'md' | 'json' | 'pdf' | 'html' | 'epub';
  date?: string;           // single day
  startDate?: string;      // range start
  endDate?: string;       // range end
  include: {
    events: boolean;
    journals: boolean;
    media: boolean;
    settings: boolean;
  };
  organization: 'single_file' | 'folder';
  compress: boolean;
  password?: string;      // optional ZIP password
}

interface ExportResponse {
  id: string;
  status: 'pending' | 'processing' | 'completed' | 'failed';
  progress: number;
  downloadUrl?: string;
  expiresAt?: string;
}

11. UI/UX Considerations

11.1 Export Page Location

  • Add to Settings page as "Export Data" section
  • Or create dedicated /export route

11.2 Export Modal

┌─────────────────────────────────────────┐
│  Export Your Data                       │
├─────────────────────────────────────────┤
│                                         │
│  Format:  [Markdown ▼]                 │
│            ○ Markdown                  │
│            ○ JSON                       │
│            ○ PDF                        │
│            ○ HTML                       │
│            ○ ePub                       │
│                                         │
│  Scope:   ○ This month                  │
│            ○ This year                  │
│            ○ All time                   │
│            ○ Custom range    [____]     │
│                                         │
│  Include: ☑ Generated diaries           │
│           ☑ Raw events                  │
│           ☐ Media files                 │
│           ☐ Settings                    │
│                                         │
│  Options: ○ Single file                 │
│            ○ Folder (with ZIP)          │
│                                         │
│           ☐ Password protect           │
│           [________]                    │
│                                         │
│  [Cancel]              [Export]         │
└─────────────────────────────────────────┘

11.3 Progress View

  • Show progress bar during export
  • Estimated time remaining
  • Cancel button for large exports
  • Email notification option (future)

11.4 Export History

  • List of past exports with:
    • Date, format, scope
    • Size, record count
    • Download link (with expiration)
    • Delete button

12. Scheduled Exports

12.1 Configuration Options

| Frequency | Description |
|---|---|
| daily | Every day at configured time |
| weekly | Every Sunday |
| monthly | First day of month |
| quarterly | Every 3 months |
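The frequencies above translate into a "next run" calculation that a scheduler can call after each export. The sketch below works in UTC and makes two assumptions the table does not state: `quarterly` means the first day of each calendar quarter, and every frequency fires at the same configured hour (`hourUtc`).

```typescript
type Frequency = "daily" | "weekly" | "monthly" | "quarterly";

// Returns the first scheduled instant strictly after `after`.
function nextRunAt(frequency: Frequency, after: Date, hourUtc = 0): Date {
  const y = after.getUTCFullYear();
  const m = after.getUTCMonth();
  const d = after.getUTCDate();

  // Slot for the current period (shift = 0) or the following one (shift = 1).
  // Date.UTC normalizes overflow, e.g. month 12 becomes January of the next year.
  const slot = (shift: number): Date => {
    switch (frequency) {
      case "daily":
        return new Date(Date.UTC(y, m, d + shift, hourUtc));
      case "weekly": {
        const daysUntilSunday = (7 - after.getUTCDay()) % 7;
        return new Date(Date.UTC(y, m, d + daysUntilSunday + 7 * shift, hourUtc));
      }
      case "monthly":
        return new Date(Date.UTC(y, m + shift, 1, hourUtc));
      case "quarterly":
        return new Date(Date.UTC(y, m - (m % 3) + 3 * shift, 1, hourUtc));
    }
  };

  const current = slot(0);
  return current.getTime() > after.getTime() ? current : slot(1);
}
```

Storing the computed time next to `lastRunAt` would let a simple polling loop decide which schedules are due.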

12.2 Implementation

  • Use cron-style scheduling
  • Run as background job (global setInterval or a dedicated worker)
  • Store exports in cloud storage (S3-compatible) or local
  • Send notification when ready

12.3 Use Cases

  • Automated weekly backups
  • Monthly archive generation
  • Quarterly review compilation

13. Implementation Roadmap

Phase 1: Core Export (Week 1-2)

  • Add ExportLog model to schema
  • Implement JSON export endpoint
  • Implement Markdown export endpoint
  • Add single date/range query support
  • Basic export UI in Settings

Complexity: 3/5
Priority: High

Phase 2: Advanced Formats (Week 3)

  • HTML export
  • PDF export (using puppeteer)
  • ePub export (optional)

Complexity: 4/5
Priority: Medium

Phase 3: Large Exports (Week 4)

  • Streaming with SSE
  • ZIP compression
  • Progress reporting

Complexity: 5/5
Priority: Medium

Phase 4: Automation (Week 5)

  • Scheduled exports model
  • Background job scheduler
  • Scheduled exports UI

Complexity: 4/5
Priority: Low

Phase 5: Security & Polish (Week 6)

  • Password-protected ZIPs
  • Export audit logging
  • Media file handling
  • Edge cases and testing

Complexity: 3/5
Priority: Medium


14. Dependencies Required

| Package | Purpose | Version |
|---|---|---|
| pdfkit | PDF generation | ^0.14.0 |
| puppeteer | HTML to PDF | ^21.0.0 |
| archiver | ZIP creation | ^6.0.0 |
| epub-gen | ePub creation | ^0.1.0 |
| jszip | Client-side ZIP | ^3.10.0 |

15. Testing Considerations

15.1 Unit Tests

  • Export formatters (MD, JSON, HTML)
  • Date range filtering
  • Include/exclude logic

15.2 Integration Tests

  • Full export workflow
  • Large dataset performance
  • Streaming response handling

15.3 Edge Cases

  • Empty date range
  • Missing media files
  • Export during active generation
  • Concurrent export requests

16. Priority Recommendation

| Feature | Priority | Rationale |
|---|---|---|
| JSON/Markdown export | P0 | Core requirement for backups |
| Single/range export | P0 | Essential scope control |
| Export UI | P0 | User-facing feature |
| PDF export | P1 | High user demand |
| HTML export | P1 | Good alternative to PDF |
| Streaming exports | P2 | Performance for large data |
| ZIP compression | P2 | Usability for folder exports |
| ePub export | P3 | Niche, can skip |
| Scheduled exports | P3 | Automation, lower urgency |
| Password protection | P4 | Advanced, security theater |

17. Open Questions

  1. Storage: Should exports be stored temporarily or generated on-demand?
  2. Retention: How long to keep export downloads available?
  3. Media handling: Include actual files or just references?
  4. Third-party sync: Export to Google Drive, Dropbox?
  5. Incremental exports: Only export new data since last export?

18. Summary

This feature set provides comprehensive data export capabilities while maintaining security and user privacy. Starting with JSON/Markdown exports should cover the most common use cases (backups, migration). PDF and HTML add print/web options. Streaming and compression enable handling of large datasets. Scheduled exports provide automation for power users.

Recommend implementing Phase 1 first to establish core functionality, then iterate based on user feedback.