Screenshot Inbox | Kristofer Holmquist

I started Screenshot Inbox from a familiar habit: using screenshots as temporary memory. A receipt, a product I might buy later, a calendar invite, a place, a quote, a link, a message with a date in it. The capture step is effortless, but the follow-through is terrible. The useful bit ends up buried in the photo library with no title, no structure, and no reminder attached to it.

The repo reads like an attempt to turn that habit into an actual inbox. Not a photo gallery, and not a general file manager. The app scans for screenshots, extracts text, classifies the contents, and suggests the next action: save a link, create a reminder, add an event, keep a note, open a map, or archive the item.

I used Codex heavily here, so the commit history became part of the design record. The early commits move in a straight line from app shell to permissions, local schema, photo indexing, OCR, classification, actions, and search. The later commits are where the project gets more interesting: privacy controls, analytics boundaries, QA fixtures, release readiness, support docs, runbooks, and explicit blocker tracking. It turned into both an app foundation and a small operating model for what it would take to ship it responsibly.

What It Does

The launch scope is deliberately narrow. Screenshot Inbox focuses on productivity-oriented screenshot types: events, locations, receipts, products, links, and notes. The first run flow asks for photo access, the inbox scans the library for screenshots, and each item moves through a processing state: pending OCR, needs review, ready, actioned, or archived.
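The processing states above behave like a small state machine. The sketch below models them in TypeScript, which suits an Expo app. The snake_case state names and the transition table are assumptions for illustration. The post names the states but does not say which moves between them are allowed.

```typescript
// Processing states named in the post; these identifiers are hypothetical.
type ProcessingState =
  | "pending_ocr"
  | "needs_review"
  | "ready"
  | "actioned"
  | "archived";

// Assumed transition table: which states an item may move to next.
// "archived" is treated as terminal here purely to keep the sketch short.
const TRANSITIONS: Record<ProcessingState, readonly ProcessingState[]> = {
  pending_ocr: ["needs_review", "ready", "archived"],
  needs_review: ["ready", "archived"],
  ready: ["actioned", "needs_review", "archived"],
  actioned: ["archived"],
  archived: [],
};

function canTransition(from: ProcessingState, to: ProcessingState): boolean {
  return TRANSITIONS[from].includes(to);
}

// Returns the new state, or throws so illegal jumps surface in tests.
function transition(from: ProcessingState, to: ProcessingState): ProcessingState {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal transition: ${from} -> ${to}`);
  }
  return to;
}
```

Because the transitions are stored as data, a test can assert directly that an item never jumps from pending OCR straight to actioned.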

The main surfaces are practical:

an inbox for scanned screenshots
a detail view with extracted fields and suggested actions
search over OCR text and normalized fields
smart folders for receipts, events, products, places, links, notes, favorites, and needs-review items
settings for privacy, local storage, feedback, and paywall state

The app also has the pieces that are easy to skip in a prototype but matter for this product: permission recovery, source-deleted handling, local image copies, correction forms, action acceptance state, and analytics that avoid collecting raw private content.

How It Works

The core architecture is a local-first processing pipeline. Screenshots are detected from the photo library, OCR runs through an adapter boundary, classification starts with deterministic local rules, and search/action state is stored locally.

```mermaid
flowchart TD
  Photos[Photo library screenshots] --> Scan[Scan and filter assets]
  Scan --> OCR[On-device OCR adapter]
  OCR --> Classify[Local classification and extraction]
  Classify --> Review[Inbox and detail review]
  Classify --> Search[Local search index]
  Review --> Actions[Reminder, calendar, link, note, map, archive]
  Classify --> Fallback{Low confidence?}
  Fallback -->|Opt-in only| Cloud[Redacted cloud fallback]
  Fallback -->|No opt-in| NeedsReview[Needs Review]
  Cloud --> Review
```
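Here is a minimal sketch that makes the OCR adapter boundary concrete. `OcrAdapter`, `FixtureOcrAdapter`, `normalizeOcrText`, and `ocrToText` are hypothetical names, not the repo's. The point is that the pipeline depends only on an interface, so a fixture-backed adapter can stand in for a platform OCR engine during tests.

```typescript
interface OcrLine {
  text: string;
  confidence: number; // 0..1 as reported by the engine (assumed)
}

interface OcrResult {
  engine: string;
  lines: OcrLine[];
}

// The only thing the rest of the pipeline knows about OCR.
interface OcrAdapter {
  recognize(imageUri: string): Promise<OcrResult>;
}

// Deterministic adapter backed by synthetic fixtures; no device or photo library needed.
class FixtureOcrAdapter implements OcrAdapter {
  private readonly fixtures: Map<string, string[]>;

  constructor(fixtures: Map<string, string[]>) {
    this.fixtures = fixtures;
  }

  async recognize(imageUri: string): Promise<OcrResult> {
    const lines = this.fixtures.get(imageUri);
    if (lines === undefined) throw new Error(`No fixture for ${imageUri}`);
    return { engine: "fixture", lines: lines.map((text) => ({ text, confidence: 1 })) };
  }
}

// Collapse runs of whitespace and drop empty lines before classification.
function normalizeOcrText(result: OcrResult): string {
  return result.lines
    .map((line) => line.text.replace(/\s+/g, " ").trim())
    .filter((text) => text.length > 0)
    .join("\n");
}

// Entry point depends only on the interface, not on a concrete engine.
async function ocrToText(adapter: OcrAdapter, imageUri: string): Promise<string> {
  return normalizeOcrText(await adapter.recognize(imageUri));
}
```

A platform adapter wrapping Apple Vision or Google ML Kit would implement the same `recognize` method. This sketch does not show how the repo actually bridges to those engines.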

The local data model is built around Expo SQLite with a versioned schema. It stores screenshots, OCR results, extractions, actions, folders, search index rows, settings, processing queue entries, usage counters, and cursors reserved for future sync. For search, the docs describe a direction based on SQLite's FTS5 full-text search extension, while the code already implements a ranking layer. That layer normalizes queries, handles common OCR-style misspellings, expands queries with related terms, applies type filters, and scores matches across title, OCR text, merchant, domain, place, and screenshot type.
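The ranking steps in that paragraph can be sketched as a small in-memory scorer. Everything in this sketch is an assumption for illustration: the field weights, the `OCR_FIXES` misreading table, the `EXPANSIONS` synonym table, and all function names. It shows the general shape of the technique: shared tokenization for queries and documents, term expansion, a type filter, and weighted per-field matching. It is not the repo's implementation.

```typescript
type ScreenshotType = "event" | "location" | "receipt" | "product" | "link" | "note";
type Field = "title" | "merchant" | "domain" | "place" | "type" | "ocrText";

interface SearchDoc {
  id: string;
  type: ScreenshotType;
  title: string;
  ocrText: string;
  merchant?: string;
  domain?: string;
  place?: string;
}

// Assumed weights: curated fields outrank raw OCR text.
const FIELD_WEIGHTS: Array<[Field, number]> = [
  ["title", 3], ["merchant", 2.5], ["domain", 2], ["place", 2], ["type", 1.5], ["ocrText", 1],
];

// Illustrative tables; the repo's real misreading and synonym lists are not shown in the post.
const OCR_FIXES: Record<string, string> = { tota1: "total", rnenu: "menu", c0ffee: "coffee" };
const EXPANSIONS: Record<string, string[]> = {
  receipt: ["receipt", "total", "invoice"],
  flight: ["flight", "boarding", "gate"],
};

// Same normalization for documents and queries: lowercase, split on
// non-alphanumerics, then repair known OCR misreadings.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((t) => t.length > 0)
    .map((t) => OCR_FIXES[t] ?? t);
}

// Each query term adds the field's weight for every field whose tokens contain it.
function scoreDoc(doc: SearchDoc, terms: string[]): number {
  let score = 0;
  for (const [field, weight] of FIELD_WEIGHTS) {
    const tokens = tokenize(doc[field] ?? "");
    for (const term of terms) {
      if (tokens.includes(term)) score += weight;
    }
  }
  return score;
}

function search(docs: SearchDoc[], query: string, typeFilter?: ScreenshotType): SearchDoc[] {
  // Expand each query token into related terms, then deduplicate.
  const terms = Array.from(new Set(tokenize(query).flatMap((t) => EXPANSIONS[t] ?? [t])));
  return docs
    .filter((d) => typeFilter === undefined || d.type === typeFilter)
    .map((doc) => ({ doc, score: scoreDoc(doc, terms) }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((r) => r.doc);
}
```

Running documents and queries through the same `tokenize` function lets a stored misreading such as `TOTA1` match a query for `total`.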

The classification layer is intentionally conservative. It uses local rules for URLs, currency, dates, times, addresses, merchants, and first meaningful lines, then maps those signals into supported screenshot types. Confidence decides whether an item is ready or needs review. The cloud path exists, but it is gated: the user has to opt in, text is redacted, requests are validated, responses are strict JSON, and the app records model version and estimated cost metadata instead of pretending AI calls are free.
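Here is a sketch of rule-based classification along those lines. The regular expressions, rule order, confidence values, and the 0.75 `READY_THRESHOLD` are all hypothetical. The post lists the signal kinds (URLs, currency, dates, times, addresses, first meaningful lines) but not the rules themselves. Merchant detection is omitted to keep the sketch short.

```typescript
type ScreenshotType = "event" | "location" | "receipt" | "product" | "link" | "note";

// Deliberately naive patterns for illustration; real OCR output needs many more variants.
const PATTERNS = {
  url: /\bhttps?:\/\/\S+|\bwww\.\S+/i,
  currency: /[$€£]\s?\d+(?:[.,]\d{2})?|\b\d+(?:[.,]\d{2})?\s?(?:USD|EUR|SEK|kr)\b/i,
  date: /\b\d{4}-\d{2}-\d{2}\b|\b(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\.?\s+\d{1,2}\b/i,
  time: /\b\d{1,2}:\d{2}\s?(?:am|pm)?\b|\b\d{1,2}\s?(?:am|pm)\b/i,
  address: /\b\d+\s+[A-Z][a-z]+\s+(?:St|Street|Ave|Avenue|Rd|Road|Blvd)\b/,
};

const READY_THRESHOLD = 0.75; // assumed cut-off between "ready" and "needs review"

interface Classification {
  type: ScreenshotType;
  confidence: number;
  state: "ready" | "needs_review";
  title: string; // first meaningful line, used as a fallback title
}

function classify(text: string): Classification {
  const has = (pattern: RegExp): boolean => pattern.test(text);
  const title = text.split("\n").map((l) => l.trim()).find((l) => l.length >= 3) ?? "";

  // Rules run from most to least specific; the confidences are hypothetical.
  let type: ScreenshotType = "note";
  let confidence = 0.4;
  if (has(PATTERNS.date) && has(PATTERNS.time)) {
    type = "event";
    confidence = 0.85;
  } else if (has(PATTERNS.address)) {
    type = "location";
    confidence = 0.8;
  } else if (has(PATTERNS.currency) && /\btotal\b/i.test(text)) {
    type = "receipt";
    confidence = 0.85;
  } else if (has(PATTERNS.currency)) {
    type = "product";
    confidence = 0.6;
  } else if (has(PATTERNS.url)) {
    type = "link";
    confidence = 0.9;
  }

  const state = confidence >= READY_THRESHOLD ? "ready" : "needs_review";
  return { type, confidence, state, title };
}
```

A bare note gets low confidence, so it lands in needs review instead of being forced into a more specific type.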

That privacy boundary shaped a lot of the project. The docs call out raw screenshots, OCR text, extracted fields, and action payloads as sensitive. The launch design keeps screenshot images on device, keeps OCR text local, avoids private content in analytics, and treats cloud extraction as a fallback for ambiguous cases rather than the default path.
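The cloud gate can be expressed as three small functions. The first decides whether fallback is allowed, the second redacts text before anything leaves the device, and the third rejects any response that does not match the expected JSON shape. The redaction patterns, the 0.75 threshold, and the `{ type, confidence }` response shape are assumptions. The post says text is redacted and responses are strict JSON, but it does not give the rules. Model-version and cost metadata are omitted.

```typescript
type ScreenshotType = "event" | "location" | "receipt" | "product" | "link" | "note";
const TYPES: readonly ScreenshotType[] = ["event", "location", "receipt", "product", "link", "note"];

// Hypothetical redaction rules, applied in order (emails first, since they can contain digits).
const REDACTIONS: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, "[EMAIL]"],
  [/\b\d(?:[ -]?\d){12,15}\b/g, "[CARD]"], // 13 to 16 digit card-like numbers
  [/\+\d[\d\s-]{6,}\d/g, "[PHONE]"], // international-format phone numbers
];

function redact(text: string): string {
  return REDACTIONS.reduce((acc, [pattern, label]) => acc.replace(pattern, label), text);
}

const CLOUD_THRESHOLD = 0.75; // assumed: fallback is only considered below this

// Returns a payload only when the user opted in AND local rules were unsure.
function prepareCloudRequest(optedIn: boolean, localConfidence: number, text: string): { text: string } | null {
  if (!optedIn || localConfidence >= CLOUD_THRESHOLD) return null;
  return { text: redact(text) };
}

// Strict validation: anything other than the expected JSON shape is rejected.
function parseCloudResponse(raw: string): { type: ScreenshotType; confidence: number } | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { type, confidence } = data as Record<string, unknown>;
  if (typeof type !== "string" || !TYPES.includes(type as ScreenshotType)) return null;
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) return null;
  return { type: type as ScreenshotType, confidence };
}
```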

The Interesting Part

The most useful design decision was treating a screenshot as unfinished intent, not as media.

That changes the product shape. A receipt wants a note or expense-like record. An event wants a calendar item or reminder. A product screenshot wants a saved link or price-watch candidate. A location wants a map action. A link wants to be saved. A messy note wants review rather than false confidence.
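The type-to-action mapping above can be captured as a typed table. The action identifiers are hypothetical, and so is the choice to offer only archive for items that need review. The mapping itself follows the paragraph and uses only actions the post lists elsewhere.

```typescript
type ScreenshotType = "event" | "location" | "receipt" | "product" | "link" | "note";
type ActionKind = "add_event" | "create_reminder" | "save_link" | "create_note" | "open_map" | "archive";

// Each type's unfinished intent, expressed as the actions worth offering first.
const SUGGESTIONS: Record<ScreenshotType, readonly ActionKind[]> = {
  receipt: ["create_note", "archive"],
  event: ["add_event", "create_reminder", "archive"],
  product: ["save_link", "archive"],
  location: ["open_map", "archive"],
  link: ["save_link", "archive"],
  note: ["create_note", "archive"],
};

// Items that need review get no type-specific guess, only the safe exit.
function suggestActions(type: ScreenshotType, state: "ready" | "needs_review"): readonly ActionKind[] {
  return state === "needs_review" ? ["archive"] : SUGGESTIONS[type];
}
```

Typing the table as `Record<ScreenshotType, ...>` means the compiler flags any new screenshot type that has no suggestions.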

The code mirrors that idea. processInboxItemWithOcr creates a pending inbox item, runs OCR, normalizes text, classifies the screenshot, optionally tries cloud fallback, extracts typed fields, generates action suggestions, and returns a richer inbox item. The action adapters then separate suggestion generation from execution, so calendar, reminders, notifications, contacts, links, notes, maps, share, copy text, and archive can be tested as product behavior instead of being tangled into the UI.
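The split between suggesting and executing can be illustrated with injected stages. This is not the repo's `processInboxItemWithOcr`. `processItemSketch`, `PipelineDeps`, `executeSuggestion`, and the 0.75 threshold are hypothetical, and the cloud fallback and field extraction steps are omitted. Suggestions are plain data, and side effects live in a separate executor registry that tests can replace with fakes.

```typescript
type ActionKind = "add_event" | "save_link" | "create_note" | "archive";

interface Suggestion {
  kind: ActionKind;
  payload: Record<string, string>;
}

interface ProcessedItem {
  id: string;
  state: "ready" | "needs_review";
  text: string;
  type: string;
  suggestions: Suggestion[];
}

// Injected stages (hypothetical shapes), so each can be faked in tests.
interface PipelineDeps {
  ocr: (imageUri: string) => Promise<string>;
  classify: (text: string) => { type: string; confidence: number };
  suggest: (type: string, text: string) => Suggestion[];
}

// Suggestion generation: returns data only, with no side effects.
async function processItemSketch(id: string, imageUri: string, deps: PipelineDeps, threshold = 0.75): Promise<ProcessedItem> {
  const text = (await deps.ocr(imageUri)).trim();
  const { type, confidence } = deps.classify(text);
  const ready = confidence >= threshold;
  return {
    id,
    text,
    type,
    state: ready ? "ready" : "needs_review",
    // Low-confidence items wait for review instead of receiving guesses.
    suggestions: ready ? deps.suggest(type, text) : [],
  };
}

// Execution: a separate registry of side-effecting handlers, one per action kind.
type Executor = (payload: Record<string, string>) => Promise<string>;

function executeSuggestion(s: Suggestion, executors: Record<ActionKind, Executor>): Promise<string> {
  return executors[s.kind](s.payload);
}
```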

The later commits also show the project pushing past the happy path. There are tests for primitives, permissions, classification, OCR pipeline behavior, search, smart folders, privacy controls, analytics, paywall limits, backend validation, integration flows, QA matrices, and release checklists. Synthetic screenshot fixtures cover the launch types, which is the right compromise for a privacy-heavy app: quality tests need realistic structure without relying on private user screenshots.
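A synthetic-fixture check might look like the runner below. The `Fixture` shape, the per-type report, and `runFixtures` are assumptions. The post says synthetic fixtures cover the launch types but does not describe their structure. Reporting per type shows, for example, when receipts regress while links stay fine.

```typescript
// Hypothetical fixture shape: invented text that mimics a real screenshot's layout.
interface Fixture {
  name: string;
  ocrText: string;
  expectedType: string;
}

interface TypeTally {
  total: number;
  passed: number;
}

interface FixtureReport {
  total: number;
  passed: number;
  failures: string[];
  byType: Record<string, TypeTally>;
}

// Runs any classifier over the fixtures and tallies results per expected type.
function runFixtures(fixtures: Fixture[], classify: (ocrText: string) => string): FixtureReport {
  const report: FixtureReport = { total: 0, passed: 0, failures: [], byType: {} };
  for (const f of fixtures) {
    const tally = (report.byType[f.expectedType] ??= { total: 0, passed: 0 });
    report.total += 1;
    tally.total += 1;
    const actual = classify(f.ocrText);
    if (actual === f.expectedType) {
      report.passed += 1;
      tally.passed += 1;
    } else {
      report.failures.push(`${f.name}: expected ${f.expectedType}, got ${actual}`);
    }
  }
  return report;
}
```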

I would not describe the repo as a finished, production app store release, and the project itself is careful not to claim that. docs/goal-blocker-todo.md lists the things that require external access: Apple Developer, Google Play Console, RevenueCat, hosted Supabase projects, PostHog dashboards, production DNS, signed builds, device QA, and live store submission. The local foundation is broad, but the release boundary is explicit.

What I Would Change

The next version should prove the riskiest assumption on devices: OCR quality across real screenshots. The adapter boundaries are there, and the docs describe Apple Vision and Google ML Kit paths, but production confidence depends on signed builds, real device libraries, and a fixture set that grows from actual beta feedback.

I would also tighten the line between implemented behavior and product planning. The repo contains a lot of thoughtful launch, marketing, support, operations, and post-product-market-fit (post-PMF) documentation. That is useful, but the app would benefit from a small release evidence page that says, in one place, what is implemented, what is mocked or adapter-backed, what is tested locally, and what still requires external verification.
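One lightweight way to keep that evidence page honest is to generate it from data. The status vocabulary mirrors the four questions above. `EvidenceEntry`, `renderEvidence`, and `unverified` are hypothetical. The example entries use only statuses the post itself states: search has tests, on-device OCR sits behind adapters and needs device QA, and store submission needs external access.

```typescript
type EvidenceStatus =
  | "implemented"
  | "adapter_backed"
  | "tested_locally"
  | "needs_external_verification";

interface EvidenceEntry {
  capability: string;
  statuses: EvidenceStatus[];
}

const STATUS_ORDER: EvidenceStatus[] = [
  "implemented",
  "adapter_backed",
  "tested_locally",
  "needs_external_verification",
];

// One plain-text line per status, listing the capabilities that carry it.
function renderEvidence(entries: EvidenceEntry[]): string {
  return STATUS_ORDER.map((status) => {
    const names = entries.filter((e) => e.statuses.includes(status)).map((e) => e.capability);
    return `${status}: ${names.length > 0 ? names.join(", ") : "(none)"}`;
  }).join("\n");
}

// Capabilities that launch copy should not yet claim as verified.
function unverified(entries: EvidenceEntry[]): string[] {
  return entries
    .filter((e) => e.statuses.includes("needs_external_verification"))
    .map((e) => e.capability);
}

// Illustrative entries limited to what the post itself states.
const example: EvidenceEntry[] = [
  { capability: "search ranking", statuses: ["implemented", "tested_locally"] },
  { capability: "on-device OCR", statuses: ["adapter_backed", "needs_external_verification"] },
  { capability: "store submission", statuses: ["needs_external_verification"] },
];

console.log(renderEvidence(example));
```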

The project taught me that privacy-first consumer AI apps are mostly about boundaries. The OCR and classification are the obvious parts. The harder product work is deciding when not to send data out, when not to suggest an action, how to recover from low confidence, and how to keep launch claims aligned with what has actually been verified.