Local First Software Primer

By Caitlin Lohrenz

January 1, 1970

About this collection

## Local-First Software Research & Development Overview

This collection surveys the emerging **local-first software paradigm**, which keeps data and computation on users' devices while still enabling collaboration and synchronization. The documents span foundational research, practical implementations, and commercial developments in this space.

**Core Technologies**: The collection focuses heavily on **Conflict-free Replicated Data Types (CRDTs)** as the foundational technology enabling local-first applications. Multiple documents detail CRDT implementations (Automerge, Y.js, json-joy), performance benchmarks, and practical applications.

**Key Players & Products**: Several companies are actively building local-first solutions, including ElectricSQL (Postgres sync), Anytype (knowledge management), Ditto (edge synchronization), and TinyBase (reactive data stores). Apple's on-device AI models show a major tech company moving computation onto the user's device, in line with local-first principles.

**Strategic Implications**: Local-first is a fundamental architectural shift away from cloud-first development, offering instant responsiveness, offline functionality, data ownership, and reduced infrastructure costs. However, it requires rethinking authentication, business logic placement, and data synchronization patterns.

**Current State**: The field is maturing rapidly, with production-ready tools, established benchmarks, and real-world deployments in industries ranging from airlines to collaborative software.

Curated Sources

How Local-First Development Is Changing How We Make Software | Heavybit

Local-first development is a new ethos that prioritizes keeping data and code on the user's device first, in contrast to cloud-first development, which centralizes data online. This approach aims to combine the benefits of cloud collaboration with the data ownership of traditional desktop software. Experts like Adam Wiggins, Brooklyn Zelenka, Matt Biilmann, and Yonatan Feleke discuss the potential of local-first development to increase velocity, decrease resource requirements, and lower barriers to entry for various projects. They also touch on challenges such as managing authentication, data sync issues, and cultural pushback. Local-first development could lead to a declarative web where developers focus less on imperative state management and more on the desired state of their applications, potentially simplifying development and improving user experiences.

Key Takeaways

Local-first development offers improved developer productivity, lower hosting costs, and more-responsive UIs by keeping data on the user's device and using sync engines for collaboration.
The shift to local-first development may cause cultural clashes with conservative dev teams and massive cloud providers who have a vested interest in maintaining the status quo.
Local-first development and sync engines could enable a massively multiplayer future with AI agents, changing how applications are built and interacted with.
The local-first community is working to address technical challenges such as conflict resolution, schema migrations, and sustainable business models.
Local-first development has the potential to collapse the stack into simpler components: a local database, a data model, and business logic, thereby reducing engineering costs associated with managing complex microservices architectures.

List CRDT Benchmarks

The document benchmarks json-joy, a list CRDT implementation, against other popular CRDT libraries such as Y.js and Automerge, as well as other libraries such as Diamond Types and the non-CRDT Rope.js. The reported results show json-joy to be around 100x faster than Y.js and 1,000x faster than Automerge. The benchmarks were run on real-world editing traces, and the results demonstrate json-joy's performance in handling large text documents and high transaction volumes. The document also discusses the algorithms used by the compared CRDT libraries, including RGA and YATA, and provides insights into the performance differences between them. Additionally, the author explores the potential for further optimization and discusses the implications of the findings for collaborative editing applications.

Key Takeaways

json-joy's novel Block-wise RGA CRDT algorithm achieves a significant performance improvement over existing CRDT libraries, making it around 100x faster than Y.js and 1,000x faster than Automerge.
The benchmarking results demonstrate that json-joy can handle millions of transactions per second, making it suitable for high-performance collaborative editing applications.
The performance difference between json-joy and other CRDT libraries is largely due to the efficiency of the RGA algorithm and the use of a Rope data structure to represent text contents.

A Gentle Introduction to CRDTs – vlcn.io

This document introduces Conflict-Free Replicated Data Types (CRDTs), a family of data structures used in distributed systems to achieve eventual consistency without conflicts. It covers the definition, use cases, and implementation of CRDTs, including simple examples like grow-only sets and last-write-wins registers. The document also discusses common pitfalls in implementing CRDTs, such as trusting system time and incorrect tie-breaking, and introduces logical clocks as a way to handle time-related issues. It concludes by highlighting the importance of CRDTs in achieving peer-to-peer data replication and consistency.

Key Takeaways

CRDTs enable conflict-free data replication across distributed systems by ensuring eventual consistency through specific data structures and merge algorithms.
Implementing CRDTs requires careful handling of time and causality, often using logical clocks to determine the order of events and resolve conflicts.
Common mistakes in CRDT implementation include trusting system time, incorrect tie-breaking for concurrent updates, and failing to update timestamps correctly during merges.
CRDTs can be applied to various data types, such as grow-only sets and last-write-wins registers, to achieve conflict-free replication in distributed systems.
The use of CRDTs facilitates peer-to-peer data replication and consistency, eliminating the need for a central server and enabling offline or low-connectivity scenarios.
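
The pitfalls above (trusting system time, inconsistent tie-breaking, and not advancing timestamps on merge) can be illustrated with a minimal sketch of a last-write-wins register driven by a logical clock. This is not code from the source article; every name here (`Stamp`, `Register`, `createRegister`, `writeRegister`, `mergeRegister`) is hypothetical, and the sketch assumes each replica has a unique `nodeId`.

```typescript
// Hypothetical sketch: a last-write-wins register ordered by a logical
// (Lamport-style) counter instead of wall-clock time.

type Stamp = { counter: number; nodeId: string };

type Register<T> = {
  nodeId: string;       // unique per replica (assumption)
  clock: number;        // highest counter this replica has seen
  value: T | undefined;
  stamp: Stamp;         // stamp of the write that produced `value`
};

function createRegister<T>(nodeId: string): Register<T> {
  // The empty nodeId and counter 0 lose against any real write.
  return { nodeId, clock: 0, value: undefined, stamp: { counter: 0, nodeId: "" } };
}

// Total order on stamps: higher counter wins; equal counters are broken
// deterministically by nodeId so every replica picks the same winner.
function isNewer(a: Stamp, b: Stamp): boolean {
  if (a.counter !== b.counter) return a.counter > b.counter;
  return a.nodeId > b.nodeId;
}

// Local write: advance the logical clock, then stamp the new value.
function writeRegister<T>(reg: Register<T>, value: T): Register<T> {
  const clock = reg.clock + 1;
  return { ...reg, clock, value, stamp: { counter: clock, nodeId: reg.nodeId } };
}

// Merge a remote replica's state into the local one.
function mergeRegister<T>(local: Register<T>, remote: Register<T>): Register<T> {
  // Always move the local clock forward, even if the remote value loses,
  // so that later local writes are ordered after everything seen so far.
  const clock = Math.max(local.clock, remote.stamp.counter);
  if (isNewer(remote.stamp, local.stamp)) {
    return { ...local, clock, value: remote.value, stamp: remote.stamp };
  }
  return { ...local, clock };
}
```

In this sketch, two replicas that write concurrently and then merge in either direction end up holding the same value, because the comparison in `isNewer` is a total order that does not depend on which side performs the merge.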

Designing Data Structures for Collaborative Apps - Matthew Weidner

The document discusses designing Conflict-free Replicated Data Types (CRDTs) for collaborative applications. CRDTs are data structures that allow multiple users to collaborate on a shared state, ensuring that the state remains consistent across all users. The author presents various CRDT designs, including the Unique Set, List CRDT, and Registers, and explains how to compose them into more complex data structures. The document also covers principles of CRDT design, such as expressing operations in terms of user intention and using causal for-each operations. A case study on designing a CRDT for a collaborative spreadsheet is provided, demonstrating how to apply these principles and techniques to a real-world application.

Key Takeaways

CRDTs can be used to build collaborative applications that work offline and are end-to-end encrypted.
Composing simple CRDTs into more complex data structures is key to designing CRDTs for complex applications.
Expressing operations in terms of user intention is crucial for ensuring that CRDTs behave as expected in the presence of concurrent operations.
Using causal for-each operations and concurrent+causal for-each operations can help ensure that CRDTs behave correctly in various concurrency scenarios.
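
As a rough illustration of the Unique Set idea mentioned above, the sketch below gives every added element a globally unique ID, so a delete always targets one specific add. This is a simplified reading, not the author's implementation: all names (`UniqueSet`, `makeAdd`, `applyOp`, `liveValues`) are hypothetical, and the tombstone record is an added assumption that lets operations be applied in any order rather than relying on causal delivery.

```typescript
// Hypothetical sketch of an operation-based "unique set".

type AddOp<T> = { kind: "add"; id: string; value: T };
type DeleteOp = { kind: "delete"; id: string };
type SetOp<T> = AddOp<T> | DeleteOp;

type UniqueSet<T> = {
  replicaId: string;                 // unique per replica (assumption)
  nextCounter: number;               // local counter used to mint IDs
  elements: Record<string, T>;       // live elements keyed by unique ID
  tombstones: Record<string, true>;  // IDs that have been deleted
};

function createUniqueSet<T>(replicaId: string): UniqueSet<T> {
  return { replicaId, nextCounter: 0, elements: {}, tombstones: {} };
}

// Create an add operation with a fresh ID; the caller broadcasts it and
// applies it locally with applyOp.
function makeAdd<T>(set: UniqueSet<T>, value: T): AddOp<T> {
  const id = set.replicaId + ":" + set.nextCounter;
  set.nextCounter += 1;
  return { kind: "add", id, value };
}

// Apply a local or remote operation. Adds and deletes commute because a
// delete is remembered even if it arrives before the matching add.
function applyOp<T>(set: UniqueSet<T>, op: SetOp<T>): void {
  if (op.kind === "add") {
    if (!set.tombstones[op.id]) set.elements[op.id] = op.value;
  } else {
    set.tombstones[op.id] = true;
    delete set.elements[op.id];
  }
}

// Live values in a deterministic (ID-sorted) order.
function liveValues<T>(set: UniqueSet<T>): T[] {
  return Object.keys(set.elements).sort().map((id) => set.elements[id]);
}
```

Because two users adding the same value produce two distinct elements, deleting one never accidentally removes the other, which is one way of expressing operations in terms of user intention.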

In Search of a Local-First Database | Jared Forsyth.com

The author is searching for a local-first database that meets three main requirements: network-optional functionality, data availability across devices, and an open-source server implementation. The author evaluates various solutions based on correctness, cost, and flexibility, considering factors such as conflict handling, data replication, schema changes, and integration with existing databases. Several projects are assessed, including gun-js, remoteStorage.js, rxdb + pouchdb, and hypermerge + automerge, with some being rejected for lacking client-side persistence or a server-side implementation. The author outlines a comprehensive list of evaluation criteria and provides links to individual project evaluations.

Key Takeaways

The need for local-first databases that can handle offline syncing and data replication across devices is becoming increasingly important, and existing solutions often fall short of meeting the required criteria.
The evaluation criteria for local-first databases should include correctness, cost, and flexibility, taking into account factors such as conflict resolution, data preservation, and schema changes.
CRDTs (Conflict-free Replicated Data Types) are a key technology for achieving seamless data replication and syncing in local-first databases, but their implementation can be complex and requires careful consideration of factors such as data consistency and intent preservation.

Closing The Gap Between Your Users And Their Data

The author discusses the shift from local-first to cloud-first applications over the last decade, highlighting the trade-offs and limitations of cloud-first architecture. They introduce TinyBase, a tiny JavaScript library that provides reactive, relational data storage on the client side, allowing for faster and more responsive user experiences. The author showcases the benefits of local-first apps, including improved performance, reduced latency, and enhanced user experience. They also discuss the challenges and limitations of implementing local-first architecture, particularly for certain types of applications. The talk concludes by encouraging developers to consider local-first approaches for building more responsive and user-friendly applications.

Key Takeaways

Local-first apps can provide faster and more responsive user experiences by storing data on the client-side, reducing reliance on network connectivity and latency.
TinyBase is a lightweight JavaScript library that enables reactive, relational data storage on the client-side, making it suitable for building local-first applications.
While local-first architecture is not suitable for all types of applications, it can be a viable alternative for certain use cases, offering improved performance and user experience.
Implementing local-first architecture requires careful consideration of data synchronization and collaboration, particularly for applications that require real-time collaboration or multi-user support.
By revisiting local-first approaches, developers can build more responsive and user-friendly applications that better meet the needs of their users.

Developing local-first software | ElectricSQL

This document discusses the concept of local-first software development using ElectricSQL, a tool that enables real-time multi-user experiences with offline support, resilience, privacy, and data ownership. It compares cloud-first and local-first systems, highlighting the need to codify authentication, filtering, and validation into database security rules and to use live queries and event sourcing. The document also touches on the challenges of concurrent writes, partitioning, and partial replication, and how ElectricSQL's Shape-based system addresses these issues. It emphasizes the importance of adopting a causal consistency mindset in local-first systems.

Key Takeaways

Local-first software development with ElectricSQL enables modern, real-time multi-user experiences with built-in offline support and data ownership.
Codifying authentication and validation logic into database security rules is crucial for local-first systems, replacing traditional backend controllers and middleware.
ElectricSQL's Shape-based system allows for dynamic partial replication, optimizing data transfer and placement for local-first applications.
Causal consistency is essential for local-first systems, requiring application developers to adapt to concurrent writes and potential data inconsistencies.
By using live queries and event sourcing, developers can build responsive and resilient local-first applications that provide a seamless user experience.
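
To make the idea of partial replication concrete, here is a minimal, library-free sketch. It is not ElectricSQL's API: the `Shape` type, the `rowsForShape` function, and the `issues` table are all hypothetical, and the sketch assumes a shape can be modeled as a table name plus a row filter.

```typescript
// Hypothetical sketch: partial replication as "a table plus a filter".

type Row = { id: number; projectId: number; title: string };

// A hypothetical shape: which table to sync and which rows of it.
type Shape = { table: string; where: (row: Row) => boolean };

// A stand-in for the server-side database: table name -> rows.
type Database = Record<string, Row[]>;

// Return only the rows a client subscribed to this shape should hold.
function rowsForShape(db: Database, shape: Shape): Row[] {
  const rows = db[shape.table] || [];
  return rows.filter(shape.where);
}

const serverDb: Database = {
  issues: [
    { id: 1, projectId: 1, title: "Fix login" },
    { id: 2, projectId: 2, title: "Update docs" },
    { id: 3, projectId: 1, title: "Add offline mode" },
  ],
};

// A client working on project 1 replicates only that project's issues.
const projectOneIssues: Shape = {
  table: "issues",
  where: (row) => row.projectId === 1,
};

const localCopy = rowsForShape(serverDb, projectOneIssues);
```

Here `localCopy` holds the two issues for project 1 and nothing else. A real sync engine also has to stream later changes and handle rows that move into or out of the filter, which this sketch leaves out.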

What are CRDTs – Loro

This document explains Conflict-Free Replicated Data Types (CRDTs), a family of data structures used in distributed systems to achieve eventual consistency without conflicts. CRDTs enable multiple users to edit shared documents or databases concurrently, even offline, and synchronize changes when connected. The concept of CRDTs was formally defined in 2011 by Marc Shapiro and colleagues, building upon earlier research. Within the limits of the CAP theorem, CRDTs favor availability and partition tolerance while providing Strong Eventual Consistency (SEC). The document discusses the principles of Op-based CRDTs, comparing them to Operational Transformation (OT) and highlighting their differences in design complexity, decentralization, and data size. Examples of simple CRDTs, such as Grow-only Counter and Grow-only Set, demonstrate how they achieve consistency without conflicts.

Key Takeaways

CRDTs provide a mathematically sound approach to achieving Strong Eventual Consistency in distributed systems, making them suitable for collaborative applications.
The decentralized nature of CRDTs allows for peer-to-peer synchronization, making them ideal for applications that require offline access and fault tolerance.
While CRDTs offer simplicity in ensuring consistency, they can be more challenging to design when preserving user intent, and may result in larger document sizes compared to Operational Transformation (OT).
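
The Grow-only Counter mentioned above can be sketched in a few lines. This is a state-based variant written for illustration, not code from the Loro documentation; the names `GCounter`, `incrementCounter`, `counterValue`, and `mergeCounters` are hypothetical.

```typescript
// Hypothetical sketch of a state-based grow-only counter (G-Counter).
// Each replica only ever increases its own slot; the total is the sum.

type GCounter = Record<string, number>; // replicaId -> count

function incrementCounter(counter: GCounter, replicaId: string, by: number = 1): GCounter {
  if (by < 0) throw new Error("a grow-only counter cannot be decremented");
  const current = counter[replicaId] || 0;
  return { ...counter, [replicaId]: current + by };
}

function counterValue(counter: GCounter): number {
  let total = 0;
  for (const id of Object.keys(counter)) total += counter[id];
  return total;
}

// Merge takes the per-replica maximum, which makes it commutative,
// associative, and idempotent: replicas converge regardless of the
// order or number of times states are exchanged.
function mergeCounters(a: GCounter, b: GCounter): GCounter {
  const merged: GCounter = { ...a };
  for (const id of Object.keys(b)) {
    merged[id] = Math.max(merged[id] || 0, b[id]);
  }
  return merged;
}
```

For example, if replica `a` increments twice and replica `b` increments by three while offline, merging the two states in either order yields a total of five, and merging the same state again changes nothing.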

Introducing Apple’s On-Device and Server Foundation Models - Apple Machine Learning Research

Apple introduced Apple Intelligence at the 2024 Worldwide Developers Conference, a personal intelligence system integrated into iOS 18, iPadOS 18, and macOS Sequoia. It comprises multiple generative models, including a ~3 billion parameter on-device language model and a larger server-based language model available with Private Cloud Compute. The models are designed to perform specialized tasks efficiently, accurately, and responsibly. Apple's approach to AI development is guided by four Responsible AI principles: empowering users with intelligent tools, representing users authentically, designing with care, and protecting privacy. The foundation models are trained on licensed data and publicly available data, with filters to remove personally identifiable information and low-quality content. Apple has developed novel algorithms for post-training, including rejection sampling fine-tuning and reinforcement learning from human feedback. The models are optimized for on-device and server performance using techniques such as grouped-query attention, low-bit palletization, and activation quantization. Adapters are used to fine-tune the models for specific tasks, and the models are evaluated on human-annotated datasets and adversarial prompts. The reported results show that Apple's models outperform comparable models in terms of helpfulness and safety.

Key Takeaways

Apple's foundation models are designed to be highly capable, fast, and power-efficient, with a focus on responsible AI development and user privacy.
The use of adapters allows for efficient fine-tuning of the models for specific tasks, enabling dynamic specialization on-the-fly.
Apple's models demonstrate superior performance in human evaluations, outperforming comparable open-source and commercial models in terms of helpfulness and safety.

Ditto lands $82M to synchronize data from the edge to the cloud | TechCrunch

Ditto, a company specializing in edge-to-cloud data synchronization, has secured $82 million in Series B funding at a post-money valuation of $462 million. Founded in 2018, Ditto's platform enables 'resilient' connectivity for edge devices by utilizing existing hardware such as smartphones to create ad-hoc mesh networks. This allows for peer-to-peer data synchronization without relying on centralized cloud servers, reducing latency and optimizing bandwidth. Ditto's technology is particularly valuable in industries requiring real-time data processing, such as airlines and autonomous vehicles. The company has already seen significant adoption, including a $950 million contract with the U.S. Air Force and partnerships with major airlines like Delta. With the new funding, Ditto plans to expand its team, scale its product, and forge further partnerships with cloud database vendors, capitalizing on the growing demand for edge computing driven by AI and IoT applications.

Key Takeaways

Ditto's innovative approach to edge computing eliminates the need for dedicated edge servers, reducing costs and complexity for customers.
The company's technology has significant implications for industries requiring real-time data processing and low latency, such as autonomous vehicles and industrial IoT.
As AI continues to drive demand for edge computing, Ditto is poised to become a key player in the market, with its platform offering improved resiliency, privacy, and cost savings.

A New, Networked Era for Anytype

Anytype has publicly debuted its first version of local-first sharing and collaboration, marking a significant step towards its vision of protecting digital freedoms. This development is made possible by Anysync, an open-source protocol that enables encrypted communication and collaboration at scale. The initial release allows users to create shared spaces and collaborate with others directly, without relying on cloud services. While the current version is basic and lacks features like notifications and comments, the company plans to continue refining and expanding the capabilities of its local-first network. The introduction of multiplayer functionality transforms Anytype from a personal knowledge management tool into a collaborative platform for families, communities, teams, and creators.

Key Takeaways

The introduction of local-first collaboration in Anytype represents a significant shift towards a 'no-one-in-between' network, where users have direct control over their data and connections.
Anytype's use of the Anysync protocol, P2P sync, and CRDTs enables offline access, speedy loading times, and end-to-end encryption, providing a robust foundation for collaborative work.
The initial release of collaboration features is just the beginning, with future plans including the addition of notifications, comments, public spaces, and other essential features to enhance user experience and connectivity.

Blog | Automerge CRDT

The document discusses several updates to Automerge, a local-first data sync engine. It covers the release of Automerge 3.0, which reduces memory usage by over 10x, and Automerge Repo 2.0, which improves developer experience with new APIs for version control and asynchronous document handling. Automerge Anywhere introduces new packaging options for loading Automerge in various environments. Additionally, Automerge 2.2 adds rich text support with a ProseMirror binding. The updates aim to enhance performance, reliability, and usability for building collaborative applications.

Key Takeaways

Automerge 3.0 significantly reduces memory usage by using a compressed representation at runtime, making it feasible for a wider range of scenarios.
Automerge Repo 2.0 introduces asynchronous document handling and new version control methods, simplifying code and improving developer experience.
The introduction of rich text support in Automerge 2.2 enables accurate merging of complex formatting changes, suitable for offline collaboration.

Electric 1.0 released | ElectricSQL

Electric 1.0 is now generally available, marking a significant milestone in the development of a Postgres sync engine designed to simplify building real-time applications. The release follows a major rebuild in 2024 to enhance simplicity, speed, reliability, and scalability. Electric's core APIs are now stable, ensuring no backwards-incompatible changes in patch or minor releases. The sync engine has been stress-tested in production environments by companies like Trigger.dev, Otto, and IP.world, demonstrating its stability and scalability. Electric handles partial replication, fan-out, and data delivery, making it a powerful tool for developers. The team is already working on future enhancements, including more expressive partial replication primitives and advanced stream processing.

Key Takeaways

The release of Electric 1.0 signifies a major step forward in local-first software development, providing a stable and scalable Postgres sync engine that simplifies real-time data synchronization.
Electric's ability to handle partial replication, fan-out, and data delivery makes it particularly valuable for applications requiring real-time updates, such as collaborative spreadsheets or AI-driven agent updates.
The Electric team's future plans, including differential dataflow in TypeScript and collaboration with TanStack for optimistic state management, indicate a continued focus on enhancing the platform's expressiveness and ease of use.

Local-first software: You own your data, in spite of the cloud

The article proposes 'local-first software' as a set of principles that prioritize user ownership and control over data, while maintaining the benefits of real-time collaboration and cross-device access. It discusses the limitations of cloud apps, surveys existing data storage and sharing models, and examines the potential of Conflict-free Replicated Data Types (CRDTs) as a foundational technology for local-first software. The authors share findings from developing local-first prototypes and suggest next steps for researchers, app developers, and entrepreneurs.

Key Takeaways

Local-first software can provide both collaboration and ownership by treating local data as the primary copy and using servers for secondary copies.
CRDTs have the potential to be a foundational technology for local-first software, enabling real-time collaboration and automatic merging of changes.
The use of CRDTs and peer-to-peer technologies can help realize the local-first vision, but further research is needed to address challenges such as conflict resolution, data visualization, and network communication.

Frequently Asked Questions

How do the CRDT performance benchmarks between json-joy, Y.js, and Automerge translate to real-world application performance, and what factors determine which implementation to choose?
What are the specific trade-offs between ElectricSQL's Postgres-based approach and Ditto's mesh networking strategy for different types of collaborative applications?
How does Apple's on-device foundation model architecture with adapters compare to the local-first database approaches described in the other documents?
What patterns emerge from comparing the security rule systems across ElectricSQL's DDLX, Firebase Security Rules, and Supabase RLS for local-first applications?
How do the dynamic partial replication strategies (Electric's Shapes vs. Replicache's blocks vs. Mongo's query-based sync) affect application design and user experience?
What are the implications of TinyBase's tabular data structure requirement versus the more flexible JSON approaches used by other local-first solutions?
How do the different approaches to logical clocks and causal consistency across these implementations affect the guarantees developers can rely on?
What business model patterns emerge from comparing Anytype's creator-owned keys approach with Ditto's enterprise edge computing focus?