Jim Pruetting

Designing Data-Intensive Applications

Foundations of Data Systems

Data Models and Query Languages

Storage and Retrieval

Encoding and Evolution

Replication

Partitioning

Transactions

Distributed System Troubles

Consistency and Consensus

Batch Processing

Stream Processing

The Future of Data Systems

Key Takeaways

  1. Reliability engineering: Build systems that continue working correctly even when things go wrong
  2. Scalability planning: Design for growth in data volume, complexity, and traffic from the start
  3. Maintainability focus: Prioritize operability, simplicity, and evolvability in system design
  4. Data model selection: Choose data models based on application access patterns, not just data structure
  5. Storage engine fit: Select storage engines based on workload characteristics (write-heavy vs. read-heavy)
  6. Encoding flexibility: Use encodings that allow for independent evolution of services
  7. Replication purpose: Apply replication patterns based on specific needs for availability, latency, or scalability
  8. Partitioning strategy: Partition data to scale write throughput and spread load evenly across nodes, avoiding hot spots
  9. Consistency levels: Select consistency levels based on actual application requirements, not dogma
  10. System integration: Design thoughtful integration between systems using batch, stream processing, and derived data
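
As a small illustration of takeaway 8, the sketch below assigns keys to partitions with a stable hash and counts how many keys land on each one. It is a minimal example under stated assumptions, not a method taken from the book's text: the function names `partition_for` and `load_per_partition`, the `user-N` keys, the choice of MD5, and the partition count of 4 are all hypothetical and chosen only for illustration.

```python
import hashlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a stable hash.

    MD5 is used only because it is deterministic across processes
    (unlike Python's built-in hash() for strings); it is an
    illustrative choice, not a recommendation.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce
    # it to the range [0, num_partitions).
    return int.from_bytes(digest[:8], "big") % num_partitions


def load_per_partition(keys, num_partitions):
    """Return a list with the number of keys assigned to each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical workload: 10,000 sequential user IDs.
keys = [f"user-{i}" for i in range(10_000)]
loads = load_per_partition(keys, 4)
print(loads)
```

Even though the keys are sequential, hashing spreads them roughly evenly across the four partitions. The plain modulo used here would move most keys if the partition count changed, so this shows load balancing only, not a rebalancing strategy.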