whitepapers/notes.md
2024-12-30 15:39:07 -06:00

notes

profiling a warehouse-scale computer

cassandra

bigtable

gfs

  • System design tailored to the use case

    • Optimized for record append and random reads
  • Master-Slave

    • Limitations: fault tolerance despite replicas, throughput
  • Bottlenecks & network optimization

    • Data & Control flow separation
  • State restoration & logging (lots of things I don't get here)

    • Related: OS journaling
  • Weak consistency - "tolerable errors" (e.g. clients reading different states)

  • Garbage Collection

    • Amortized cost w/ FS scans
    • Parallels w/ language design
  • Terms to learn:

    1. Network Bandwidth and per-machine limit
    2. Racks & data centers - how are these managed (i.e. "cross-{rack,DC} replication")?
  • Use the latest {soft,hard}ware or deal with slowdowns (e.g. older Linux kernels where fsync() cost scaled with total file size, not just the appended portion)

  • Getting to know the real numbers: ~440 MB/s effective re-replication throughput after killing two chunkservers, on Google's network

  • Network as the ultimate bottleneck & inefficiency
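The data/control flow separation above can be sketched roughly like this (a hypothetical toy model, not GFS code: class and function names are mine): bulk data is pipelined along a chain of replicas so each machine's outbound bandwidth is used once, while the commit is a separate, tiny control message.

```python
# Toy sketch of GFS-style data/control separation (all names hypothetical).

class Chunkserver:
    def __init__(self, name):
        self.name = name
        self.buffer = None      # staged data, pushed but not yet committed
        self.chunk = b""        # committed chunk contents

    def push(self, data, chain):
        """Data flow: stage locally, then forward along the replica chain."""
        self.buffer = data
        if chain:
            chain[0].push(data, chain[1:])

    def commit(self):
        """Control flow: apply staged data (in GFS the primary orders this)."""
        self.chunk += self.buffer
        self.buffer = None

def write(data, replicas):
    replicas[0].push(data, replicas[1:])   # pipeline the bytes
    for r in replicas:                     # then the small commit messages
        r.commit()

replicas = [Chunkserver(n) for n in ("primary", "r2", "r3")]
write(b"record-1", replicas)
assert all(r.chunk == b"record-1" for r in replicas)
```

The point of the separation: the heavy bytes never fan out from one sender to N receivers; only the cheap control messages do.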
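The "state restoration & logging" idea (and its OS-journaling parallel) is basically write-ahead logging: log the mutation before applying it, and rebuild state after a crash by replaying the log. A minimal sketch, with a list standing in for the log file and a made-up `Master` class:

```python
# Toy write-ahead-log sketch (hypothetical names, not the GFS master's code).
import json

class Master:
    def __init__(self, log):
        self.log = log          # append-only list standing in for a log file
        self.namespace = {}     # in-memory state: path -> chunk handles

    def create(self, path):
        # Log first, then apply: a crash between the two steps loses nothing.
        self.log.append(json.dumps({"op": "create", "path": path}))
        self.namespace[path] = []

    @classmethod
    def recover(cls, log):
        """Rebuild in-memory state by replaying the operation log."""
        m = cls(log=list(log))
        for line in log:
            rec = json.loads(line)
            if rec["op"] == "create":
                m.namespace[rec["path"]] = []
        return m

log = []
m = Master(log)
m.create("/a")
m.create("/b")
m2 = Master.recover(log)        # simulated restart from the log alone
assert m2.namespace == m.namespace
```

Checkpoints (snapshots of the in-memory state) then exist only to bound how much log has to be replayed.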
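The amortized garbage collection bullet can also be sketched: deletion is just a rename to a hidden name, and reclamation happens lazily during the periodic namespace scan the master runs anyway. A toy model (class and field names are mine; the ~3-day grace period is GFS's stated default):

```python
# Toy sketch of lazy, scan-amortized GC (hypothetical names).

GRACE_PERIOD = 3 * 24 * 3600    # seconds; GFS defaulted to ~3 days

class Namespace:
    def __init__(self):
        self.files = {}         # path -> data
        self.hidden = {}        # "deleted" path -> deletion timestamp

    def delete(self, path, now):
        # Fast path: just hide the file; no storage is reclaimed yet,
        # and the file stays recoverable until the grace period expires.
        self.hidden[path] = now
        del self.files[path]

    def gc_scan(self, now):
        """Background scan: drop hidden files older than the grace period."""
        expired = [p for p, t in self.hidden.items()
                   if now - t > GRACE_PERIOD]
        for p in expired:
            del self.hidden[p]
        return expired

ns = Namespace()
ns.files["/tmp/x"] = b"data"
ns.delete("/tmp/x", now=0)
assert ns.gc_scan(now=60) == []                       # still recoverable
assert ns.gc_scan(now=GRACE_PERIOD + 1) == ["/tmp/x"] # reclaimed lazily
```

The language-design parallel from the notes: like a tracing collector, reclamation is batched and decoupled from the moment of "free", trading immediacy for simplicity and amortized cost.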

mapreduce

spark