notes
profiling a warehouse-scale computer
cassandra
bigtable
gfs
-
System Design as development for use case
- Optimized for record append and random reads
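A minimal sketch of the "record append" idea (names like `ChunkFile` are mine, not GFS's): the system, not the client, picks the offset, so concurrent appenders never clobber each other, while reads stay cheap random access at known offsets.

```python
# Illustrative sketch, not GFS's actual interface.
import threading

class ChunkFile:
    def __init__(self):
        self._data = bytearray()
        self._lock = threading.Lock()

    def record_append(self, record: bytes) -> int:
        """Append atomically; return the offset the system chose."""
        with self._lock:
            offset = len(self._data)
            self._data += record
            return offset

    def read(self, offset: int, length: int) -> bytes:
        """Random read at a known offset (the common read pattern)."""
        return bytes(self._data[offset : offset + length])

f = ChunkFile()
off = f.record_append(b"event-1\n")
assert f.read(off, 8) == b"event-1\n"
```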
-
Master-Slave
- Limitations: fault tolerance despite replicas; single-master throughput
-
Bottlenecks & network optimization
- Data & Control flow separation
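The separation can be sketched like this (all names are mine): the master answers only small metadata queries ("where are the replicas?"), while bulk data is pipelined directly between client and chunkservers, keeping the master off the data path.

```python
# Illustrative sketch of control/data flow separation, assuming a
# GFS-like single master and a chain of chunkserver replicas.

class Master:
    def __init__(self, placement):
        # chunk_id -> ordered list of chunkserver replicas
        self.placement = placement

    def locate(self, chunk_id):
        # Control flow: a tiny metadata RPC.
        return self.placement[chunk_id]

class Chunkserver:
    def __init__(self, name):
        self.name, self.store = name, {}

    def push(self, chunk_id, data, chain):
        # Data flow: store locally, then pipeline to the next hop.
        self.store[chunk_id] = data
        if chain:
            chain[0].push(chunk_id, data, chain[1:])

s1, s2, s3 = Chunkserver("s1"), Chunkserver("s2"), Chunkserver("s3")
master = Master({"c1": [s1, s2, s3]})

replicas = master.locate("c1")                    # ask master (control)
replicas[0].push("c1", b"payload", replicas[1:])  # stream data (data)
assert all(s.store["c1"] == b"payload" for s in (s1, s2, s3))
```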
-
State restoration & logging (lots of things I don't get here)
- Related: OS journaling
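The journaling parallel can be made concrete with a write-ahead operation log (a generic sketch, not the GFS master's actual log format): mutations hit the log before the state, so after a crash the state is rebuilt by replaying the log.

```python
# Illustrative write-ahead-log sketch; names are mine.

class LoggedKV:
    def __init__(self, log=None):
        self.log = log if log is not None else []
        self.state = {}
        for op, k, v in self.log:       # recovery: replay the journal
            self._apply(op, k, v)

    def _apply(self, op, k, v):
        if op == "set":
            self.state[k] = v
        elif op == "del":
            self.state.pop(k, None)

    def set(self, k, v):
        self.log.append(("set", k, v))  # log first (write-ahead)...
        self._apply("set", k, v)        # ...then apply to state

kv = LoggedKV()
kv.set("a", 1)
kv.set("b", 2)
recovered = LoggedKV(log=kv.log)        # "crash" and rebuild from the log
assert recovered.state == {"a": 1, "b": 2}
```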
-
Weak consistency - "tolerable errors" (e.g. clients reading different states)
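One such "tolerable error", simulated (this is my toy model, not GFS's protocol): a write lands on one replica first, so two clients reading different replicas briefly observe different states until replication catches up.

```python
# Illustrative stale-read sketch under asynchronous replication.
replicas = [{"x": 0}, {"x": 0}]

def write(key, value):
    replicas[0][key] = value    # applied to one replica immediately;
    # propagation to replicas[1] is delayed (simulated lag)

def read(replica_idx, key):
    return replicas[replica_idx][key]

write("x", 1)
assert read(0, "x") == 1        # client A sees the new value
assert read(1, "x") == 0        # client B still sees the old one
replicas[1]["x"] = 1            # replication eventually catches up
assert read(1, "x") == 1
```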
-
Garbage Collection
- Amortized cost w/ FS scans
- Parallels w/ language design
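The amortization can be sketched as lazy deletion plus a periodic metadata scan (names are mine): deleting a file only hides it, and a later scan drops hidden files and reclaims chunks no live file references, so reclamation cost is folded into work the master does anyway.

```python
# Illustrative GC sketch, assuming a GFS-like hidden-name convention.

class Namespace:
    def __init__(self):
        self.files = {}        # name -> set of chunk ids
        self.chunks = set()    # all chunks held by chunkservers

    def create(self, name, chunk_ids):
        self.files[name] = set(chunk_ids)
        self.chunks |= set(chunk_ids)

    def delete(self, name):
        # Cheap: just rename/hide the file; no chunks touched yet.
        self.files[".deleted." + name] = self.files.pop(name)

    def gc_scan(self):
        # Amortized reclamation during a regular metadata scan.
        for name in [n for n in self.files if n.startswith(".deleted.")]:
            del self.files[name]
        live = set().union(*self.files.values()) if self.files else set()
        self.chunks &= live    # orphaned chunks are reclaimed

ns = Namespace()
ns.create("a", {1, 2})
ns.create("b", {3})
ns.delete("a")
assert ns.chunks == {1, 2, 3}  # nothing reclaimed yet
ns.gc_scan()
assert ns.chunks == {3}
```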
-
Terms to learn:
- Network Bandwidth and per-machine limit
- Racks & data centers - how are these managed (i.e. "cross-{rack,DC} replication")?
-
Use the latest {soft,hard}ware or deal with slowdowns (older kernels'
fsync() requiring reading the entire file on append) -
Getting to know the real numbers: 440 MB/s re-replication throughput recovering from a double chunkserver kill, on Google's network
-
Network as the ultimate bottleneck & inefficiency