whitepapers/notes.md
2024-12-30 15:39:07 -06:00


# notes
## [profiling a warehouse-scale computer](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44271.pdf)
## [cassandra](https://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf)
## [bigtable](https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf)
## [gfs](https://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf)
- System design shaped by the target use case (huge files, large streaming reads, append-heavy writes)
- Optimized for record append and random reads
- Single master + many chunkservers (master-worker split)
- Limitations: fault tolerance still hard despite replicas, throughput ceilings
- Bottlenecks & network optimization
- Data & Control flow separation
- State restoration & logging (lots of things I don't get here)
- Related: OS journaling
- Weak consistency - "tolerable errors" (e.g. clients observing different replica states)
- Garbage Collection
- Amortized cost w/ FS scans
- Parallels w/ language design
- Terms to learn:
1. Network Bandwidth and _per-machine_ limit
2. Racks & data centers - how are these managed (i.e. "cross-{rack,DC} replication")?
- Run the latest {soft,hard}ware or deal with slowdowns (e.g. an older kernel whose `fsync()` cost scaled with total file size rather than the appended bytes)
- Getting to know the real numbers: 440 MB/s throughput even with a double chunkserver kill, on Google's network
- Network as the ultimate bottleneck & inefficiency
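The "tolerable errors" point above can be made concrete with a toy sketch of record-append semantics (hypothetical names, not the real GFS API): appends are atomic and at-least-once, so a retried append can leave duplicate records, and readers are expected to deduplicate via IDs or checksums inside each record.

```python
# Toy model of at-least-once record append: a client retry after a lost
# ack duplicates the record, and readers dedup by per-record ID.
class Chunk:
    def __init__(self):
        self.records = []  # list of (record_id, payload)

    def record_append(self, record, fail_once=False):
        """Append atomically; simulate an ack lost after the write lands,
        which makes the client retry and duplicate the record."""
        self.records.append(record)
        if fail_once:
            self.records.append(record)  # the retried append

def read_unique(chunk):
    """Reader-side dedup using record-level unique IDs."""
    seen, out = set(), []
    for rec_id, payload in chunk.records:
        if rec_id not in seen:
            seen.add(rec_id)
            out.append(payload)
    return out

chunk = Chunk()
chunk.record_append((1, "a"))
chunk.record_append((2, "b"), fail_once=True)  # retry -> duplicate on disk
chunk.record_append((3, "c"))

assert len(chunk.records) == 4                # raw chunk keeps the duplicate
assert read_unique(chunk) == ["a", "b", "c"]  # reader sees each record once
```

The design choice is to push dedup to applications so the file system can keep appends cheap and lock-free across concurrent writers.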
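The state-restoration & logging bullet (and the OS-journaling parallel) boils down to checkpoint-plus-replay. A minimal sketch under assumed structure, loosely modeled on the paper's master recovery: mutations go to an append-only operation log before touching in-memory state; recovery loads the last checkpoint and replays only the log suffix.

```python
# Write-ahead operation log + periodic checkpoint; recovery = restore
# snapshot, then replay log entries recorded after the checkpoint.
class Master:
    def __init__(self):
        self.state = {}            # e.g. file -> chunk list
        self.log = []              # append-only operation log
        self.checkpoint = ({}, 0)  # (snapshot, log position at snapshot)

    def apply(self, op):
        key, value = op
        self.log.append(op)        # log first (write-ahead)
        self.state[key] = value    # then mutate in-memory state

    def take_checkpoint(self):
        # Snapshot the state and remember how much log it already covers.
        self.checkpoint = (dict(self.state), len(self.log))

    def recover(self):
        snapshot, pos = self.checkpoint
        self.state = dict(snapshot)   # restore the snapshot
        for key, value in self.log[pos:]:
            self.state[key] = value   # replay only the suffix

m = Master()
m.apply(("/a", [1]))
m.take_checkpoint()
m.apply(("/b", [2]))
m.state = {}    # simulate a crash wiping in-memory state
m.recover()
assert m.state == {"/a": [1], "/b": [2]}
```

Checkpointing bounds replay time, which is the same amortization idea journaling file systems use for their metadata logs.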
## [mapreduce](https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf)
## [spark](https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf)
## [rpc](https://www.h3c.com/en/Support/Resource_Center/EN/Home/Switches/00-Public/Trending/Technology_White_Papers/gRPC_Technology_White_Paper-6W100/)