feat(notes): gfs

2024-12-30 15:36:10 -06:00 · 2024-12-30 15:36:10 -06:00 · 8c3f0eb910
commit 8c3f0eb910
parent a2e450ef09
1 changed files with 25 additions and 0 deletions
--- a/notes.md
+++ b/notes.md
@@ -8,6 +8,31 @@

 ## [gfs](https://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf)

+- System design driven by the target use case
+  - Optimized for record appends and large streaming reads; small random reads are supported but not prioritized
+- Master-Slave
+  - Limitations: fault tolerance (despite replicas), throughput
+- Bottlenecks & network optimization
+  - Data & Control flow separation
+- State restoration & logging (lots of things I don't get here)
+  - Related: OS journaling
+- Weak consistency - "tolerable errors" (i.e. clients reading different states)
+- Garbage Collection
+
+  - Amortized cost w/ FS scans
+  - Parallels w/ language design
+
+- Terms to learn:
+
+  1. Network Bandwidth and _per-machine_ limit
+  2. Racks & data centers - how are these managed (i.e. "cross-{rack,DC} replication")?
+
+- Use the latest {soft,hard}ware or deal with slowdowns (e.g. older Linux kernels where `fsync()` cost scaled with the size of the whole file rather than the appended portion)
+- Getting to know the real numbers: e.g. 440 MB/s effective re-replication rate after a chunkserver kill, on Google's network
+- Network as the ultimate bottleneck & inefficiency
+
 ## [mapreduce](https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf)

 ## [spark](https://people.eecs.berkeley.edu/~matei/papers/2016/cacm_apache_spark.pdf)
+
+## [rpc](https://www.h3c.com/en/Support/Resource_Center/EN/Home/Switches/00-Public/Trending/Technology_White_Papers/gRPC_Technology_White_Paper-6W100/)