From 8c3f0eb9105f9f04bf70033fb1e07b01c81b1ebb Mon Sep 17 00:00:00 2001
From: Barrett Ruth
Date: Mon, 30 Dec 2024 15:36:10 -0600
Subject: [PATCH] feat(notes): gfs

---
 notes.md | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/notes.md b/notes.md
index 412384f..126c0b2 100644
--- a/notes.md
+++ b/notes.md
@@ -8,6 +8,31 @@
 
 ## [gfs](https://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf)
 
+- System design as development for a use case
+  - Optimized for record append and random reads
+- Master-slave architecture
+  - Limitations: fault tolerance despite replicas, throughput
+- Bottlenecks & network optimization
+  - Data & control flow separation
+- State restoration & logging (lots of things I don't get here)
+  - Related: OS journaling
+- Weak consistency - "tolerable errors" (e.g. clients reading different states)
+- Garbage collection
+
+  - Amortized cost w/ FS scans
+  - Parallels w/ language design
+
+- Terms to learn:
+
+  1. Network bandwidth and _per-machine_ limit
+  2. Racks & data centers - how are these managed (i.e. "cross-{rack,DC} replication")?
+
+- Use the latest {soft,hard}ware or deal with slowdowns (an older kernel's `fsync()` required reading the entire file on append)
+- Getting to know the real numbers: 440 MB/s throughput on a double-chunkserver kill, on Google's network
+- Network as the ultimate bottleneck & inefficiency
+
 ## [mapreduce](https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf)
 
 ## [spark](https://people.eecs.berkeley.edu/~matei/papers/2016/cacm_apache_spark.pdf)
+
+## [rpc](https://www.h3c.com/en/Support/Resource_Center/EN/Home/Switches/00-Public/Trending/Technology_White_Papers/gRPC_Technology_White_Paper-6W100/)
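A note on the "Data & control flow separation" / network-optimization bullets in the GFS section of this patch: the paper pipelines writes through a chain of chunkservers, so the ideal time to push B bytes to R replicas over links with throughput T and per-hop latency L is B/T + R·L, rather than paying B/T once per replica. A minimal sketch of that formula (the function name and defaults are mine; the 100 Mbps / ~1 ms numbers match the paper's example):

```python
def pipelined_transfer_ms(payload_bytes: int, replicas: int,
                          link_mbps: float = 100.0,
                          latency_ms: float = 1.0) -> float:
    """Ideal elapsed time (ms) for GFS-style pipelined data flow:
    the payload cost B/T is paid once, plus one link latency per hop."""
    transfer_ms = payload_bytes * 8 / (link_mbps * 1e6) * 1e3
    return transfer_ms + replicas * latency_ms

# 1 MB to 3 replicas over 100 Mbps links: ~83 ms, versus ~240 ms
# if the full payload were sent to each replica serially.
print(round(pipelined_transfer_ms(1_000_000, 3), 1))  # → 83.0
```

This is why the notes call the network "the ultimate bottleneck": the chain spends each machine's full outbound bandwidth on forwarding, so adding replicas costs only latency, not extra payload transfers.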
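The "Garbage collection / amortized cost w/ FS scans" bullets in the patch refer to GFS's lazy reclamation: deletion only renames the file to a hidden tombstone, and a periodic namespace scan later reclaims tombstones older than a grace period (three days by default in the paper). A toy model of that split, assuming nothing about GFS's actual data structures (class and method names are mine):

```python
GRACE_PERIOD_S = 3 * 24 * 3600  # GFS's default ~3-day window before reclamation


class LazyNamespace:
    """Toy GFS-style lazy deletion: delete() is a cheap O(1) rename to a
    hidden tombstone; actual reclamation is amortized into periodic scans."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}   # visible name -> data
        self.hidden: dict[str, float] = {}  # tombstone name -> deletion time

    def delete(self, name: str, now: float) -> None:
        self.files.pop(name)                 # no storage is touched here
        self.hidden[f".deleted.{name}"] = now

    def scan(self, now: float) -> list[str]:
        """Periodic namespace scan: reclaim tombstones past the grace period."""
        reclaimed = [n for n, t in self.hidden.items()
                     if now - t > GRACE_PERIOD_S]
        for n in reclaimed:
            del self.hidden[n]
        return reclaimed
```

The "parallels w/ language design" bullet fits here: like a tracing garbage collector, the scan batches reclamation work and makes deletion itself constant-time, at the cost of storage lingering until the next scan.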