notes n stuff

Barrett Ruth 2025-01-03 11:50:49 -06:00
parent 05dd383e14
commit 04560d646c
4 changed files with 15 additions and 2 deletions

@@ -33,6 +33,17 @@
## [mapreduce](https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf)
- mapreduce: map(k0, v0) -> list(k1, v1); reduce(k1, list(v1)) -> list(v1)
- Master/worker: a single master assigns map & reduce tasks to idle workers
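- M map tasks & R reduce tasks chosen separately -> M >> R (usually; R stays small since each reduce task writes its own output file) -> many small tasks per worker improve load balancing & speed up recovery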
- Map & reduce phases are each parallelized, but *not* overlapped: no reduce function runs until every map task has finished
- Master forwards map-output locations to reducers; each reducer reads all of its intermediate kv pairs, then sorts them by key (grouping equal keys) -> this is why output is sorted within each partition
- Data transfer M -> R: reducers use RPC to read the buffered intermediate files from the mappers' local disks
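- Fault tolerance via re-execution of tasks, not whole stages: master pings workers; a failed worker's tasks are rescheduled. Completed map tasks are re-run too (output lived on the dead machine's local disk); completed reduce tasks aren't (output is already in GFS)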
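- <u>Backup Tasks</u>: near the end of a job, master schedules backup copies of still in-progress tasks to beat stragglers (e.g. a machine with a bad disk); the sort benchmark takes 44% longer with backups disabled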
- Locality & network topology: schedule map tasks on (or near, e.g. same switch as) a machine holding a GFS replica of the input split to conserve scarce network bandwidth
- Simplicity + abstraction: not optimal, but hugely influential
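A minimal single-process sketch of the map -> partition -> shuffle -> sort/group -> reduce flow above, using word count (the paper's running example) as the user map/reduce. Assumptions: every name here (`map_fn`, `reduce_fn`, `partition`, `run_mapreduce`) is hypothetical. There is no master, no workers, no GFS, and no fault tolerance. `zlib.crc32` stands in for the hash in the default `hash(key) mod R` partitioner.

```python
import zlib
from collections import defaultdict
from typing import Iterator

KV = tuple[str, int]  # an intermediate (k1, v1) pair in this sketch


def map_fn(doc_name: str, text: str) -> Iterator[KV]:
    """User map: (k0, v0) = (doc name, contents) -> stream of (word, 1)."""
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(word: str, counts: list[int]) -> int:
    """User reduce: (k1, list(v1)) -> one aggregated value per key."""
    return sum(counts)


def partition(key: str, r: int) -> int:
    # hash(key) mod R; crc32 rather than hash() so buckets are stable across runs.
    return zlib.crc32(key.encode()) % r


def run_mapreduce(splits: list[tuple[str, str]], r: int) -> list[list[KV]]:
    # Map phase: one map task per input split. Each task buckets its output
    # into R regions (on the real system: R files on the mapper's local disk).
    map_outputs: list[list[list[KV]]] = []
    for doc_name, text in splits:
        regions: list[list[KV]] = [[] for _ in range(r)]
        for k, v in map_fn(doc_name, text):
            regions[partition(k, r)].append((k, v))
        map_outputs.append(regions)

    # Barrier: all map tasks are done before any reduce function runs.
    outputs: list[list[KV]] = []
    for i in range(r):
        # Shuffle: reduce task i pulls region i from every map task
        # (done via RPC remote reads in the paper).
        pairs = [kv for regions in map_outputs for kv in regions[i]]
        # Sort by key so equal keys are adjacent, then group values per key.
        grouped: dict[str, list[int]] = defaultdict(list)
        for k, v in sorted(pairs):
            grouped[k].append(v)
        # One reduce call per distinct key; keys come out in sorted order.
        outputs.append([(k, reduce_fn(k, vs)) for k, vs in grouped.items()])
    return outputs


if __name__ == "__main__":
    docs = [("a.txt", "the quick fox"), ("b.txt", "The lazy dog the end")]
    parts = run_mapreduce(docs, r=2)
    merged = dict(kv for part in parts for kv in part)
    print(merged["the"])  # 3
```

Each reducer sorts only its own region. Output is therefore sorted within each of the R partitions, not globally. A total order would need a range-based partitioner instead of hash mod R.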
## [spark](https://people.eecs.berkeley.edu/~matei/papers/2016/cacm_apache_spark.pdf)
## [rpc](https://www.h3c.com/en/Support/Resource_Center/EN/Home/Switches/00-Public/Trending/Technology_White_Papers/gRPC_Technology_White_Paper-6W100/)

papers/kafka.pdf

Binary file not shown.

papers/zookeeper.pdf

Binary file not shown.

@@ -8,10 +8,12 @@
- [x] [cassandra](./notes.md#cassandra)
- [x] [bigtable](./notes.md#bigtable)
- [x] [gfs](./notes.md#gfs)
- [mapreduce](./notes.md#gfs)
- [spark](./notes.md#spark)
- [x] [mapreduce](./notes.md#mapreduce)
- [x] [spark](./notes.md#spark)
- rpc
- zookeeper
- kafka
- tiktok monolith
- bloom filters
- dynamo