profiling

intro

instrumentation: overall timing (usual idea)

  • while timing a clock with time_the_clock.cc, i erroneously default-initialized a chrono::microseconds. this can be seen in the STL source: the default constructor is defaulted, so the Rep (the representation type that stores the tick count) is default-initialized. Rep is a long here, so as a local variable it held garbage. value-initializing (chrono::microseconds{}) or using chrono::microseconds::zero() avoids this.
  • NOTE: timing the clock (with 0 optimization) took ~0ns and is not of significance for my profiling. I checked that nothing is being optimized out by compiling with -O0 and de-mangling the assembly with c++filt; the two now() calls sit back to back:
.L585:
	call	std::chrono::_V2::system_clock::now()@PLT
	movq	%rax, -64(%rbp)
	call	std::chrono::_V2::system_clock::now()@PLT

However, I acknowledge that this code is not necessarily run one-for-one. I don't have the knowledge to assess caching done by the CPU itself, or other effects like inlining.
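A minimal sketch of the clock-timing experiment, with the duration value-initialized so it starts at zero. The helper name `average_clock_overhead` is hypothetical, and `steady_clock` is swapped in for the `system_clock` seen in the assembly because it is monotonic; neither choice comes from time_the_clock.cc.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not from time_the_clock.cc): estimates the average
// cost of one now() call from `iterations` back-to-back call pairs.
std::chrono::nanoseconds average_clock_overhead(int iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;  // assumption: monotonic clock

    // Value-initialize with {} so the tick count starts at zero.
    // `std::chrono::nanoseconds total;` would leave it indeterminate.
    std::chrono::nanoseconds total{};

    for (int i = 0; i < iterations; ++i) {
        const auto start = clock::now();
        const auto stop = clock::now();
        total += std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    }
    return total / iterations;
}
```

Calling `average_clock_overhead(1000).count()` gives the average in nanoseconds. The number is only a rough estimate, since the clock's own resolution limits what a single pair of calls can show.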

So far, though, one lesson of profiling is:

Only measure the code you actually mean to profile (not the clock, setup, or harness around it)
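One way to apply this lesson, as a sketch: keep setup outside the timed region so only the work of interest sits between the two clock reads. The `TimedSum` struct, the `timed_sum` function and the summing workload are made up for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical result type: the computed value plus how long it took.
struct TimedSum {
    long long sum;
    std::chrono::nanoseconds elapsed;
};

// Hypothetical workload: sum n ones, timing only the summation.
TimedSum timed_sum(std::size_t n) {
    // Setup (allocation and fill) happens before the first clock read,
    // so it does not count toward the measurement.
    std::vector<int> data(n, 1);

    const auto start = std::chrono::steady_clock::now();
    const long long sum = std::accumulate(data.begin(), data.end(), 0LL);
    const auto stop = std::chrono::steady_clock::now();

    assert(stop >= start);  // steady_clock never goes backwards
    return {sum, std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}
```

Returning the sum alongside the elapsed time also means the result is used, which makes it harder for an optimizing compiler to discard the work being timed.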

statistical profiling

  • more accurate by experimental virtue: take many samples of the running program for a probabilistic look at where the time goes

    • imagine the execution of the program as a "strip" of time
    • system-specific, but so was instrumentation before
  • statistical: stats

  • program simulation: ll
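The "strip" picture above can be sketched as a toy model: lay the functions out along a timeline, look at the strip at a fixed period, and count which function each sample lands in. This only illustrates the idea and is not how a real sampling profiler is implemented. The `Segment` type, the `sample_strip` function and the time units are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One stretch of the "strip": which function runs, and for how long.
struct Segment {
    std::string function;
    int duration;  // arbitrary time units
};

// Sample the strip at t = 0, period, 2*period, ... and count how many
// samples land inside each function's stretch of the timeline.
std::map<std::string, int> sample_strip(const std::vector<Segment>& strip, int period) {
    assert(period > 0);
    std::map<std::string, int> hits;
    int segment_start = 0;
    int next_sample = 0;
    for (const Segment& seg : strip) {
        const int segment_end = segment_start + seg.duration;
        while (next_sample < segment_end) {
            ++hits[seg.function];
            next_sample += period;
        }
        segment_start = segment_end;
    }
    return hits;
}
```

The share of hits per function approximates its share of the run time. A function shorter than the period can be missed entirely, which is why the result is a probabilistic estimate rather than an exact account.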

exercise 1

  • use cachegrind/valgrind/asan on cf problem
  • apply bentley's rules to some code
  • begin developing a strategy for how to profile things