The problem: not enough resource utilization.
Resources are things like CPU cycles, disk I/O, and network bandwidth.
Running the unit tests, I use one core of my ten-core machine, usually at around 30% utilization, so roughly 5% of the machine's resources (forgetting about the GPUs, which barely stir!).
Running a product like "question the docs", I see more cores active, but the machine still seems mostly idle.
It doesn't look like disk or network I/O is a bottleneck either, but I don't have clear measurements.
If we made full use of our machines' resources, we could speed up our unit tests by a factor of 2 to 20 (depending in part on how many cores the machine has), and our actual code by some lesser but quite possibly significant amount.
We should at least have some idea of where all our time is going.
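To put rough numbers on this, a sketch along the following lines could sample per-core utilization while a workload runs. It assumes `psutil` is installed, and the `pytest` command is only a placeholder for whatever workload we measure:

```python
# Sketch: sample per-core CPU utilization while a workload runs.
# Assumes psutil is installed; the pytest command is a placeholder.
import subprocess
import psutil

proc = subprocess.Popen(["pytest", "test/"])  # placeholder workload
samples = []
while proc.poll() is None:
    # Blocks for one second, then returns one utilization figure per logical core.
    samples.append(psutil.cpu_percent(interval=1.0, percpu=True))

if samples:
    per_core = [sum(core) / len(samples) for core in zip(*samples)]
    print(f"average per-core utilization: {[f'{c:.0f}%' for c in per_core]}")
    print(f"overall machine utilization: {sum(per_core) / len(per_core):.1f}%")
```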
How to approach the problem
1. Benchmarks
We need at least one very simple "benchmark": in other words, an as-simple-as-possible program that "does superduperdb" in a loop on something barely non-trivial, with a measurable throughput.
As we go on we can build up a mix of benchmarks, though we should be careful not to fall into a hole full of metrics.
For this large-scale, order-of-magnitude work, even one benchmark is very useful, and we get a second benchmark for free: the tests.
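As a sketch of what "the benchmark" might look like: `do_one_operation` below is hypothetical, a stand-in for one small end-to-end superduperdb call (e.g. a single "question the docs" query):

```python
# Sketch of "the benchmark": one barely non-trivial operation in a tight
# loop, with throughput as the single output metric.
import time

def do_one_operation():
    ...  # hypothetical: one small end-to-end superduperdb call

def run_benchmark(duration_s: float = 30.0) -> float:
    count = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration_s:
        do_one_operation()
        count += 1
    return count / (time.perf_counter() - start)  # operations per second

if __name__ == "__main__":
    print(f"throughput: {run_benchmark():.2f} ops/s")
```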
2. Analysis (single-core)
Run the benchmarks, look at utilization of resources.
Run perf and make some graphs (see attached), or do the analysis ourselves using pstats. This will show you where the wait states are (in the attached document, look at the bottom right-hand corner for an example).
In theory there is always a way to eliminate wait states, or at least make them very small, though some of them may turn out to be outside our control or not worth the work. Either way, I am sure we can do significantly better.
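For the pstats route, something like the following would work; it assumes the benchmark above lives in a hypothetical `bench` module:

```python
# Sketch: profile the benchmark with cProfile, then inspect the results
# with pstats; sorting by cumulative time points at where the program
# waits rather than computes.
import cProfile
import pstats

from bench import run_benchmark  # hypothetical module holding the benchmark above

cProfile.run("run_benchmark(10.0)", "bench.prof")  # profile a 10 s run

stats = pstats.Stats("bench.prof")
stats.strip_dirs().sort_stats("cumulative").print_stats(20)  # top 20 entries
```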
3. Analysis (multi-core)
We already know that our unit tests only use a single core, for a series of complicated reasons that we are working to rectify; that's a separate task.
More important is our production code, and for that we need the benchmark.
The main program already has facilities for parallelizing to other cores, so it may be that once we improve single-core utilization, this step only requires tuning our usage of the other cores.
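A crude way to estimate the headroom here is to run the same hypothetical operation serially and across a process pool and compare throughput. This is only a sketch of the measurement, not of how the main program's own parallelization works; processes rather than threads are used so CPU-bound work is not serialized by the GIL:

```python
# Sketch: estimate the headroom from multi-core execution by comparing
# serial vs process-pool throughput for the same operation.
# `do_one_operation` is the hypothetical stand-in from the benchmark sketch.
import os
import time
from concurrent.futures import ProcessPoolExecutor

from bench import do_one_operation  # hypothetical module

N = 200  # number of independent operations to run

def one(_):
    do_one_operation()

def throughput(fn) -> float:
    start = time.perf_counter()
    fn()
    return N / (time.perf_counter() - start)

def serial():
    for _ in range(N):
        do_one_operation()

def parallel():
    # Workers pickle and import `one`, so it must be defined at module level.
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        list(pool.map(one, range(N)))

if __name__ == "__main__":
    print(f"serial:   {throughput(serial):.2f} ops/s")
    print(f"parallel: {throughput(parallel):.2f} ops/s")
```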
4. How much work is it?
Almost none to get started. We already know how to make perf graphs, and we will productionize that.
We need to build "the benchmark", which could just be "question the docs running in a tight loop with a counter". At that point we will have a basic idea of how well or badly we are doing in production.
Tackling, for example, the issue of fully parallelizing the tests will take considerably more time, and can proceed in parallel with other work.