Hello Sergei, Elena and all,
Today, while working on the script, I found and fixed an issue:
there was some faulty code in my script that is in charge of collecting
the statistics about whether a test failure was caught or not (here
<https://github.com/pabloem/Kokiri/blob/master/basic_simulator.py#L393>).
While looking into the fix, I noticed another *problem*: the *recall
numbers* that I had collected previously were *too high*.
The actual recall numbers, once we count the test failures that are *not
caught*, are disappointingly lower. I won't share results yet, since I
first want to make sure that the code is really fixed and that the numbers
I report are accurate.
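To make the effect concrete: recall is the fraction of *all* test failures
that were caught, so any failure that is never recorded as missed drops out
of the denominator and pushes the number up. A minimal sketch, with
hypothetical names and illustrative counts (this is not the actual code in
basic_simulator.py, and the numbers are not real results):

```python
def recall(caught: int, missed: int) -> float:
    """Fraction of all test failures that were caught.

    `caught` and `missed` are hypothetical counters: the number of test
    failures the strategy caught and the number it did not catch.
    """
    total = caught + missed
    if total == 0:
        raise ValueError("no test failures were observed")
    return caught / total


# Illustrative counts only: 30 failures caught, 20 missed.
correct = recall(caught=30, missed=20)   # 0.6
# If the missed failures are never recorded, they vanish from the
# denominator and recall looks much better than it really is.
inflated = recall(caught=30, missed=0)   # 1.0
print(correct, inflated)
```

Counting misses explicitly, rather than inferring the total from the
failures that were caught, is what keeps the denominator honest.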
That is all for now. The strategy that I was using is a lot less effective
than it initially seemed. I will send out a more detailed report with the
results, my view on the weak points of the strategy, and ideas for
improving it, including a roadmap.
All feedback is welcome.
Regards.