Another really cool talk from OOPSLA: "An Empirical Evaluation of Property-Based Testing in Python" youtube.com/live/zoE2w2hueYQ?s…
They collect a corpus of Python projects that use Hypothesis for property-based testing. Then they check how good those tests are, compared to the project's regular unit tests, at finding random artificial bugs injected into the project's source code (mutation testing).
[ICFP/SPLASH'25] Orchid Plenary Ballroom - SPLASH OOPSLA (Oct 17th)
Full program: https://conf.researchr.org/program/icfp-splash-2025/program-icfp-splash-2025/
YouTube
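To make the setup described in the post concrete, here is a minimal sketch of the idea, not the paper's actual tooling. The `clamp` function, its mutant, and both tests are hypothetical, and the hand-rolled random loop only stands in for what Hypothesis's `@given` would do in a real project.

```python
import random


def clamp(x, lo, hi):
    """Original implementation: restrict x to the range [lo, hi]."""
    return max(lo, min(x, hi))


def clamp_mutant(x, lo, hi):
    """Artificial bug (a "mutant"): the upper bound is never applied."""
    return max(lo, x)


def unit_test(impl):
    # Typical hand-picked examples; neither one exercises the upper bound.
    return impl(5, 0, 10) == 5 and impl(-3, 0, 10) == 0


def property_test(impl, examples=200, seed=0):
    # Property: the result always lies within [lo, hi].
    # Inputs are drawn at random (stand-in for a Hypothesis strategy).
    rng = random.Random(seed)
    for _ in range(examples):
        lo = rng.randint(-100, 100)
        hi = rng.randint(lo, 100)
        x = rng.randint(-200, 200)
        if not (lo <= impl(x, lo, hi) <= hi):
            return False
    return True


def kills(test, original, mutant):
    """A test kills a mutant if it passes on the original and fails on the mutant."""
    return test(original) and not test(mutant)


unit_kills = kills(unit_test, clamp, clamp_mutant)
property_kills = kills(property_test, clamp, clamp_mutant)
print("unit test kills mutant:", unit_kills)
print("property test kills mutant:", property_kills)
```

In this toy case the unit test misses the mutant and the property catches it, but that outcome depends entirely on which examples the unit test author happened to pick.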
Paul Zuradzki
in reply to CF Bolz-Tereick • • •very interesting! The test categorization definitions remind me of this blog post via @ScottWlaschin
fsharpforfunandprofit.com/post…
Choosing properties for property-based testing
fsharpforfunandprofit.com
David R. MacIver
in reply to CF Bolz-Tereick • • •As keen as I am on property-based testing, I think the 50x result is fake (even separately from critiques of mutation testing).
1. They're comparing projects with both property-based tests and unit tests, so the unit tests are often ones written assuming the property-based tests do the heavy lifting.
2. They say 55% of properties catch a bug on the first example. That means there's a unit test 55% as effective as that PBT.
CF Bolz-Tereick
in reply to David R. MacIver • • •@DRMacIver hm, is 2. really a counter-argument? it's unlikely that human unit-test authors would have picked exactly the right unit test, after all. also, does 55% really match your gut feeling? I am sure that I have written a lot of hypothesis tests that are much better at bug finding than any amount of standard unit tests I could ever have written.
I find the argument 1. more convincing. of course it's quite hard to avoid this problem.
David R. MacIver
in reply to CF Bolz-Tereick • • •It's plausible to me that 55% of property-based tests in the wild could be replaced with a unit test without much loss.
I think it's hard to come up with a single number capturing the effectiveness of property-based tests. I wouldn't be shocked at there being a reasonable experiment that really shows 50x on some metric, I just don't think this one is it.