Friendica Social Network

CF Bolz-Tereick

3 months ago

Another really cool talk from OOPSLA: "An Empirical Evaluation of Property-Based Testing in Python" youtube.com/live/zoE2w2hueYQ?s…

They collect a corpus of Python projects that use Hypothesis for property-based testing. Then they check how good those tests are, compared to the projects' regular unit tests, at finding random artificial bugs (mutants) injected into the project's source code.
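To make the study design concrete, here is a minimal sketch of mutation analysis that compares an example-based test with a property-based one. It assumes a toy function `my_abs` and two hand-written mutants (all names are hypothetical, not from the paper), and it uses the standard library's `random` instead of Hypothesis so it runs without dependencies. Real mutation tools generate the mutants automatically.

```python
import random
from typing import Callable, Dict

# Hypothetical function under test (not taken from the paper).
def my_abs(x: int) -> int:
    return -x if x < 0 else x

# Hand-written mutants standing in for what a mutation tool would generate.
def mutant_no_negate(x: int) -> int:
    return x                        # negation deleted

def mutant_boundary(x: int) -> int:
    return -x if x < -1 else x      # comparison boundary shifted

def unit_tests(f: Callable[[int], int]) -> bool:
    """Example-based tests: a few fixed, hand-picked inputs."""
    return f(-3) == 3 and f(0) == 0 and f(5) == 5

def property_test(f: Callable[[int], int], runs: int = 500, seed: int = 0) -> bool:
    """Property-based check: many random inputs, general properties."""
    rng = random.Random(seed)
    for _ in range(runs):
        x = rng.randint(-10, 10)
        y = f(x)
        # The result must be non-negative and equal to either x or -x.
        if y < 0 or y not in (x, -x):
            return False
    return True

def killed(test: Callable[[Callable[[int], int]], bool],
           mutants: Dict[str, Callable[[int], int]]) -> Dict[str, bool]:
    """A mutant counts as 'killed' if the test fails on it."""
    return {name: not test(m) for name, m in mutants.items()}

mutants = {"no_negate": mutant_no_negate, "boundary": mutant_boundary}
assert unit_tests(my_abs) and property_test(my_abs)  # both pass on the original
print("unit tests:", killed(unit_tests, mutants))
print("property test:", killed(property_test, mutants))
```

The fixed unit tests never try `-1`, so the boundary mutant survives them, while the random property check almost certainly draws `-1` among its 500 inputs and kills it.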

Screenshot from the talk slides:

"Are PBTs better than Unit tests?
How effective are PBTs at catching bugs compared to unit tests?
Contribution: An evaluation of effectiveness
Shows PBTs are ~50x better at catching bugs than unit tests"

[ICFP/SPLASH'25] Orchid Plenary Ballroom - SPLASH OOPSLA (Oct 17th)

Full program: https://conf.researchr.org/program/icfp-splash-2025/program-icfp-splash-2025/

Daphne Preston-Kendal

in reply to CF Bolz-Tereick • 3 months ago

Interesting. Although I’m not convinced by the methodology – mutation testing is great at generating changes which don’t actually create bugs, many mutations it generates are functionally equivalent to one another, etc. Will have to read their actual paper to see whether/how they controlled for this kind of potential inflation of the number of ‘bugs’ found by Hypothesis.
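The equivalent-mutant concern can be illustrated with a small sketch, again using a hypothetical `my_abs` and hand-written mutants (not the paper's). It groups mutants by their outputs on a finite sample of inputs: mutants that match the original can never be killed, and mutants that match each other would be counted as separate "bugs" although they behave identically. Comparing outputs on a sample only approximates equivalence, which is undecidable in general.

```python
from typing import Callable, Dict, List, Tuple

def my_abs(x: int) -> int:
    return -x if x < 0 else x

# Hypothetical mutants of my_abs, of the kind a mutation tool might produce.
mutants: Dict[str, Callable[[int], int]] = {
    "lt_to_le": lambda x: -x if x <= 0 else x,        # equivalent: -0 == 0
    "drop_negate": lambda x: x,                        # real change
    "swap_branches": lambda x: x if x < 0 else -x,     # real change
    "zero_minus": lambda x: (0 - x) if x < 0 else x,   # equivalent rewrite
    "ge_identity": lambda x: x if x >= 0 else x,       # same behavior as drop_negate
}

SAMPLE = range(-50, 51)  # finite sample: only approximates true equivalence

def signature(f: Callable[[int], int]) -> Tuple[int, ...]:
    """Outputs of f on the sample, used as a behavioral fingerprint."""
    return tuple(f(x) for x in SAMPLE)

original = signature(my_abs)
# Mutants that agree with the original on every sampled input.
equivalent = [name for name, m in mutants.items() if signature(m) == original]

# Remaining mutants grouped by behavior: each group is effectively one "bug".
classes: Dict[Tuple[int, ...], List[str]] = {}
for name, m in mutants.items():
    sig = signature(m)
    if sig != original:
        classes.setdefault(sig, []).append(name)

print("equivalent to original:", equivalent)
print("distinct behaviors:", list(classes.values()))
```

In this sketch five mutants reduce to two equivalent ones plus only two distinct behaviors, so a test that kills `drop_negate` also kills `ge_identity` and gets credit twice.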

CF Bolz-Tereick

in reply to Daphne Preston-Kendal • 3 months ago

@dpk I agree! But I still think the comparison with the mutations caught by non-randomized "regular" unit tests is meaningful.

CF Bolz-Tereick

in reply to CF Bolz-Tereick • 3 months ago

(the paper is here: dl.acm.org/doi/10.1145/3764068 )

John Regehr

in reply to CF Bolz-Tereick • 3 months ago

cc @DRMacIver

Paul Zuradzki

in reply to CF Bolz-Tereick • 3 months ago

Very interesting! The test categorization definitions remind me of this blog post by @ScottWlaschin:

fsharpforfunandprofit.com/post…

Choosing properties for property-based testing

Or, I want to use PBT, but I can never think of any properties to use

David R. MacIver

in reply to CF Bolz-Tereick • 3 months ago

As keen as I am on property-based testing, I think the 50x result is fake (even separately from critiques of mutation testing).

1. They're comparing projects with both property-based tests and unit tests, so the unit tests are often ones written assuming the property-based tests do the heavy lifting.
2. They say 55% of properties catch a bug on the first example. That means there's a unit test 55% as effective as that PBT.
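Point 2 can be read like this: if a property fails on the very first input the generator produces, a single unit test with that input would have caught the same bug. Below is a rough sketch of measuring that, assuming a hypothetical generator that yields the simplest input (the empty list) first and then random lists. This ordering is an assumption for illustration, not a description of how Hypothesis orders its examples, and `buggy_sum_a`/`buggy_sum_b` are made-up faulty implementations checked against Python's built-in `sum`.

```python
import random
from typing import Callable, Iterator, List, Optional

def buggy_sum_a(xs: List[int]) -> int:
    # Hypothetical bug: reads xs[0] unconditionally, so [] raises IndexError.
    return xs[0] + sum(xs[1:])

def buggy_sum_b(xs: List[int]) -> int:
    # Hypothetical bug: silently ignores everything after the fifth element.
    return sum(xs[:5])

def examples(runs: int = 200, seed: int = 0) -> Iterator[List[int]]:
    """Assumed generator: the simplest input first, then random short lists."""
    yield []
    rng = random.Random(seed)
    for _ in range(runs - 1):
        yield [rng.randint(-5, 5) for _ in range(rng.randint(0, 8))]

def first_failure(impl: Callable[[List[int]], int]) -> Optional[int]:
    """Index of the first example where impl disagrees with sum(), or None."""
    for i, xs in enumerate(examples()):
        try:
            ok = impl(xs) == sum(xs)
        except Exception:
            ok = False  # a crash counts as a failing example
        if not ok:
            return i
    return None

print(first_failure(buggy_sum_a))  # 0: caught by the very first example
print(first_failure(buggy_sum_b))  # some later index
```

`buggy_sum_a` fails at index 0, so a plain `assert buggy_sum_a([]) == 0` unit test would do just as well; `buggy_sum_b` only goes wrong on lists longer than five elements, so it needs the later random inputs.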

CF Bolz-Tereick

in reply to David R. MacIver • 3 months ago

@DRMacIver hm, is point 2 really a counter-argument? It's unlikely that human unit-test authors would have picked the right unit test, after all. Also, does 55% really match your gut feeling? I am sure that I have written a lot of Hypothesis tests that are much better at finding bugs than any amount of standard unit tests I could ever have written.

I find argument 1 more convincing. Of course, it's quite hard to avoid this problem.

David R. MacIver

in reply to CF Bolz-Tereick • 3 months ago

It seems plausible to me that 55% of property-based tests in the wild could be replaced with a unit test without much loss.

I think it's hard to come up with a single number capturing the effectiveness of property-based tests. I wouldn't be shocked if a reasonable experiment really showed 50x on some metric; I just don't think this one is it.
