Using noise to detect test flakiness

SILVA, Denini Gabriel

Please use this identifier to cite or link to this item: https://repositorio.ufpe.br/handle/123456789/44567

Title:	Using noise to detect test flakiness
Authors:	SILVA, Denini Gabriel
Keywords:	Software engineering and programming languages; Android; Software testing; Debugging; Software evolution
Issue Date:	25-Feb-2022
Publisher:	Universidade Federal de Pernambuco
Citation:	SILVA, Denini Gabriel. Using noise to detect test flakiness. 2022. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Pernambuco, Recife, 2022.
Abstract:	A test is said to be flaky when it non-deterministically passes or fails in different runs on the same configuration (e.g., code). Test flakiness negatively affects regression testing because an observed failure is not necessarily an indication of a bug in the program. Static and dynamic techniques for detecting flaky tests have been proposed in the literature, but they are limited. Prior studies have shown that test flakiness is mostly caused by concurrent behavior. Based on that observation, we hypothesize that adding noise to the environment (stress tests consuming machine resources such as CPU and memory) can interfere with the ordering of program events and, consequently, influence the test outputs. We propose Shaker, a practical technique to detect flaky tests by comparing the outputs of multiple test runs in noisy environments. Compared with a regular test run, one test run with Shaker is slower because the environment is loaded, i.e., the process that runs a given test competes for resources with stressor tasks that Shaker creates. However, we conjecture that Shaker pays off by detecting flakiness in fewer runs than the alternative of running the test suite multiple times in a regular (non-noisy) environment. We evaluated Shaker using a public benchmark of flaky tests, obtaining encouraging results. For example, we found that (1) Shaker is 96% precise, almost as precise as ReRun, which by definition does not report false positives; (2) Shaker’s recall is much higher than ReRun’s (95% versus 65%); and (3) Shaker detects flaky tests much more efficiently than ReRun, despite the execution overhead associated with noise introduction. To sum up, the results indicate that noise is a promising approach to detecting flakiness.
URI:	https://repositorio.ufpe.br/handle/123456789/44567
Appears in Collections:	Dissertações de Mestrado - Ciência da Computação
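
The idea described in the abstract (run a test several times while stressor tasks load the machine, then compare the outcomes) can be illustrated with a minimal Python sketch. This is not Shaker's implementation: the abstract does not say how its stressor tasks are built, so the busy-loop stressors and the names `start_stressors`, `stop_stressors`, `run_under_noise`, and `classify` are all assumptions made for illustration, and only CPU noise and pass/fail exit codes are considered.

```python
import subprocess
import sys


def start_stressors(workers):
    """Start `workers` busy-looping processes that compete for CPU.

    Hypothetical stand-in for stressor tasks; a real tool may also
    stress memory or other resources.
    """
    return [
        subprocess.Popen([sys.executable, "-c", "while True: pass"])
        for _ in range(workers)
    ]


def stop_stressors(procs):
    """Terminate the stressor processes and wait for them to exit."""
    for proc in procs:
        proc.terminate()
    for proc in procs:
        proc.wait()


def run_under_noise(test_cmd, runs=3, workers=2):
    """Run `test_cmd` `runs` times under load; return True per passing run."""
    procs = start_stressors(workers)
    try:
        return [
            subprocess.run(test_cmd, capture_output=True).returncode == 0
            for _ in range(runs)
        ]
    finally:
        # Always remove the noise, even if a test run raises.
        stop_stressors(procs)


def classify(outcomes):
    """Report a test as flaky when its runs disagree with each other."""
    return "flaky" if len(set(outcomes)) > 1 else "consistent"
```

Under these assumptions, a call such as `classify(run_under_noise(["pytest", "tests/test_example.py"]))` (the test path is hypothetical) would label the test "flaky" only if at least one noisy run passed and at least one failed. Agreement across all runs does not prove the test is deterministic.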

Files in This Item:

File	Description	Size	Format
DISSERTAÇÃO Denini Gabriel Silva.pdf		1.09 MB	Adobe PDF

This item is protected by original copyright

This item is licensed under a Creative Commons License