Matches in SemOpenAlex for { <https://semopenalex.org/work/W2964279097> ?p ?o ?g. }
- W2964279097 endingPage "1586" @default.
- W2964279097 startingPage "1559" @default.
- W2964279097 abstract "Just-in-Time (JIT) defect prediction, a technique that aims to predict bugs at the change level, has received increasing attention. JIT defect prediction leverages the SZZ approach to identify bug-introducing changes. Recently, researchers found that the performance of SZZ (including its variants) is impacted by a large amount of noise. SZZ may considerably mislabel changes that are used to train a JIT defect prediction model, and thus impact the prediction accuracy. In this paper, we investigate the impact of the changes mislabeled by different SZZ variants on the performance and interpretation of JIT defect prediction models. We analyze four SZZ variants (i.e., B-SZZ, AG-SZZ, MA-SZZ, and RA-SZZ) that were proposed by prior studies. We build the prediction models using the data labeled by these four SZZ variants. Among the four SZZ variants, RA-SZZ is least likely to generate mislabeled changes, and we construct the testing set by using RA-SZZ. All four prediction models are then evaluated on the same testing set. We choose the prediction model built on the data labeled by RA-SZZ as the baseline model, and we compare the performance and metric importance of the models trained using the data labeled by the other three SZZ variants with the baseline model. Through a large-scale empirical study on a total of 126,526 changes from ten Apache open source projects, we find that in terms of various performance measures (AUC, F1-score, G-mean and Recall@20%), the changes mislabeled by B-SZZ and MA-SZZ are not likely to cause a considerable performance reduction, while the changes mislabeled by AG-SZZ cause a statistically significant performance reduction with an average difference of 1-5 percent. When considering developers' inspection effort (measured by LOC) in practice, the changes mislabeled by B-SZZ and AG-SZZ lead to 9-10 and 1-15 percent more wasted inspection effort, respectively. Moreover, the changes mislabeled by B-SZZ lead to significantly more wasted effort. The changes mislabeled by MA-SZZ do not cause considerably more wasted effort. We also find that the most important metric for identifying bug-introducing changes (i.e., number of files modified in a change) is robust to the mislabeling noise generated by SZZ. However, the second- and third-most important metrics are more likely to be impacted by the mislabeling noise, unless random forest is used as the underlying classifier." @default.
- W2964279097 created "2019-07-30" @default.
- W2964279097 creator A5006669765 @default.
- W2964279097 creator A5010426195 @default.
- W2964279097 creator A5043400199 @default.
- W2964279097 creator A5052196896 @default.
- W2964279097 creator A5081036622 @default.
- W2964279097 creator A5091586373 @default.
- W2964279097 date "2021-08-01" @default.
- W2964279097 modified "2023-10-16" @default.
- W2964279097 title "The Impact of Mislabeled Changes by SZZ on Just-in-Time Defect Prediction" @default.
- W2964279097 cites W1555168845 @default.
- W2964279097 cites W1570437003 @default.
- W2964279097 cites W1655956671 @default.
- W2964279097 cites W1902482618 @default.
- W2964279097 cites W1968745662 @default.
- W2964279097 cites W1972978214 @default.
- W2964279097 cites W1978859404 @default.
- W2964279097 cites W1987843766 @default.
- W2964279097 cites W1987855178 @default.
- W2964279097 cites W1989354793 @default.
- W2964279097 cites W1994248747 @default.
- W2964279097 cites W1995945562 @default.
- W2964279097 cites W2000679946 @default.
- W2964279097 cites W2006407062 @default.
- W2964279097 cites W2007705030 @default.
- W2964279097 cites W2010398592 @default.
- W2964279097 cites W2019348938 @default.
- W2964279097 cites W2050496630 @default.
- W2964279097 cites W2073649165 @default.
- W2964279097 cites W2074805796 @default.
- W2964279097 cites W2093897789 @default.
- W2964279097 cites W2096451472 @default.
- W2964279097 cites W2099593355 @default.
- W2964279097 cites W2100310618 @default.
- W2964279097 cites W2104329051 @default.
- W2964279097 cites W2105672266 @default.
- W2964279097 cites W2105776892 @default.
- W2964279097 cites W2110229593 @default.
- W2964279097 cites W2110653426 @default.
- W2964279097 cites W2115105080 @default.
- W2964279097 cites W2120703352 @default.
- W2964279097 cites W2126166995 @default.
- W2964279097 cites W2129164226 @default.
- W2964279097 cites W2132887549 @default.
- W2964279097 cites W2135268264 @default.
- W2964279097 cites W2140785063 @default.
- W2964279097 cites W2143637886 @default.
- W2964279097 cites W2146335723 @default.
- W2964279097 cites W2147386665 @default.
- W2964279097 cites W2149783794 @default.
- W2964279097 cites W2150874999 @default.
- W2964279097 cites W2151666086 @default.
- W2964279097 cites W2158744032 @default.
- W2964279097 cites W2172232422 @default.
- W2964279097 cites W2276400542 @default.
- W2964279097 cites W2312398278 @default.
- W2964279097 cites W2330210193 @default.
- W2964279097 cites W2408181256 @default.
- W2964279097 cites W2474835145 @default.
- W2964279097 cites W2530824252 @default.
- W2964279097 cites W2534933448 @default.
- W2964279097 cites W2548915941 @default.
- W2964279097 cites W2599212561 @default.
- W2964279097 cites W2604794021 @default.
- W2964279097 cites W2605547445 @default.
- W2964279097 cites W2606150376 @default.
- W2964279097 cites W2729440153 @default.
- W2964279097 cites W2767894374 @default.
- W2964279097 cites W2796283679 @default.
- W2964279097 cites W2805001156 @default.
- W2964279097 cites W2808113972 @default.
- W2964279097 cites W2887004133 @default.
- W2964279097 cites W2911964244 @default.
- W2964279097 cites W2963520355 @default.
- W2964279097 cites W2963548617 @default.
- W2964279097 cites W3105203384 @default.
- W2964279097 cites W3175417087 @default.
- W2964279097 cites W4213251304 @default.
- W2964279097 cites W4236586490 @default.
- W2964279097 cites W4237979974 @default.
- W2964279097 cites W4252684946 @default.
- W2964279097 doi "https://doi.org/10.1109/tse.2019.2929761" @default.
- W2964279097 hasPublicationYear "2021" @default.
- W2964279097 type Work @default.
- W2964279097 sameAs 2964279097 @default.
- W2964279097 citedByCount "41" @default.
- W2964279097 countsByYear W29642790972019 @default.
- W2964279097 countsByYear W29642790972020 @default.
- W2964279097 countsByYear W29642790972021 @default.
- W2964279097 countsByYear W29642790972022 @default.
- W2964279097 countsByYear W29642790972023 @default.
- W2964279097 crossrefType "journal-article" @default.
- W2964279097 hasAuthorship W2964279097A5006669765 @default.
- W2964279097 hasAuthorship W2964279097A5010426195 @default.
- W2964279097 hasAuthorship W2964279097A5043400199 @default.
- W2964279097 hasAuthorship W2964279097A5052196896 @default.
- W2964279097 hasAuthorship W2964279097A5081036622 @default.