Dissertations / Theses on the topic 'Text analysis'

Consult the top 50 dissertations and theses for your research on the topic 'Text analysis.'


1

Haggren, Hugo. "Text Similarity Analysis for Test Suite Minimization." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-290239.

Abstract:
Software testing is the most expensive phase in the software development life cycle. It is thus understandable why test optimization is a crucial area in the software development domain. In software testing, the gradual increase of test cases demands large portions of testing resources (budget and time). Test Suite Minimization is considered a potential approach to deal with the test suite size problem. Several test suite minimization techniques have been proposed to efficiently address the test suite size problem. Proposing a good solution for test suite minimization is a challenging task, where several parameters such as code coverage, requirement coverage, and testing cost need to be considered before removing a test case from the testing cycle. This thesis proposes and evaluates two different NLP-based approaches for similarity analysis between manual integration test cases, which can be employed for test suite minimization. One approach is based on syntactic text similarity analysis and the other is a machine learning based semantic approach. The feasibility of the proposed solutions is studied through analysis of industrial use cases at Ericsson AB in Sweden. The results show that the semantic approach barely manages to outperform the syntactic approach. While both approaches show promise, subsequent studies will have to be done to further evaluate the semantic similarity based method.
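The contrast between the two families of approaches can be made concrete with a minimal sketch (an illustration, not the thesis's implementation): the syntactic measure compares surface tokens directly, while the semantic measure compares vector representations. The embed function mentioned in the final comment is a hypothetical stand-in for any sentence-embedding model.

    # Sketch: syntactic vs. semantic similarity between two test-case descriptions.
    import math

    def jaccard(a: str, b: str) -> float:
        """Syntactic similarity: overlap between the two token sets."""
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

    def cosine(u, v) -> float:
        """Semantic similarity: cosine between two embedding vectors."""
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        return dot / norm if norm else 0.0

    t1 = "Verify that the node reconnects after a link failure"
    t2 = "Check node reconnection when the link goes down"
    print(jaccard(t1, t2))                 # low: few shared surface tokens
    print(cosine([1.0, 0.0], [0.7, 0.7]))  # toy vectors; real use: cosine(embed(t1), embed(t2))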
2

Romsdorfer, Harald. "Polyglot text to speech synthesis text analysis & prosody control." Aachen Shaker, 2009. http://d-nb.info/993448836/04.

3

Kay, Roderick Neil. "Text analysis, summarising and retrieval." Thesis, University of Salford, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.360435.

4

Haselton, Curt B., and Gregory G. Deierlein. "Assessing seismic collapse safety of modern reinforced concrete moment-frame buildings." Berkeley, Calif.: Pacific Earthquake Engineering Research Center, 2008. http://nisee.berkeley.edu/elibrary/Text/200803261.

5

Ozsoy, Makbule Gulcin. "Text Summarization Using Latent Semantic Analysis." Master's thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12612988/index.pdf.

Abstract:
Text summarization addresses the problem of presenting the information a user needs in a compact form. There are different approaches to creating well-formed summaries in the literature. One of the newest methods in text summarization is Latent Semantic Analysis (LSA). In this thesis, different LSA-based summarization algorithms are explained and two new LSA-based summarization algorithms are proposed. The algorithms are evaluated on Turkish and English documents, and their performances are compared using their ROUGE scores.
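As a hedged illustration of the generic LSA recipe this literature builds on (not the thesis's own algorithms): build a term-by-sentence matrix, apply SVD, and for each leading latent topic select the sentence with the largest weight. The three sentences below are invented.

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer

    sentences = [
        "Latent semantic analysis maps text into a low-rank topic space.",
        "The weather today is sunny and warm.",
        "Topic spaces obtained by SVD help score sentences for summaries.",
    ]
    A = CountVectorizer().fit_transform(sentences).T.toarray()  # terms x sentences
    U, s, Vt = np.linalg.svd(A, full_matrices=False)            # Vt: topics x sentences

    k = 2  # number of latent topics to keep
    summary_idx = {int(np.argmax(np.abs(Vt[i]))) for i in range(k)}
    for i in sorted(summary_idx):
        print(sentences[i])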
6

O'Connor, Brendan T. "Statistical Text Analysis for Social Science." Research Showcase @ CMU, 2014. http://repository.cmu.edu/dissertations/541.

Abstract:
What can text corpora tell us about society? How can automatic text analysis algorithms efficiently and reliably analyze the social processes revealed in language production? This work develops statistical text analyses of dynamic social and news media datasets to extract indicators of underlying social phenomena, and to reveal how social factors guide linguistic production. This is illustrated through three case studies: first, examining whether sentiment expressed in social media can track opinion polls on economic and political topics (Chapter 3); second, analyzing how novel online slang terms can be very specific to geographic and demographic communities, and how these social factors affect their transmission over time (Chapters 4 and 5); and third, automatically extracting political events from news articles, to assist analyses of the interactions of international actors over time (Chapter 6). We demonstrate a variety of computational, linguistic, and statistical tools that are employed for these analyses, and also contribute MiTextExplorer, an interactive system for exploratory analysis of text data against document covariates, whose design was informed by the experience of researching these and other similar works (Chapter 2). These case studies illustrate recurring themes toward developing text analysis as a social science methodology: computational and statistical complexity, and domain knowledge and linguistic assumptions.
7

Lin, Yuhao. "Text Analysis in Fashion : Keyphrase Extraction." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-290158.

Abstract:
The ability to extract useful information from texts and present it in the form of structured attributes is an important step towards making product-comparison algorithms in fashion smarter and better. Some previous work exploits statistical features, such as word frequency, and graph models to predict keyphrases. In recent years, deep neural networks have proved to be the state-of-the-art methods for language modelling; successful examples include Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), Bidirectional Encoder Representations from Transformers (BERT) and their variations. In addition, word-embedding techniques like word2vec [1] can also help to improve performance. Besides these techniques, a high-quality dataset is also important to the effectiveness of models. In this project, we aim to develop reliable and efficient machine learning models for keyphrase extraction. At Norna AB, we have a collection of product descriptions from different vendors without keyphrase annotations, which motivates the use of unsupervised methods; these should be capable of extracting useful keyphrases that capture the features of a product. To further explore the power of deep neural networks, we also implement several deep learning models. The dataset has two parts: the first part, the fashion dataset, contains keyphrases extracted by our unsupervised method; the second part is a public dataset in the news domain. We find that the deep learning models are also capable of extracting meaningful keyphrases and outperform the unsupervised model. Precision, recall and F1 score are used as evaluation metrics. The results show that the model combining an LSTM with a CRF achieves the best performance. We also compare the performance of the models with respect to keyphrase length and keyphrase number, and find that all models perform better at predicting short keyphrases. Finally, we show that our refined model has the advantage of predicting long keyphrases, which is challenging in this field.
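A minimal unsupervised baseline of the kind the abstract alludes to can be sketched by ranking each description's n-grams by TF-IDF against the rest of the corpus; the product snippets are invented, and the thesis's actual unsupervised method and its LSTM-CRF model are considerably more involved.

    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [
        "slim fit cotton shirt with button-down collar",
        "relaxed fit linen shirt in breathable summer fabric",
        "leather ankle boots with cushioned insole",
    ]
    vec = TfidfVectorizer(ngram_range=(1, 3), stop_words="english")
    X = vec.fit_transform(docs)
    terms = vec.get_feature_names_out()

    for d in range(X.shape[0]):
        row = X[d].toarray().ravel()
        top = row.argsort()[::-1][:3]          # three highest-scoring n-grams
        print([terms[i] for i in top])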
8

Maisto, Alessandro. "A Hybrid Framework for Text Analysis." Doctoral thesis, Università degli studi di Salerno, 2017. http://hdl.handle.net/10556/2481.

Abstract:
In Computational Linguistics there is an essential dichotomy between linguists and computer scientists. The first, with a strong knowledge of language structures, lack engineering skills; the second, expert in computing and mathematics, attach little value to the basic mechanisms and structures of language. This discrepancy has grown over the last decades with the increase of computational resources and the gradual computerization of the world: the use of Machine Learning technologies in Artificial Intelligence problem solving, which allows machines to learn from manually generated examples, has been used more and more often in Computational Linguistics to overcome the obstacle represented by language structures and their formal representation. The dichotomy has resulted in the birth of two main approaches to Computational Linguistics: rule-based methods, which try to imitate the way humans use and understand language, reproducing the syntactic structures on which the understanding process is based and building lexical resources such as electronic dictionaries, taxonomies or ontologies; and statistics-based methods, which conversely treat language as a group of elements, quantifying words mathematically and trying to extract information without identifying syntactic structures or, in some algorithms, trying to confer on the machine the ability to learn these structures. One of the main problems is the lack of communication between these two approaches, due to the substantial differences that characterize them: on the one hand there is a strong focus on how language works and on language characteristics, with a tendency toward analytical, manual work; on the other, an engineering perspective that finds in language an obstacle and recognizes in algorithms the fastest way to overcome it. However, the lack of communication is not only an incompatibility: following Harris, the best way to approach natural language could result from taking the best of both. At the moment, there is a large number of open-source tools that perform text analysis and Natural Language Processing. A great part of these tools are based on statistical models and consist of separate modules which can be combined to create a pipeline for processing text. Many of these resources are code packages without a GUI (Graphical User Interface) and are impossible to use for users without programming skills. Furthermore, the vast majority of these open-source tools support only English and, when Italian is included, their performance decreases significantly; open-source tools for the Italian language are very few. In this work we want to fill this gap by presenting a new hybrid framework for the analysis of Italian texts. It is not intended as a commercial tool: the purpose for which it was built is to help linguists and other scholars perform rapid text analysis and produce linguistic data. The framework, which performs both statistical and rule-based analysis, is called LG-Starship. The idea is to build modular software that includes, from the start, the basic algorithms needed to perform different kinds of analysis. The modules perform the following tasks:

- Preprocessing Module: loads a text, normalizes it and removes stop-words; as output it presents the list of tokens and letters that compose the text, with their occurrence counts, and the processed text.
- Mr. Ling Module: performs POS tagging and lemmatization; it also returns the table of lemmas with occurrence counts and a table quantifying the grammatical tags.
- Statistic Module: calculates Term Frequency and TF-IDF of tokens or lemmas, extracts bi-gram and tri-gram units, and exports the results as tables.
- Semantic Module: uses the Hyperspace Analogue to Language algorithm to calculate semantic similarity between words, returning word-by-word similarity matrices that can be exported and analyzed.
- Syntactic Module: analyzes the syntactic structure of a selected sentence and tags the verbs and their arguments with semantic labels.

The objective of the framework is to build an all-in-one platform for NLP that allows any kind of user to perform basic and advanced text analysis. To make the framework accessible to users without specific computer-science or programming skills, the modules have been provided with an intuitive GUI. The framework can be considered hybrid in a double sense: as explained above, it uses both statistical and rule-based methods, relying on standard statistical algorithms and techniques and, at the same time, on Lexicon-Grammar syntactic theory; in addition, it is written in both the Java and Python programming languages. LG-Starship has a simple graphical user interface but will also be released as separate modules that can be included independently in any NLP pipeline. There are many resources of this kind, but the large majority work only for English; free resources for the Italian language are very few, and this work tries to meet that need by proposing a tool that can be used both by linguists and other scientists interested in language and text analysis who know nothing about programming languages, and by computer scientists, who can use the free modules in their own code or in combination with different NLP algorithms. The framework starts from a text or corpus written directly by the user or loaded from an external resource. The LG-Starship workflow is described in the flowchart shown in fig. 1. The pipeline shows that the Preprocessing Module is applied to the original imported or generated text to produce a clean, normalized preprocessed text; this module includes a text-splitting function, a stop-word list and a tokenization method. To the preprocessed text either the Statistic Module or the Mr. Ling Module can be applied. The first, which includes basic statistical algorithms such as Term Frequency, TF-IDF and n-gram extraction, produces as output databases of lexical and numerical data which can be used to produce charts or to perform further external analysis. The second is divided into two main tasks: a POS tagger, based on the Averaged Perceptron Tagger [?] and trained on the Paisà Corpus [Lyding et al., 2014], performs Part-Of-Speech tagging and produces an annotated text; a lemmatization method, which relies on a set of electronic dictionaries developed at the University of Salerno [Elia, 1995, Elia et al., 2010], takes the POS-tagged text as input and produces a new lemmatized version of the original text with information about its syntactic and semantic properties. This lemmatized text, which can also be processed with the Statistic Module, serves as input for two deeper levels of text analysis, carried out by the Syntactic Module and the Semantic Module. The first rests on Lexicon-Grammar theory [Gross, 1971, 1975] and uses a database of predicate structures under development at the Department of Political, Social and Communication Science; its objective is to produce a dependency graph of the sentences that compose the text. The Semantic Module uses the Hyperspace Analogue to Language distributional-semantics algorithm [Lund and Burgess, 1996], trained on the Paisà Corpus, to produce a semantic network of the words of the text. This workflow has been applied in two different experiments involving two user-generated corpora. The first experiment is a statistical study of the language of rap music in Italy, through the analysis of a large corpus of rap lyrics downloaded from online databases of user-generated lyrics. The second experiment is a feature-based sentiment-analysis project performed on user product reviews; for this project we integrated a large domain database of linguistic resources for sentiment analysis, developed in past years by the Department of Political, Social and Communication Science of the University of Salerno, which consists of polarized dictionaries of verbs, adjectives, adverbs and nouns. These two experiments underline how the linguistic framework can be applied to different levels of analysis and produce both qualitative and quantitative data. As for the results obtained, the framework, which is only at a beta version, achieves fair results both in terms of processing time and in terms of precision. Nevertheless, the work is far from complete: more algorithms will be added to the Statistic Module, the Syntactic Module will be completed, the GUI will be improved and made more attractive and modern and, in addition, an open-source online version of the modules will be published. [edited by author]
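To make the Statistic Module's core computations concrete, here is a from-scratch sketch of term frequency, TF-IDF and n-gram extraction over a toy two-document corpus (an illustration, not the LG-Starship code itself).

    import math
    from collections import Counter

    corpus = [
        "il gatto dorme sul divano".split(),
        "il cane dorme in giardino".split(),
    ]

    def tf(doc):
        counts = Counter(doc)
        return {w: n / len(doc) for w, n in counts.items()}

    def idf(corpus, w):
        df = sum(1 for doc in corpus if w in doc)
        return math.log(len(corpus) / df) if df else 0.0

    def tfidf(doc, corpus):
        return {w: f * idf(corpus, w) for w, f in tf(doc).items()}

    def ngrams(doc, n):
        return [tuple(doc[i:i + n]) for i in range(len(doc) - n + 1)]

    print(tfidf(corpus[0], corpus))  # 'il' and 'dorme' score 0: they occur everywhere
    print(ngrams(corpus[0], 2))      # bi-grams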
9

Algarni, Abdulmohsen. "Relevance feature discovery for text analysis." Thesis, Queensland University of Technology, 2011. https://eprints.qut.edu.au/48230/1/Abdulmohsen_Algarni_Thesis.pdf.

Abstract:
It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences, because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches; however, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. This research presents a promising method, Relevance Feature Discovery (RFD), for solving this challenging issue. It discovers both positive and negative patterns in text documents as high-level features, in order to accurately weight low-level features (terms) based on their specificity and their distributions in the high-level features. The thesis also introduces an adaptive model (called ARFD) to enhance the flexibility of using RFD in an adaptive environment. ARFD automatically updates the system's knowledge based on a sliding window over new incoming feedback documents, and can efficiently decide which incoming documents bring new knowledge into the system. Substantial experiments using the proposed models on Reuters Corpus Volume 1 and TREC topics show that they significantly outperform both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio or Support Vector Machines, and other pattern-based methods.
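The core idea, letting low-level terms inherit weight from the high-level patterns they occur in, can be caricatured in a few lines. The normalization below is an assumption-laden simplification for illustration only; the actual RFD weighting, and its handling of negative patterns, is more elaborate.

    from collections import defaultdict

    # (pattern, support in relevant feedback documents) -- toy values
    patterns = [
        (("text", "mining"), 4),
        (("pattern", "mining"), 3),
        (("text",), 6),
    ]

    weights = defaultdict(float)
    for terms, support in patterns:
        for t in terms:
            weights[t] += support / len(terms)  # smaller patterns = more specific

    print(dict(weights))  # 'mining' accumulates weight from two patterns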
10

Romsdorfer, Harald [Verfasser]. "Polyglot Text-to-Speech Synthesis : Text Analysis & Prosody Control / Harald Romsdorfer." Aachen : Shaker, 2009. http://d-nb.info/1156517354/34.

11

Nikolaou, Angelos. "Texture analysis for Robust Reading Systems." Doctoral thesis, Universitat Autònoma de Barcelona, 2020. http://hdl.handle.net/10803/671279.

Abstract:
This thesis focuses on the use of texture analysis for Robust Reading Systems; in particular, the use of texture analysis for text images is explored. An in-depth analysis of the established Local Binary Pattern (LBP) descriptor is presented. LBP descriptors are used in word spotting and achieve top performance among learning-free methods. A custom variant called Sparse Radial Sampling LBP is developed to exploit the unique properties of text and is used to achieve state-of-the-art performance in writer identification. The same feature descriptors are used in conjunction with deep neural networks to successfully address the problem of script and language identification in multiple modalities.
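For orientation, the plain 8-neighbour LBP code that the Sparse Radial Sampling variant generalizes can be computed as follows (a minimal sketch, not the thesis code; a texture descriptor is then the histogram of such codes over an image).

    import numpy as np

    def lbp_code(patch: np.ndarray) -> int:
        """LBP code of the centre pixel of a 3x3 patch."""
        c = patch[1, 1]
        # clockwise neighbours starting at the top-left corner
        neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                      patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
        return sum(int(v >= c) << i for i, v in enumerate(neighbours))

    patch = np.array([[10, 20, 30],
                      [40, 25, 60],
                      [70, 80, 90]], dtype=np.uint8)
    print(lbp_code(patch))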
12

Oostendorp, Marcelyn Camereldia Antonette. "Investigating changing notions of "text": comparing news text in printed and electronic media." Thesis, University of the Western Cape, 2005. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_9984_1183428106.

Abstract:

This research aimed to give an account of the development of concepts of text and discourse and the various approaches to analysis of texts and discourses, as this is reflected in core linguistic literature since the late 1960s. The idea was to focus specifically on literature that notes the development stimulated by a proliferation of electronic media. Secondly, this research aimed to describe the nature of electronic news texts found on the internet in comparison to an equivalent printed version, namely texts printed in newspapers and simultaneously on the newspaper website.

13

Nyns, Roland. "Text grammar and text processing: a cognitivist approach." Doctoral thesis, Université Libre de Bruxelles, 1989. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/213285.

14

Garrad, Mark. "Computer Aided Text Analysis in Personnel Selection." Griffith University. School of Applied Psychology, 2004. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20040408.093133.

Abstract:
This program of research was aimed at investigating a novel application of computer aided text analysis (CATA). To date, CATA has been used in a wide variety of disciplines, including Psychology, but never in the area of personnel selection. Traditional personnel selection techniques have met with limited success in the prediction of costly training failures for some occupational groups such as pilot and air traffic controller. Accordingly, the overall purpose of this thesis was to assess the validity of linguistic style to select personnel. Several studies were used to examine the structure of language in a personnel selection setting; the relationship between linguistic style and the individual differences dimensions of ability, personality and vocational interests; the validity of linguistic style as a personnel selection tool and the differences in linguistic style across occupational groups. The participants for the studies contained in this thesis consisted of a group of 810 Royal Australian Air Force Pilot, Air Traffic Control and Air Defence Officer trainees. The results partially supported two of the eight hypotheses; the other six hypotheses were supported. The structure of the linguistic style measure was found to be different in this study compared with the structure found in previous research. Linguistic style was found to be unrelated to ability or vocational interests, although some overlap was found between linguistic style and the measure of personality. In terms of personnel selection validity, linguistic style was found to relate to the outcome of training for the occupations of Pilot, Air Traffic Control and Air Defence Officer. Linguistic style also demonstrated incremental validity beyond traditional ability and selection interview measures. The findings are discussed in light of the Five Factor Theory of Personality, and motivational theory and a modified spreading activation network model of semantic memory and knowledge. A general conclusion is drawn that the analysis of linguistic style is a promising new tool in the area of personnel selection.
15

Keenan, Francis Gerard. "Large vocabulary syntactic analysis for text recognition." Thesis, Nottingham Trent University, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.334311.

16

Rose, Tony Gerard. "Large vocabulary semantic analysis for text recognition." Thesis, Nottingham Trent University, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.333961.

17

Benbrahim, Mohamed. "Automatic text summarisation through lexical cohesion analysis." Thesis, University of Surrey, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.309200.

18

Ibáñez, Jiménez Jorge, Cid Daniela Jiménez, and Merino Naiomi Vera. "Error Analysis in Chilean Tourist Text Translations." Tesis, Universidad de Chile, 2014. http://www.repositorio.uchile.cl/handle/2250/129945.

19

Palerius, Viktor. "Affect analysis for text dialogue in movies." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-353232.

Abstract:
With the surge of services offering video-on-demand through streaming, and the increased competition in the field, it is important for a provider to be able to fit its content to its users. Machine learning can be used to find user or movie patterns automatically by looking at features of the data. In this project I create a model with weighting schemes to find affective content in short texts, and then explore the potential of doing the same for movies by extracting affective features from their subtitles. The affective content is determined with a dictionary of affect-labelled words: in Bag-of-Words fashion, sentences are scored along three dimensions called Valence, Arousal and Dominance (V, A, D). The project also includes a data-gathering phase in which two separate datasets with existing V, A, D labels are collected, one found online and one self-gathered. These datasets are used to validate the affective model and to find the best weighting scheme. The best weighting scheme is then used to determine affective content over the course of a movie, both to find interesting segments in a movie and to compare movies and find similarities. I find that the performance of my model is reasonable, with the best scores on the Valence and Arousal dimensions, and that the differences between the weighting schemes are small. The model shows potential in the movie domain, finding interesting segments within a movie as well as scene similarities between movies. It does, however, have its limitations: it cannot distinguish genres, and it misses affective content expressed through visual or audio cues. Finally, I argue that my model could be incorporated into a larger machine learning model to determine similarities between movies or find user patterns, although this would also require similar models that determine affective content from a movie's audio and visuals.
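The dictionary-based scoring the abstract describes reduces to a small Bag-of-Words sketch: every subtitle word found in a V, A, D lexicon contributes its three ratings, and the sentence score is their (here unweighted) mean. The lexicon entries below are invented placeholders for a real affect lexicon.

    VAD = {  # word -> (valence, arousal, dominance)
        "love": (0.9, 0.7, 0.6),
        "fight": (0.2, 0.9, 0.7),
        "quiet": (0.6, 0.1, 0.4),
    }

    def vad_score(sentence: str):
        hits = [VAD[w] for w in sentence.lower().split() if w in VAD]
        if not hits:
            return None
        n = len(hits)
        return tuple(sum(dim) / n for dim in zip(*hits))

    print(vad_score("They fight but they love each other"))  # (V, A, D) triple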
20

Wu, Yingyu. "Using Text based Visualization in Data Analysis." Kent State University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=kent1398079502.

21

Garrad, Mark. "Computer Aided Text Analysis in Personnel Selection." Thesis, Griffith University, 2004. http://hdl.handle.net/10072/367424.

Abstract:
This program of research was aimed at investigating a novel application of computer aided text analysis (CATA). To date, CATA has been used in a wide variety of disciplines, including Psychology, but never in the area of personnel selection. Traditional personnel selection techniques have met with limited success in the prediction of costly training failures for some occupational groups such as pilot and air traffic controller. Accordingly, the overall purpose of this thesis was to assess the validity of linguistic style to select personnel. Several studies were used to examine the structure of language in a personnel selection setting; the relationship between linguistic style and the individual differences dimensions of ability, personality and vocational interests; the validity of linguistic style as a personnel selection tool and the differences in linguistic style across occupational groups. The participants for the studies contained in this thesis consisted of a group of 810 Royal Australian Air Force Pilot, Air Traffic Control and Air Defence Officer trainees. The results partially supported two of the eight hypotheses; the other six hypotheses were supported. The structure of the linguistic style measure was found to be different in this study compared with the structure found in previous research. Linguistic style was found to be unrelated to ability or vocational interests, although some overlap was found between linguistic style and the measure of personality. In terms of personnel selection validity, linguistic style was found to relate to the outcome of training for the occupations of Pilot, Air Traffic Control and Air Defence Officer. Linguistic style also demonstrated incremental validity beyond traditional ability and selection interview measures. The findings are discussed in light of the Five Factor Theory of Personality, and motivational theory and a modified spreading activation network model of semantic memory and knowledge. A general conclusion is drawn that the analysis of linguistic style is a promising new tool in the area of personnel selection.
22

Coccetta, Francesca. "Multimodal Text Analysis and English Language Teaching." Doctoral thesis, Università degli studi di Padova, 2009. http://hdl.handle.net/11577/3426506.

Abstract:
Corpora of spoken texts are commonly investigated by applying approaches borrowed from the investigation of corpora of written texts, partly due to the lack of adequate concordancing software tools. This common practice has somewhat limited the potential spoken texts bring to the study of oral discourse. Based on the theoretical and technical innovations which have taken place in the field of multimodal corpus linguistics (Baldry and Thibault, 2001; 2006a; 2006b; forthcoming), especially within the MCA project (Baldry, 2007b; 2008a; Baldry and Thibault, 2008), this thesis presents an alternative method for analysing spoken corpora for language functions and notions (van Ek and Trim, 1998a; 1998b; 2001). In particular, it applies the scalar-level approach developed within multimodal corpus linguistics to a corpus of 52 texts, carefully selected from the Padova Multimedia English Corpus (Ackerley and Coccetta, 2007a; 2007b), and demonstrates how this approach to text analysis facilitates the study of language functions and notions vis-à-vis their multimodal co-text (Baldry, 2008a). To illustrate this, the online multimodal concordancer MCA (Multimodal Corpus Authoring System) (Baldry, 2005; Baldry and Beltrami, 2005) was used to create, annotate and concordance the corpus in terms of functions and notions, as well as non-verbal features including gestures, dynamics and gaze. The findings of this research have been applied to English language teaching and learning by creating interactive activities illustrating the way in which corpora of spoken texts and multimodal concordancing techniques can be used by language learners and teaching material developers alike. The activities have been included in the online English course Le@rning Links (Ackerley, 2004; Ackerley and Cloke, 2005; Ackerley, Cloke and Mazurelle, 2006; Ackerley and Cloke, 2006; Ackerley and Coccetta, in press).
23

Tirkkonen-Condit, Sonja. "Argumentative text structure and translation." Jyväskylä : University of Jyväskylä, 1985. http://catalog.hathitrust.org/api/volumes/oclc/13332106.html.

24

Bafuka, Freddy Nole. "Beyond text analysis : image-based evaluation of health-related text readability using style features." Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/53121.

Abstract:
Many studies have shown that the readability of health documents presented to consumers does not match their reading levels. An accurate assessment of the readability of health-related texts is an important step in providing material that matches readers' literacy. Current readability measurements depend heavily on text analysis (NLP) but neglect style (text layout). In this study, we show that style properties are important predictors of a document's readability. In particular, we build an automated computer program that uses a document's style to predict its readability score. The style features are extracted by analyzing only one page of the document as an image. The scores produced by our system were tested against scores given by human experts. Our tool shows a stronger correlation with experts' scores than the Flesch-Kincaid readability grading method. We provide an end-user program, VisualGrader, which provides a Graphical User Interface to the scoring model.
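For reference, the Flesch-Kincaid grade level used as the baseline comparison is a purely text-based formula: 0.39 x (words per sentence) + 11.8 x (syllables per word) - 15.59. The syllable counter below is a rough vowel-group heuristic, not the exact one used in the study.

    import re

    def count_syllables(word: str) -> int:
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def flesch_kincaid_grade(text: str) -> float:
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z]+", text)
        syllables = sum(count_syllables(w) for w in words)
        n_words = max(1, len(words))
        return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59

    print(flesch_kincaid_grade("Take two tablets daily. Do not exceed the stated dose."))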
25

Valeš, Miroslav. "Seeking the Pattern: Using Quantitative Text Analysis to Assess Text Influence on Grant Program Results." Master's thesis, Vysoká škola ekonomická v Praze, 2014. http://www.nusl.cz/ntk/nusl-193924.

Abstract:
Now that software and hardware for automated text analysis are readily available, and a large dataset describing real projects submitted to a grant program has been opened up, it is possible to investigate phenomena from behavioral economics and psycholinguistics according to which particular features of a textual description may be statistically associated with a reader's behavior or decision-taking, which in this case means an influence on the final allocation of grant funds. The thesis takes the aforementioned areas as a starting point and also employs quantitative indicators from the field of forensic linguistics in order to perform a computer-aided quantitative text analysis. The main goal is to evaluate, from a correlation perspective, whether in real operational programmes there were associable relationships between the quantitative features of a proposed project's textual description and the amount of grant allocated to the project. The thesis is divided into four chapters: it introduces the background, describes the analyzed data and the methods used, comments on the analyses performed and the relations found, and summarizes and evaluates the research in the last chapter.
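The kind of correlation question the thesis asks can be sketched in a few lines: compute a quantitative feature of each project description (here simply its word count) and correlate it with the amount allocated. All values below are invented.

    from scipy.stats import pearsonr

    descriptions = [
        "short project summary",
        "a considerably longer and more detailed description of the project goals",
        "medium length project description text",
    ]
    amounts = [120_000, 310_000, 180_000]  # allocated grants (toy values)

    word_counts = [len(d.split()) for d in descriptions]
    r, p = pearsonr(word_counts, amounts)
    print(f"r={r:.2f}, p={p:.3f}")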
26

Sudhahar, Saatviga. "Automated analysis of narrative text using network analysis in large corpora." Thesis, University of Bristol, 2015. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.685924.

Abstract:
In recent years there has been increased interest in the computational social sciences, digital humanities and political science in performing automated quantitative narrative analysis (QNA) of text at large scale, by studying the actors, actions and relations in a given narration. Social scientists have always relied on news media content to study opinion biases and to extract socio-historical relations and events, yet in order to perform analysis they have had to face labour-intensive coding, in which basic narrative information is manually extracted from text and annotated by hand. This PhD thesis addresses the problem with a big-data approach based on automated information extraction using state-of-the-art Natural Language Processing, text mining and Artificial Intelligence tools. A text corpus is transformed into a semantic network formed of subject-verb-object (SVO) triplets, and the resulting network is analysed by drawing on various theories and techniques such as graph partitioning, network centrality, assortativity, hierarchy and structural balance. Furthermore, we study the position of actors in the network of actors and actions; generate scatter plots describing the subject/object bias and positive/negative bias of each actor; and investigate the types of actions each actor is most associated with. Apart from QNA, SVO triplets extracted from text can also be used to summarize documents. Our findings are demonstrated on two corpora containing English news articles, about US elections and crime, and a third corpus containing ancient folklore stories from the Gutenberg Project. Among the potentially interesting findings, we found that the 2012 US election campaign was very much focused on 'Economy' and 'Rights', and that, overall, the media reported positive statements more frequently for the Democrats than for the Republicans. In the crime study we found that the network identified men as frequent perpetrators, and women and children as victims, of violent crime. A network approach to text based on semantic graphs is a promising way to analyse large corpora of texts: by retaining relational information pertaining to actors and objects, it can reveal latent and hidden patterns, and it therefore has relevance for the social sciences and humanities.
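A hedged sketch of how subject-verb-object triplets, the building block of such semantic networks, can be pulled out of a dependency parse; it assumes spaCy with the en_core_web_sm model installed, and the thesis's own extraction pipeline differs in its tooling.

    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumes the model has been downloaded

    def svo_triplets(text):
        doc = nlp(text)
        triplets = []
        for tok in doc:
            if tok.pos_ == "VERB":
                subjects = [c for c in tok.children if c.dep_ in ("nsubj", "nsubjpass")]
                objects = [c for c in tok.children if c.dep_ == "dobj"]
                for s in subjects:
                    for o in objects:
                        triplets.append((s.text, tok.lemma_, o.text))
        return triplets

    print(svo_triplets("The senator criticized the bill and the press praised her."))
    # Each triplet becomes an edge of the semantic network.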
27

Abu, Sheikha Fadi. "Analysis and Generation of Formal and Informal Text." Thesis, University of Ottawa (Canada), 2010. http://hdl.handle.net/10393/28845.

Abstract:
In this thesis, we discuss an important issue in computational linguistics: distinguishing between the formal and informal style of texts, in document classification and in text generation. There is a need to identify formal and informal texts automatically, and also a need for a computer system that can generate correct English texts in a formal or informal style. We therefore propose two main techniques to solve these two tasks. The first is to build a model that can classify any text or sentence as having formal or informal style. The second is based on natural language generation (NLG) and generates correct English sentences with formal or informal style. To achieve our goals, we start by studying the main differences between formal and informal style and summarizing their characteristics. In addition, we manually collect parallel lists of formal versus informal words, phrases and expressions from different sources, which are used in the proposed work. We then build our model for the classification task using machine learning techniques to classify texts and sentences into formal and informal style. The evaluation results show that our model is able to predict the formal/informal class of any text or sentence with high accuracy. After that, we build a system that can generate formal and informal sentences using NLG techniques. The evaluation results on a sample of generated sentences show that our NLG system can produce high-quality sentences in formal or informal style. The main contribution of this work consists in designing a set of features that led to good results for both tasks: text classification and text generation with different formality levels.
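The classification half can be sketched as a standard supervised pipeline over labelled sentences; the four training examples below are invented stand-ins for the thesis's collected word lists and corpus, and its actual feature set is richer than plain bag-of-words.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    sentences = [
        "We would be grateful for your prompt reply.",   # formal
        "The committee shall convene on Monday.",        # formal
        "gonna grab food, wanna come?",                  # informal
        "that movie was awesome lol",                    # informal
    ]
    labels = ["formal", "formal", "informal", "informal"]

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    clf.fit(sentences, labels)
    print(clf.predict(["kindly find the attached document", "see ya later"]))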
28

Johansson, Christian. "Computer Forensic Text Analysis with Open Source Software." Thesis, Blekinge Tekniska Högskola, Institutionen för programvaruteknik och datavetenskap, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-4994.

Abstract:
This paper concentrates on forensic examination of text, with a focus on the use of open-source software. The paper discusses and examines different techniques for the future automation of such examinations.
29

Pulliam, John Mark. "An analysis of the Septuagint text of Habakkuk." Theological Research Exchange Network (TREN), 2006. http://www.tren.com/search.cfm?p001-1086.

30

Kowalczyk, Thomas L. "Performance analysis of text-oriented printing using PostScript." Online version of thesis, 1988. http://hdl.handle.net/1850/10451.

31

Ramachandran, Venkateshwaran. "A temporal analysis of natural language narrative text." Thesis, This resource online, 1990. http://scholar.lib.vt.edu/theses/available/etd-03122009-040648/.

32

Shepherd, David. "TEFL methods articles : text analysis and reader interaction." Thesis, Durham University, 1992. http://etheses.dur.ac.uk/5710/.

Abstract:
EFL teachers from the Brazilian public sector have often experienced difficulties in efficiently accessing the relevant information from articles published in 'English Teaching Forum'. This study attempts to investigate these difficulties from both 'text-analytical' and 'reader-based' perspectives and begins with a brief profile of the teachers concerned. An analytical framework incorporating elements from several approaches, specifically those of Hoey (1973) and Swales (1990) is used to highlight the organisational features from a selection of 'Forum' articles. It is then hypothesised that certain clause-relational macropatterns will facilitate access and be focused upon by 'successful' readers; in contrast, writer 'justification' moves are seen as potential barriers to efficient comprehension. A sample of FL methods articles written by Brazilians and published in Portuguese is then analysed and the same set of analytical parameters are found to be valid for describing their organisational features. A review of processing models of text comprehension and related FL reading research is made following the second 'reader-based' perspective. A set of criteria regarding the processing strategies of 'successful' and 'less-skilled' FL readers is established. Verbal report methodologies are argued as a suitable means of testing both the text-analytical hypotheses and the reader processing criteria. Various types of field work carried out in the collection of verbal report data from Brazilian EFL teachers reading 'Forum' articles are then described. Groups of 'successful' and 'problematic' readers are defined according to the processing strategies revealed in the verbal reports. Although there are substantial variations in the individual strategies of individual readers, and evidence of the influence of text informativity, the 'successful' processing consistently included focusing on the clause-relational macro signals; in contrast, there was little evidence of activation of the same text features by the 'problematic' readers. Finally suggestions are made for including FL methods articles, text-analytical elements, and verbal reporting on INSED-TEFL courses in Brazil.
33

Wharton, Chris. "Text and context : an analysis of advertising reception." Thesis, Northumbria University, 2005. http://nrl.northumbria.ac.uk/2831/.

Abstract:
The aim of this study is to explore advertising and in particular advertising reception as a significant part of contemporary social practice. Although advertising in some form has been a feature of a wide range of societies, historically and culturally, its economic and social importance has perhaps never been greater. Advertising, across the industrial period and in particular since the Second World War, has through the entrenchment of market economies and the development of different media technologies increased its reach and density through a variety of means. It has become a significant media form, received by audiences differentiated by social, economic, spatial and other factors. This study enquires into the nature of audience reception of advertising through an exploration and application of the encoding/decoding media model. The study argues that attention to the textual and formal elements of the model need to be given greater emphasis and the decoding aspect of the model broadened to deal with a complexity of contextual factors contributing to the process. Advertising media by their nature are comprised of different formal and presentational means. The study focuses on newsprint, television and billboard and other outdoor advertising. The public and private environments in which these forms appear can be characterised through the social and symbolic difference between the domestic environment in which much television is viewed and the outdoor urban environment in which much billboard advertising appears. These are recognised as contributory elements in the reception of advertising and any significance the advert may have for its audience. Audience decoding of advertisements is then a combination of producer intent and a complexity of contributory factors brought to or found in the decoding process. This includes a recognition of various ways of seeing associated with different media forms and social and spatial circumstances and the presentation and reception of adverts as part of a flow of advertising and of a wider social experience. The relation between adverts and other texts also has important intertextual consequences for reception. In the process of decoding, it will be argued that social groups can be understood to act as interpretive communities and a process of advertising diffusion can be observed. Three empirical case studies form a survey of mainly car or car related advertising, featuring television, billboard and newsprint advertising, and highlight a range of possible decodings. The significance of historical and social factors is confirmed as important in securing particular readings of advertisements, and spatial, environmental and contextual features are emphasised in this survey. The survey acknowledges the significance of advertising form and medium and highlights the circumstances in which negotiated and oppositional readings may occur. This study re-emphasises that advertising texts form their signification within a complex arrangement of synchronic and diachronic circumstances in which immediate social and environmental factors should be accorded further significance in the study of advertising. The study concludes with a reflection on its methods and procedures and a consideration of further work that might be carried out in the area of empirical advertising studies. 
In the interest of a richer understanding of advertising, further research would acknowledge the complexities of audience reception and might include an enquiry into further advertising contexts and environments.
34

Cohen, F. "TASS - Text Analysis System for Understanding News Stories." Thesis, University of Reading, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.383567.

35

Boulton, David. "Fine art image classification based on text analysis." Thesis, University of Surrey, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.252478.

36

Le, Thien-Hoa. "Neural Methods for Sentiment Analysis and Text Summarization." Electronic Thesis or Diss., Université de Lorraine, 2020. http://www.theses.fr/2020LORR0037.

Abstract:
This thesis focuses on two Natural Language Processing tasks that require extracting semantic information from raw text: Sentiment Analysis and Text Summarization. This dissertation discusses issues with, and seeks to improve, neural models on both tasks, as such models have become the dominant paradigm in the past several years. Accordingly, this dissertation is composed of two parts: the first part (Neural Sentiment Analysis) deals with the computational study of people's opinions and sentiments, and the second part (Neural Text Summarization) tries to extract salient information from a complex sentence and rewrite it in a human-readable form. Neural Sentiment Analysis. As in computer vision, numerous deep convolutional neural networks have been adapted to sentiment analysis and text classification tasks. However, unlike in the image domain, these studies are carried out on different input data types and on different datasets, which makes it hard to know whether a deep network is truly needed. In this thesis, we seek elements to address this question, i.e. whether neural networks must compute deep hierarchies of features for textual data in the same way as they do in vision. We thus propose a new adaptation of the deepest convolutional architecture (DenseNet) for text classification and study the importance of depth in convolutional models with different input granularities (word or character). We show that deep models indeed give better performances than shallow networks when the text input is represented as a sequence of characters. However, a simple shallow-and-wide network outperforms the deep DenseNet models with word inputs. Moreover, to further improve sentiment classifiers and contextualize them, we propose to model them jointly with dialog acts, which are a factor of explanation and correlate with sentiments but are nevertheless often ignored. We have manually annotated both dialogues and sentiments on a Twitter-like social medium, and trained a multi-task hierarchical recurrent network on joint sentiment and dialog act recognition. We show that transfer learning may be efficiently achieved between both tasks, and further analyze some specific correlations between sentiments and dialogues on social media. Neural Text Summarization. Detecting sentiments and opinions in large digital documents does not always enable users of such systems to take informed decisions, as other important semantic information is missing. People also need the main arguments and supporting reasons from the source documents to truly understand and interpret them. To capture such information, we aim at making neural text summarization models more explainable. We propose a model that has better explainability properties and is flexible enough to support various shallow syntactic parsing modules. More specifically, we linearize the syntactic tree into the form of overlapping text segments, which are then selected with reinforcement learning (RL) and regenerated into a compressed form. Hence, the proposed model is able to handle both extractive and abstractive summarization. Further, we observe that RL-based models are becoming increasingly ubiquitous for many text summarization tasks. We are interested in better understanding what types of information are taken into account by such models, and we propose to study this question from the syntactic perspective. We thus provide a detailed comparison of RL-based and syntax-aware approaches and of their combination along several dimensions that relate to the perceived quality of the generated summaries, such as number of repetitions, sentence length, distribution of part-of-speech tags, relevance, and grammaticality. We show that when there is a resource constraint (computation and memory), it is wise to train models with RL only and without any syntactic information, as they provide nearly as good results as syntax-aware models, with fewer parameters and faster training convergence.
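The segment linearization step can be pictured with a minimal Python sketch (assuming an NLTK constituency parse is available; the toy sentence, the multi-word filter, and the omission of the RL selection and compression stages are all simplifications of the thesis's actual pipeline):

    from nltk import Tree

    # A toy constituency parse; in practice this comes from a syntactic parser.
    parse = Tree.fromstring(
        "(S (NP (DT the) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))")

    # Linearize the tree into overlapping text segments: every multi-word
    # constituent becomes one candidate segment for later selection.
    segments = []
    for sub in parse.subtrees():
        leaves = sub.leaves()
        if len(leaves) > 1:
            segments.append(" ".join(leaves))

    print(segments)
    # ['the cat sat on the mat', 'the cat', 'sat on the mat', 'on the mat', 'the mat']

Note how the segments overlap: selecting and compressing them, rather than whole sentences, is what lets a single model cover both extractive and abstractive summarization.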
APA, Harvard, Vancouver, ISO, and other styles
37

Widdowson, Henry George. "Text, Context, Pretext: Critical Issues in Discourse Analysis." Oxford: Blackwell, 2004. http://catalogue.bnf.fr/ark:/12148/cb41322428h.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Gränsbo, Gustav. "Word Clustering in an Interactive Text Analysis Tool." Thesis, Linköpings universitet, Interaktiva och kognitiva system, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-157497.

Full text
Abstract:
A central operation of users of the text analysis tool Gavagai Explorer is to look through a list of words and arrange them in groups. This thesis explores the use of word clustering to automatically arrange the words in groups intended to help users. A new word clustering algorithm is introduced, which attempts to produce word clusters tailored to be small enough for a user to quickly grasp the common theme of the words. The proposed algorithm computes similarities among words using word embeddings, and clusters them using hierarchical graph clustering. Multiple variants of the algorithm are evaluated in an unsupervised manner by analysing the clusters they produce when applied to 110 data sets previously analysed by users of Gavagai Explorer. A supervised evaluation is performed to compare clusters to the groups of words previously created by users of Gavagai Explorer. Results show that it was possible to choose a set of hyperparameters deemed to perform well across most data sets in the unsupervised evaluation. These hyperparameters also performed among the best on the supervised evaluation. It was concluded that the choice of word embedding and graph clustering algorithm had little impact on the behaviour of the algorithm. Rather, limiting the maximum size of clusters and filtering out similarities between words had a much larger impact on behaviour.
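A rough Python sketch of the pipeline this abstract describes: cosine similarities from word embeddings, filtering of weak similarities, graph clustering, and a cap on cluster size. The threshold, the cap, and the use of connected components (instead of the hierarchical graph clustering actually studied) are illustrative assumptions:

    import numpy as np
    import networkx as nx

    def cluster_words(words, vectors, threshold=0.6, max_size=8):
        # Cosine similarity between all pairs of word embeddings.
        normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
        sims = normed @ normed.T
        g = nx.Graph()
        g.add_nodes_from(range(len(words)))
        for i in range(len(words)):
            for j in range(i + 1, len(words)):
                if sims[i, j] >= threshold:  # filter out weak similarities
                    g.add_edge(i, j)
        clusters = []
        for comp in nx.connected_components(g):
            comp = sorted(comp)
            # Enforce the maximum cluster size by greedy splitting, so each
            # group stays small enough to grasp its common theme at a glance.
            for k in range(0, len(comp), max_size):
                clusters.append([words[i] for i in comp[k:k + max_size]])
        return clusters

The finding that cluster-size limits and similarity filtering matter more than the choice of embedding or clustering algorithm is reflected here: both levers appear as explicit parameters.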
APA, Harvard, Vancouver, ISO, and other styles
39

Cano, Erion. "Text-based Sentiment Analysis and Music Emotion Recognition." Doctoral thesis, Politecnico di Torino, 2018. http://hdl.handle.net/11583/2709436.

Full text
Abstract:
Nowadays, with the expansion of social media, large amounts of user-generated texts like tweets, blog posts or product reviews are shared online. Sentiment polarity analysis of such texts has become highly attractive and is utilized in recommender systems, market prediction, business intelligence and more. We also witness deep learning techniques becoming top performers on those types of tasks. There are, however, several problems that need to be solved for efficient use of deep neural networks on text mining and text polarity analysis. First of all, deep neural networks are data-hungry. They need to be fed with datasets that are big in size, cleaned and preprocessed, as well as properly labeled. Second, the modern natural language processing concept of word embeddings as a dense and distributed text feature representation solves the sparsity and dimensionality problems of the traditional bag-of-words model. Still, there are various uncertainties regarding the use of word vectors: should they be generated from the same dataset that is used to train the model, or is it better to source them from big and popular collections that work as generic text feature representations? Third, it is not easy for practitioners to find a simple and highly effective deep learning setup for various document lengths and types. Recurrent neural networks are weak with longer texts, and optimal convolution-pooling combinations are not easily conceived. It is thus convenient to have generic neural network architectures that are effective and can adapt to various texts, encapsulating much of the design complexity. This thesis addresses the above problems to provide methodological and practical insights for utilizing neural networks on sentiment analysis of texts and achieving state-of-the-art results. Regarding the first problem, the effectiveness of various crowdsourcing alternatives is explored, and two medium-sized, emotion-labeled song datasets are created utilizing social tags. One of the research interests of Telecom Italia was the exploration of relations between music emotional stimulation and driving style. Consequently, a context-aware music recommender system that aims to enhance driving comfort and safety was also designed. To address the second problem, a series of experiments with large text collections of various contents and domains were conducted. Word embeddings of different parameters were exercised, and the results revealed that their quality is influenced (mostly but not only) by the size of the texts they were created from. When working with small text datasets, it is thus important to source word features from popular and generic word embedding collections. Regarding the third problem, a series of experiments involving convolutional and max-pooling neural layers were conducted. Various patterns relating text properties and network parameters to optimal classification accuracy were observed. Combining convolutions of words, bigrams, and trigrams with regional max-pooling layers in a couple of stacks produced the best results. The derived architecture achieves competitive performance on sentiment polarity analysis of movie, business and product reviews. Given that labeled data are becoming the bottleneck of current deep learning systems, a future research direction could be the exploration of various data programming possibilities for constructing even bigger labeled datasets. Investigation of feature-level or decision-level ensemble techniques in the context of deep neural networks could also be fruitful. Different feature types usually represent complementary characteristics of the data. Combining word embeddings with traditional text features, or utilizing recurrent networks on document splits and then aggregating the predictions, could further increase the prediction accuracy of such models.
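The derived architecture, combining convolutions of words, bigrams, and trigrams with regional max-pooling layers in a couple of stacks, might be sketched in Keras as follows; the filter counts, pooling sizes, and two-stack depth are hypothetical stand-ins for the thesis's tuned values:

    import tensorflow as tf
    from tensorflow.keras import layers

    def build_model(vocab_size=20000, seq_len=200, dim=100, n_classes=2):
        inp = layers.Input(shape=(seq_len,))
        emb = layers.Embedding(vocab_size, dim)(inp)
        branches = []
        for k in (1, 2, 3):  # word, bigram, and trigram convolutions
            h = layers.Conv1D(128, k, activation="relu", padding="same")(emb)
            h = layers.MaxPooling1D(pool_size=5)(h)  # regional max pooling
            h = layers.Conv1D(128, k, activation="relu", padding="same")(h)
            h = layers.GlobalMaxPooling1D()(h)       # second stack, then pool
            branches.append(h)
        out = layers.Dense(n_classes, activation="softmax")(
            layers.Concatenate()(branches))
        return tf.keras.Model(inp, out)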
APA, Harvard, Vancouver, ISO, and other styles
40

Cowie, James Reid. "Automatic analysis of descriptive texts." Thesis, University of Strathclyde, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.387066.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Ashton, Triss A. "Accuracy and Interpretability Testing of Text Mining Methods." Thesis, University of North Texas, 2013. https://digital.library.unt.edu/ark:/67531/metadc283791/.

Full text
Abstract:
Extracting meaningful information from large collections of text data is problematic because of the sheer size of the database. However, automated analytic methods capable of processing such data have emerged. These methods, collectively called text mining, first began to appear in 1988. A number of additional text mining methods quickly developed in independent research silos, each based on unique mathematical algorithms. How good each of these methods is at analyzing text is unclear. Method development typically evolves from some research-silo-centric requirement, with the success of the method measured by a custom requirement-based metric. Results of the new method are then compared to another method that was similarly developed. The proposed research introduces an experimentally designed testing method for text mining that eliminates research silo bias and simultaneously evaluates methods from all of the major context-region text mining method families. The proposed research method follows a random block factorial design with two treatments consisting of three and five levels (RBF-35) with repeated measures. The contribution of the research is threefold. First, the users perceived a difference in the effectiveness of the various methods. Second, while still not clear, there are characteristics within the text collection that affect an algorithm's ability to extract meaningful results. Third, this research develops an experimental design process for testing the algorithms that is adaptable to other areas of software development and algorithm testing. This design eliminates the bias-based practices historically employed by algorithm developers.
APA, Harvard, Vancouver, ISO, and other styles
42

Dumont-Le Brazidec, Joffrey. "An Object-Oriented Data Analysis approach for text population." Thesis, KTH, Matematisk statistik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-223244.

Full text
Abstract:
With more and more digital text-valued data available, the need arises to cluster, classify, and study it. We develop in this thesis statistical tools to perform null-hypothesis testing and clustering or classification on text-valued data in the framework of Object-Oriented Data Analysis. The project includes research on semantic methods to represent texts, comparisons between representations, distances for such representations, and the performance of permutation tests. The main methods compared are the Vector Space Model and topic models. More precisely, this thesis provides an algorithm to compute permutation tests at document or sentence level to study the equality in distribution of two texts for different representations and distances. Lastly, we study texts from a syntactic point of view, describing their structure with a tree representation.
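The permutation tests the abstract mentions can be sketched in a few lines of Python, assuming documents have already been embedded as vectors; the centroid-distance statistic and the Euclidean metric shown in the usage comment are illustrative choices, whereas the thesis compares several representations and distances:

    import numpy as np

    def permutation_test(docs_a, docs_b, distance, n_perm=9999, seed=0):
        # Null hypothesis: the two groups of document vectors share a
        # distribution. Statistic: distance between the group centroids.
        rng = np.random.default_rng(seed)
        pooled = np.vstack([docs_a, docs_b])
        n_a = len(docs_a)
        observed = distance(docs_a.mean(axis=0), docs_b.mean(axis=0))
        hits = 0
        for _ in range(n_perm):
            perm = rng.permutation(len(pooled))
            a, b = pooled[perm[:n_a]], pooled[perm[n_a:]]
            if distance(a.mean(axis=0), b.mean(axis=0)) >= observed:
                hits += 1
        return (hits + 1) / (n_perm + 1)  # p-value with add-one smoothing

    # e.g. with Euclidean distance on random document vectors:
    # p = permutation_test(np.random.rand(30, 50), np.random.rand(40, 50),
    #                      lambda u, v: np.linalg.norm(u - v))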
APA, Harvard, Vancouver, ISO, and other styles
43

Maloney, Ross J. "Assisting Reading and Analysis of Text Documents by Visualization." Murdoch University, 2005. http://wwwlib.murdoch.edu.au/adt/browse/view/adt-MU20060502.150150.

Full text
Abstract:
The research reported here examined the use of computer-generated graphics as a means to assist humans to analyse text documents which have not been subject to markup. The approach taken was to survey available visualization techniques in a broad selection of disciplines, including applications to text documents, group those techniques using a taxonomy proposed in this research, and then develop a selection of techniques that assist the text analysis objective. Development of the selected techniques from their fundamental basis, through their visualization, to their demonstration in application comprises most of the body of this research. A scientific orientation employing measurements, combined with visual depiction and explanation of each technique with limited mathematics, is used, as opposed to fully utilising any one of the resulting techniques for performing a complete text document analysis. Visualization techniques which apply directly to the text and those which exploit measurements produced by associated techniques are considered. Both approaches employ visualization to assist the human viewer to discover patterns which are then used in the analysis of the document. In the measurement case, this requires consideration of data with dimensions greater than three, which imposes a visualization difficulty. Several techniques for overcoming this problem are proposed. Word frequencies, Zipf considerations, parallel coordinates, colour maps, Cusum plots, and fractal dimensions are some of the techniques considered. One direct application of visualization to text documents is to assist reading of a document by de-emphasising selected words, fading them on the display from which they are read. Three word selection techniques are proposed for automatically selecting which words to fade. An experiment using such word fading techniques is reported. It indicated that some readers do have improved reading speed under such conditions, but others do not. The experimental design enabled the separation of the group whose reading times did decrease from the remaining readers. Measurements of comprehension errors made under different types of word fading were shown not to increase beyond those obtained under normal reading conditions. A visualization based on categorising the words in a text document is proposed, which contrasts with visualization of measurements based on counts. The result is a visual impression of the word composition, and of the evolution of that composition within the document. The text documents used to demonstrate these techniques include English novels and short stories, emails, and a series of eighteenth-century newspaper articles known as the Federalist Papers. This range of documents was needed because not all analysis techniques are applicable to all types of documents. This research proposes that interactive use of the techniques at hand, in a non-prescribed order, can yield useful results in a document analysis. An example of this is author attribution, i.e. assigning authorship of documents via patterns characteristic of an individual's writing style. Different visual techniques can be used to explore the patterns of writing in given text documents. A software toolkit as a platform for implementing the proposed interactive analysis of text documents is described, and how the techniques could be integrated into such a toolkit is outlined. A prototype of software to implement such a toolkit is included in this research, and issues relating to the implementation of each technique are also outlined.
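A purely illustrative frequency-based word selector for the fading idea, in Python, and not necessarily one of the three selection techniques proposed in the thesis:

    from collections import Counter
    import re

    def words_to_fade(text, fraction=0.2):
        # Under Zipf's law a few very frequent word types account for most
        # tokens, so fading the top-ranked types de-emphasises much of the
        # text while leaving the rarer content words fully visible.
        tokens = re.findall(r"[a-z']+", text.lower())
        ranked = [w for w, _ in Counter(tokens).most_common()]
        return set(ranked[: max(1, int(len(ranked) * fraction))])

A reader-facing implementation would then render the selected words in a lighter colour, matching the fading display used in the reading experiment.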
APA, Harvard, Vancouver, ISO, and other styles
44

Li, Yanjun. "High Performance Text Document Clustering." Wright State University / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=wright1181005422.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Bonora, Filippo. "Dynamic networks, text analysis and Gephi: the art math." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2013. http://amslaurea.unibo.it/6327/.

Full text
Abstract:
In numerous scientific fields the analysis of complex networks has led to many recent discoveries. In this thesis we experimented with this approach on human language, in particular written language, where words do not interact randomly. We first presented measures capable of extracting important topological structures from linguistic networks (Degree, Strength, Entropy, ...) and examined the software used to represent and visualize the graphs (Gephi). We then analyzed the different statistical properties of the same text in its various forms (shuffled, without stopwords, and without low-frequency words): our database contains five books by five authors who lived in the nineteenth century. Finally, we showed how certain measures are important for distinguishing a real text from its modified versions, and why the degree distributions of a normal text and a shuffled one follow the same trend. These results may prove useful in the increasingly active analysis of linguistic phenomena such as authorship attribution and the recognition of shuffled texts.
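A minimal sketch of how such a linguistic network can be built and its Degree and Strength measures read off, using Python and networkx (the window size and whitespace tokenization are simplifying assumptions):

    import networkx as nx

    def cooccurrence_network(tokens, window=2):
        # Link words that co-occur within a sliding window; edge weights
        # count how often each pair co-occurs.
        g = nx.Graph()
        for i, w in enumerate(tokens):
            for v in tokens[i + 1 : i + 1 + window]:
                if w != v:
                    weight = g[w][v]["weight"] + 1 if g.has_edge(w, v) else 1
                    g.add_edge(w, v, weight=weight)
        return g

    tokens = "the cat sat on the mat because the mat was warm".split()
    g = cooccurrence_network(tokens)
    degree = dict(g.degree())                   # number of distinct neighbours
    strength = dict(g.degree(weight="weight"))  # sum of incident edge weights

Comparing such measures between a real text and its shuffled or stopword-stripped versions is exactly the kind of contrast the thesis draws.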
APA, Harvard, Vancouver, ISO, and other styles
46

Boynukalin, Zeynep. "Emotion Analysis Of Turkish Texts By Using Machine Learning Methods." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614521/index.pdf.

Full text
Abstract:
Automatically analysing the emotion in texts is of increasing interest in today's research fields. The aim is to develop a machine that can detect the type of a user's emotion from his or her text. Emotion classification of English texts has been studied by several researchers, and promising results have been achieved. In this thesis, an emotion classification study on Turkish texts is introduced. To the best of our knowledge, this is the first study on emotion analysis of Turkish texts. In English there exist some well-defined datasets for the purpose of emotion classification, but we could not find datasets in Turkish suitable for this study. Therefore, another important contribution is the generation of a new dataset in Turkish for emotion analysis. The dataset is generated by combining two types of sources. Several classification algorithms are applied on the dataset and the results are compared. Due to the nature of the Turkish language, new features are added to the existing methods to improve the success of the proposed method.
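As a generic baseline for this kind of emotion classification, not the thesis's actual feature set (which adds Turkish-specific features), a bag-of-words pipeline in Python might look like:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy Turkish sentences with hypothetical emotion labels.
    texts = ["bugün çok mutluyum", "bu habere çok üzüldüm",
             "beni çok korkuttun", "harika bir gün"]
    labels = ["joy", "sadness", "fear", "joy"]

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                        LogisticRegression(max_iter=1000))
    clf.fit(texts, labels)
    print(clf.predict(["çok mutlu hissediyorum"]))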
APA, Harvard, Vancouver, ISO, and other styles
47

Uchimoto, Kiyotaka. "Maximum Entropy Models for Japanese Text Analysis and Generation." 京都大学 (Kyoto University), 2004. http://hdl.handle.net/2433/147595.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Kof, Leonid. "Text Analysis for Requirements Engineering: Application of Computational Linguistics." Saarbrücken: VDM Verl. Dr. Müller, 2007. http://deposit.d-nb.de/cgi-bin/dokserv?id=3021639&prov=M&dok_var=1&dok_ext=htm.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Green, Pamela Dilys. "Extracting group relationships within changing software using text analysis." Thesis, University of Hertfordshire, 2013. http://hdl.handle.net/2299/11896.

Full text
Abstract:
This research looks at identifying and classifying changes in evolving software by making simple textual comparisons between groups of source code files. The two areas investigated are software origin analysis and collusion detection. Textual comparison is attractive because it can be used in the same way for many different programming languages. The research includes the first major study using machine learning techniques in the domain of software origin analysis, which looks at the movement of code in an evolving system. The training set for this study, which focuses on restructured files, is created by analysing 89 software systems. Novel features, which capture abstract patterns in the comparisons between source code files, are used to build models which classify restructured files from unseen systems with a mean accuracy of over 90%. The unseen code is not only in C, the language of the training set, but also in Java and Python, which helps to demonstrate the language independence of the approach. As well as generating features for the machine learning system, textual comparisons between groups of files are used in other ways throughout the system: in filtering to find potentially restructured files, in ranking the possible destinations of the code moved from the restructured files, and as the basis for a new file comparison tool. This tool helps in the demanding task of manually labelling the training data, is valuable to the end user of the system, and is applicable to other file comparison tasks. These same techniques are used to create a new text-based visualisation for use in collusion detection, and to generate a measure which focuses on the unusual similarity between submissions. This measure helps to overcome problems in detecting collusion in data where files are of uneven size, where there is high incidental similarity, or where more than one programming language is used. The visualisation highlights interesting similarities between files, making the task of inspecting the texts easier for the user.
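The group-level textual comparisons the abstract relies on can be pictured with a much simpler stand-in, token-set overlap between source files; the thesis's actual comparison features are more elaborate, but like this Python sketch they work identically across programming languages:

    import re

    def token_set(path):
        # Language-independent tokenization: lower-cased word tokens.
        with open(path, encoding="utf-8", errors="ignore") as f:
            return set(re.findall(r"\w+", f.read().lower()))

    def jaccard(a, b):
        # Similarity between two files as the overlap of their token sets.
        return len(a & b) / len(a | b) if a | b else 0.0

    # e.g. jaccard(token_set("old/parser.c"), token_set("new/lexer.c"))
    # (hypothetical paths) scores one candidate destination of moved code.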
APA, Harvard, Vancouver, ISO, and other styles
50

Stein, Roger Alan. "An analysis of hierarchical text classification using word embeddings." Universidade do Vale do Rio dos Sinos, 2018. http://www.repositorio.jesuita.org.br/handle/UNISINOS/7624.

Full text
Abstract:
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Efficient distributed numerical word representation models (word embeddings) combined with modern machine learning algorithms have recently yielded considerable improvements on automatic document classification tasks. However, the effectiveness of such techniques has not yet been assessed for hierarchical text classification (HTC). This study investigates the application of those models and algorithms to this specific problem by means of experimentation and analysis. Classification models were trained with prominent machine learning algorithm implementations (fastText, XGBoost, and Keras' CNN) and notable word embedding generation methods (GloVe, word2vec, and fastText) on publicly available data, and evaluated with measures specifically appropriate for the hierarchical context. FastText achieved an LCAF1 of 0.871 on a single-labeled version of the RCV1 dataset. The analysis of the results indicates that using word embeddings is a very promising approach for HTC.
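A minimal fastText supervised run of the kind compared in the study could look like this in Python; the file name, hyperparameters, and label scheme are placeholders rather than the thesis's exact RCV1 setup:

    import fasttext

    # train.txt holds one example per line, e.g.
    # "__label__markets central bank raises interest rates ..."
    # (hypothetical file; each label encodes a node of the topic hierarchy)
    model = fasttext.train_supervised(input="train.txt", lr=0.5,
                                      epoch=25, wordNgrams=2)
    print(model.predict("efficient word representations for classification"))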
APA, Harvard, Vancouver, ISO, and other styles
