Theses on the topic « Mining Source Code Repositories »

To see the other types of publications on this topic, follow the link: Mining Source Code Repositories.


Consult the top 26 theses for your research on the topic « Mining Source Code Repositories ».

Next to each source in the list of references there is an "Add to bibliography" button. Click on this button, and we will automatically generate the bibliographic reference for the chosen source in your preferred citation style: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, when this information is included in the metadata.

Browse theses on a wide variety of disciplines and organize your bibliography correctly.

1

Kagdi, Huzefa H. « Mining Software Repositories to Support Software Evolution ». Kent State University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=kent1216149768.

Full text
2

Bengtsson, Jonathan, et Heidi Hokka. « Analysing Lambda Usage in the C++ Open Source Community ». Thesis, Mittuniversitetet, Institutionen för data- och systemvetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-39514.

Full text
Abstract:
Object-oriented languages have made a shift towards incorporating functional concepts such as lambdas. Lambdas are anonymous functions that can be used within the scope of other functions. In C++, lambdas are considered difficult for inexperienced developers to use, which suggests that there may be problems with lambdas in C++. However, studies of lambdas in C++ repositories are scarce compared to those for other object-oriented languages such as Java. This study aims to address a knowledge gap regarding how lambdas are used by developers in C++ repositories. Furthermore, it examines how developer experience and software engineering practices, such as unit testing and in-code documentation, correlate with the inclusion of lambdas. To achieve this, we created a set of tools that statically analyse repositories to gather results. This study gained insight into the number of repositories utilising lambdas, their usage areas, and their documentation, as well as how these findings compare to the results of similar studies in Java. Further, it shows that unit testing and developer experience correlate with the usage of lambdas.
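As an illustration of the kind of lightweight static scan the abstract describes, the sketch below counts probable lambda expressions in C++ source text with a regular expression. This is a hypothetical simplification, not the thesis's actual toolchain (which is not described here), and a regex cannot fully parse C++:

```python
import re

# Heuristic: a C++ lambda starts with a capture list "[...]", followed by an
# optional parameter list "(...)", an optional trailing return type, and a
# body "{". This over- or under-counts in edge cases (e.g. attribute syntax),
# but is enough to sketch per-repository counting.
LAMBDA_RE = re.compile(r"\[[^\]\[]*\]\s*(?:\([^)]*\)\s*)?(?:->\s*\w+\s*)?\{")

def count_lambdas(source: str) -> int:
    """Count probable lambda expressions in a C++ source string."""
    return len(LAMBDA_RE.findall(source))
```

Run over every `.cpp`/`.h` file of a repository, such a counter yields the per-repository usage figures a study like this aggregates.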
3

Schulte, Lukas. « Investigating topic modeling techniques for historical feature location ». Thesis, Karlstads universitet, Institutionen för matematik och datavetenskap (from 2013), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-85379.

Full text
Abstract:
Software maintenance and the understanding of where in the source code features are implemented are two strongly coupled tasks that make up a large portion of the effort spent on developing applications. The concept of feature location investigated in this thesis can serve as a supporting factor in those tasks, as it facilitates the automation of otherwise manual searches for source code artifacts. Challenges in this subject area include the aggregation and composition of a training corpus from historical codebase data for models, as well as the integration and optimization of qualified topic modeling techniques. Building on previous research, this thesis provides a comparison of two different techniques and introduces a toolkit that can be used to reproduce and extend the results discussed. Specifically, in this thesis a changeset-based approach to feature location is pursued and applied to a large open-source Java project. The project is used to optimize and evaluate the performance of Latent Dirichlet Allocation models and Pachinko Allocation models, as well as to compare the accuracy of the two models with each other. As discussed at the end of the thesis, the results do not indicate a clear favorite between the models. Instead, the outcome of the comparison depends on the metric and viewpoint from which it is assessed.
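The changeset-based feature location described above can be illustrated with a much simpler stand-in for LDA or Pachinko Allocation: ranking changesets by bag-of-words cosine similarity to a feature query. The function names and the data shapes are assumptions for illustration, not the thesis's models, which are proper topic models rather than raw term frequencies:

```python
import math
from collections import Counter

def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words term frequencies."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def locate_feature(query: str, changesets: dict) -> list:
    """Rank changeset ids (e.g. commit hashes) by textual similarity
    of their diff/message text to a natural-language feature query."""
    q = tokenize(query)
    return sorted(changesets,
                  key=lambda cid: cosine(q, tokenize(changesets[cid])),
                  reverse=True)
```

The highest-ranked changesets point to the files they touched, which is where the queried feature is most likely implemented.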
4

Vu, Duc Ly. « Towards Understanding and Securing the OSS Supply Chain ». Doctoral thesis, Università degli studi di Trento, 2022. http://hdl.handle.net/11572/333508.

Full text
Abstract:
Free and Open-Source Software (FOSS) has become an integral part of the software supply chain in the past decade. Various entities (automated tools and humans) are involved at different stages of the software supply chain. Some actions that occur in the chain may result in vulnerabilities or malicious code being injected into a published artifact distributed in a package repository. At the end of the software supply chain, developers or end-users may consume artifacts altered in transit, including benign and malicious injections. This dissertation starts from the first link in the software supply chain: developers. Many developers do not update their vulnerable software libraries, exposing the users of their code to security risks. To understand how they choose, manage, and update the libraries, packages, and other Open-Source Software (OSS) that become the building blocks of companies' finished products consumed by end-users, twenty-five semi-structured interviews were conducted with developers of both large and small-to-medium enterprises in nine countries. All interviews were transcribed, coded, and analyzed using applied thematic analysis. Although there are many observations about developers' attitudes towards selecting dependencies for their projects, additional quantitative work is needed to validate whether actual behavior matches these attitudes or whether there is a gap. Therefore, we conducted an extensive empirical analysis, using our tool py2src, of twelve quality and popularity factors that should explain the popularity (adoption) of PyPI packages. At the end of the software supply chain, software libraries (or packages) are usually downloaded directly from the package registries via package dependency management systems, under the comfortable assumption that no discrepancies are introduced in the last mile between the source code and the respective packages.
However, such discrepancies might be introduced by manual or automated build tools (e.g., metadata, Python bytecode files) or for malicious purposes (malicious code injection). To identify differences between the published Python packages on PyPI and the source code stored on GitHub, we developed a new approach called LastPyMile. Our approach has shown promise for integration into current package dependency management systems or company workflows for vetting packages at minimal cost. With the ever-increasing number of software bugs and security vulnerabilities, the burden of secure software supply chain management on developers and project owners increases. Although automated program repair (APR) approaches promise to reduce the burden of bug-fixing tasks by suggesting likely correct patches for software bugs, little is known about the practical aspects of using APR tools, such as how long one should wait for a tool to generate a bug fix. To provide a realistic evaluation, five state-of-the-art APR tools were run on 221 bugs from 44 open-source Java projects within a time budget realistic for developers.
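The last-mile comparison idea behind LastPyMile can be sketched as hashing every file in the source tree and in the published package, then reporting discrepancies. The function names and the in-memory representation of the two trees are assumptions for illustration, not the tool's real API:

```python
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()

def last_mile_diff(source_files: dict, package_files: dict) -> dict:
    """Compare a source tree with a published package, both given as
    {relative_path: file_bytes}, and report the discrepancies."""
    src = {p: digest(b) for p, b in source_files.items()}
    pkg = {p: digest(b) for p, b in package_files.items()}
    return {
        # files present only in the published artifact ("phantom" files)
        "only_in_package": sorted(set(pkg) - set(src)),
        "only_in_source": sorted(set(src) - set(pkg)),
        # same path, different content: the interesting last-mile changes
        "modified": sorted(p for p in set(src) & set(pkg) if src[p] != pkg[p]),
    }
```

Benign build artifacts (metadata, bytecode) would then be filtered out of the report, leaving only suspicious differences for review.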
5

Carlsson, Emil. « Mining Git Repositories : An introduction to repository mining ». Thesis, Linnéuniversitetet, Institutionen för datavetenskap (DV), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-27742.

Full text
Abstract:
When performing an analysis of the evolution of software quality and software metrics, there is a need to get access to as many versions of the source code as possible. There is a lack of research on how data or source code can be extracted from the source control management system Git. This thesis explores different possibilities to resolve this problem. Lately, there has been a boom in usage of the version control system Git. GitHub alone hosts about 6,100,000 projects. Some well known projects and organizations that use Git are Linux, WordPress, and Facebook. Even with these figures and clients, there are very few tools able to perform data extraction from Git repositories. A pre-study showed that there is a lack of standardization on how to share mining results, and the methods used to obtain them. There are several tools available for older version control systems, such as Concurrent Versions System (CVS), but few for Git. The examined repository mining applications for Git are either poorly documented, or were built to be very purpose-specific to the project for which they were designed. This thesis compiles a list of general issues encountered when using repository mining as a tool for data gathering. A selection of existing repository mining tools were evaluated against a set of prerequisite criteria. The end result of this evaluation is the creation of a new repository mining tool called Doris. This tool also includes a small code metrics analysis library to show how it can be extended.
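As a toy illustration of extracting per-version data from a Git repository (unrelated to the internals of Doris, which are not described here), one can parse the output of `git log --pretty=format:%H --numstat`. The parser below assumes text files only, since numstat prints `-` instead of counts for binary files:

```python
def parse_numstat(log_text: str) -> list:
    """Parse `git log --pretty=format:%H --numstat` output into
    (commit_hash, [(lines_added, lines_deleted, path), ...]) records."""
    commits = []
    current = None
    for line in log_text.splitlines():
        line = line.rstrip()
        if not line:
            continue
        parts = line.split("\t")
        if len(parts) == 3:  # numstat line: added <TAB> deleted <TAB> path
            added, deleted, path = parts
            current[1].append((int(added), int(deleted), path))
        else:  # a commit hash line starts a new record
            current = (line, [])
            commits.append(current)
    return commits
```

Feeding this the output of `subprocess.run(["git", "log", "--pretty=format:%H", "--numstat"], ...)` yields per-commit churn data, the raw material for evolution metrics.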
6

Sinha, Vinayak. « Sentiment Analysis On Java Source Code In Large Software Repositories ». Youngstown State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1464880227.

Full text
7

Ribeiro, Athos Coimbra. « Ranking source code static analysis warnings for continuous monitoring of free/libre/open source software repositories ». Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-20082018-170140/.

Full text
Abstract:
While there is a wide variety of both open source and proprietary source code static analyzers available in the market, each of them usually performs better on a small set of problems, making it hard to choose one single tool to rely on when examining a program. Combining the analysis of different tools may reduce the number of false negatives, but yields a corresponding increase in the number of false positives (which is already high for many tools). An interesting solution, then, is to filter these results to identify the issues least likely to be false positives. This work presents kiskadee, a system to support the usage of static analysis during software development by providing carefully ranked static analysis reports. First, it runs multiple static analyzers on the source code. Then, using a classification model, the potential bugs detected by the static analyzers are ranked based on their importance, with critical flaws ranked first and potential false positives ranked last. To train kiskadee's classification model, we post-analyze the reports generated by three tools on synthetic test cases provided by the US National Institute of Standards and Technology. To make our technique as general as possible, we limit our data to the reports themselves, excluding other information such as change histories or code metrics. The features extracted from these reports are used to train a set of decision trees using AdaBoost to create a stronger classifier, achieving 0.8 classification accuracy (the combined false positive rate of the tools used was 0.61). Finally, we use this classifier to rank static analyzer alarms based on the probability of a given alarm being an actual bug. Our experimental results show that, on average, when inspecting warnings ranked by kiskadee, one hits 5.2 times fewer false positives before each bug than when using a randomly sorted warning list.
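The abstract's evaluation metric (false positives inspected before each bug in a ranked warning list) can be computed with a few lines. This is a plausible reading of the metric for illustration, not kiskadee's published evaluation code:

```python
def avg_false_positives_before_bugs(ranked_labels: list) -> float:
    """Given warnings in ranked order, labelled True for a real bug and
    False for a false positive, return the average number of false
    positives a reviewer inspects before reaching each bug."""
    fps_seen = 0
    costs = []
    for is_bug in ranked_labels:
        if is_bug:
            costs.append(fps_seen)  # cost of reaching this bug
        else:
            fps_seen += 1
    return sum(costs) / len(costs) if costs else 0.0
```

Comparing this value for a classifier-ranked list against a randomly shuffled one gives the kind of 5.2x improvement factor the abstract reports.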
8

Hassan, Ahmed. « Mining Software Repositories to Assist Developers and Support Managers ». Thesis, University of Waterloo, 2004. http://hdl.handle.net/10012/1017.

Full text
Abstract:
This thesis explores mining the evolutionary history of a software system to support software developers and managers in their endeavors to build and maintain complex software systems. We introduce the idea of evolutionary extractors, which are specialized extractors that can recover the history of software projects from software repositories, such as source control systems. The challenges faced in building C-REX, an evolutionary extractor for the C programming language, are discussed. We examine the use of source control systems in industry and the quality of the recovered C-REX data through a survey of several software practitioners. Using the data recovered by C-REX, we develop several approaches and techniques to assist developers and managers in their activities. We propose Source Sticky Notes to assist developers in understanding legacy software systems by attaching historical information to the dependency graph. We present the Development Replay approach to estimate the benefits of adopting new software maintenance tools by reenacting the development history. We propose the Top Ten List, which assists managers in allocating testing resources to the subsystems that are most susceptible to faults. To assist managers in improving the quality of their projects, we present a complexity metric which quantifies the complexity of the changes to the code instead of quantifying the complexity of the source code itself. All presented approaches are validated empirically using data from several large open source systems. The presented work highlights the benefits of transforming software repositories from static record-keeping repositories to active repositories used by researchers to gain empirically based understanding of software development, and by software practitioners to predict, plan, and understand various aspects of their projects.
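The Top Ten List idea above, directing testing effort at the subsystems most prone to recent faults, can be sketched as a frequency count over bug-fix commits. The subsystem granularity (top-level directory) and the input shape are assumptions for illustration, not the thesis's actual heuristics:

```python
from collections import Counter

def top_ten_list(fix_commits: list, n: int = 10) -> list:
    """Rank subsystems by how many fault-fixing commits touched them.
    `fix_commits` is a list of lists of file paths changed by bug-fix
    commits; a subsystem is taken to be the top-level directory."""
    counts = Counter()
    for files in fix_commits:
        # count each subsystem once per commit, even if several of its
        # files were fixed together
        for subsystem in {path.split("/")[0] for path in files}:
            counts[subsystem] += 1
    return [s for s, _ in counts.most_common(n)]
```

Managers would then allocate testing resources starting from the head of the returned list.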
9

Thummalapenta, Suresh. « Improving Software Productivity and Quality via Mining Source Code ». NORTH CAROLINA STATE UNIVERSITY, 2011. http://pqdtopen.proquest.com/#viewpdf?dispub=3442531.

Full text
10

Delorey, Daniel Pierce. « Observational Studies of Software Engineering Using Data from Software Repositories ». Diss., Brigham Young University, 2007. http://contentdm.lib.byu.edu/ETD/image/etd1716.pdf.

Full text
11

Palomba, Fabio. « Code smells : relevance of the problem and novel detection techniques ». Doctoral thesis, Università degli studi di Salerno, 2017. http://hdl.handle.net/10556/2566.

Full text
Abstract:
2015 - 2016
Software systems are becoming the core of the business of several industrial companies and, for this reason, they are getting bigger and more complex. Furthermore, they are subject to frequent modifications every day, whether to implement new features or to fix bugs. In this context, developers often do not have the possibility to design and implement ideal solutions, leading to the introduction of technical debt, i.e., “not quite right code which we postpone making it right”. One noticeable symptom of technical debt is represented by bad code smells, which were defined by Fowler to indicate sub-optimal design choices applied in the source code by developers. In the recent past, several studies have demonstrated the negative impact of code smells on the maintainability of the source code, as well as on the ability of developers to comprehend a software system. This is the reason why several automatic techniques and tools aimed at discovering portions of code affected by design flaws have been devised. Most of them rely on the analysis of structural properties (e.g., method calls) mined from the source code. Despite the effort spent by the research community in recent years, there are still limitations that threaten the industrial applicability of tools for detecting code smells. Specifically, there is a lack of evidence regarding (i) the circumstances leading to code smell introduction, and (ii) the real impact of code smells on maintainability, since previous studies focused their attention on a limited number of software projects. Moreover, existing code smell detectors might be inadequate for the detection of many code smells defined in the literature. For instance, a number of code smells are intrinsically characterized by how code elements change over time, rather than by structural properties extractable from the source code.
In the context of this thesis we face these specific challenges by proposing a number of large-scale empirical investigations aimed at understanding (i) when and why smells are actually introduced, (ii) what their longevity is and how developers remove them in practice, (iii) what the impact of code smells on change- and fault-proneness is, and (iv) how developers perceive code smells. At the same time, we devise two novel approaches for code smell detection that rely on alternative sources of information, i.e., historical and textual, and we evaluate and compare their ability to detect code smells against existing baseline approaches that rely solely on structural analysis. The findings reported in this thesis somewhat contradict common expectations. In the first place, we demonstrate that code smells are usually introduced with the first commit on the repository involving a source file, and are therefore not the result of frequent modifications during the history of the source code. More importantly, almost 80% of the smells survive during evolution, and the number of refactoring operations performed on them is dramatically low; of these, only a small percentage actually removed a code smell. At the same time, we also found that code smells have a negative impact on maintainability, and in particular on both the change- and fault-proneness of classes. In the second place, we demonstrate that developers can correctly perceive only the subset of code smells characterized by long or complex code, while the perception of other smells depends on the intensity with which they manifest themselves. Furthermore, we also demonstrate the usefulness of historical and textual analysis as a way to improve existing detectors with orthogonal information. The usage of these alternative sources of information helps developers correctly diagnose design problems and should therefore be actively exploited in future research in the field.
Finally, we provide a set of open issues that need to be addressed by the research community in the future, as well as an overview of further applications of code smells in other software engineering fields.
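Two of the historical measurements discussed above, change-proneness and the fraction of smelly files whose smell was already present at the file's first commit, can be sketched as follows. The data shapes (`history`, `smelly_since`) are hypothetical stand-ins, not the thesis's actual mining infrastructure:

```python
from collections import Counter

def change_proneness(commits: list) -> Counter:
    """Number of commits touching each file: a simple proxy for the
    change-proneness that code smells are correlated with. `commits`
    is a list of lists of file paths changed together."""
    counts = Counter()
    for files in commits:
        counts.update(set(files))  # count each file once per commit
    return counts

def smell_at_birth_ratio(history: dict, smelly_since: dict) -> float:
    """Fraction of smelly files whose smell was already present in their
    first recorded commit. `history` maps file -> ordered commit ids;
    `smelly_since` maps file -> commit id where the smell first appears."""
    affected = [f for f in smelly_since if f in history]
    if not affected:
        return 0.0
    at_birth = sum(1 for f in affected if history[f][0] == smelly_since[f])
    return at_birth / len(affected)
```

A ratio near 0.8 on real data would match the thesis's finding that most smells are introduced when the file is created.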
12

Hung, Che Shian. « Boa Views : Enabling Modularization and Sharing of Boa Queries ». Bowling Green State University / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1563557234898944.

Full text
13

Sereesathien, Siriwan. « A GitHub-based Voice Assistant for Software Developers and Teams ». DigitalCommons@CalPoly, 2021. https://digitalcommons.calpoly.edu/theses/2334.

Full text
Abstract:
Software developers and teams typically rely on source code and task management tools for their projects. They tend to depend on different platforms such as GitHub, Azure DevOps, Bitbucket, and GitLab for task tracking, feature tracking, and bug tracking to develop and maintain their software repositories. Individually, developers may lose concentration when having to navigate through numerous screens across various platforms to perform daily tasks. Additionally, while in meetings (non-virtual), teams are often separated from their machines and have to rely purely on recollection of the tasks and issues related to their work. This can delay the decision-making process and take away valuable focus hours from developers. Although there is usually one person with a laptop who guides the meeting and has access to the source code management tools, this can take a lot of time, as that person is not familiar with all of the developers' independent work. Therefore, a new tool needs to be introduced to help accelerate individual and team meeting productivity. In this thesis, we continued the work on Robin, a voice assistant built to answer questions regarding GitHub issues and source code management. Robin has the ability to answer questions in addition to completing actions on behalf of the developer. This thesis presents Robin's abilities, architecture, and implementation while also examining its usability through a user study. Our study suggests that some people love the idea of having a conversational agent for software development. However, much more research and iteration must be done before Robin fully delivers the user experience we imagined. In this thesis, we were able to set the foundation for this idea and record the lessons that we learned.
14

Dondero, Robert Michael Jr Hislop Gregory W. « Predicting software change coupling / ». Philadelphia, Pa. : Drexel University, 2008. http://hdl.handle.net/1860/2759.

Full text
15

Torres, José Jorge Barreto. « Identificação e análise de clones de códigos heterogêneos em um ambiente corporativo de desenvolvimento de software ». Universidade Federal de Sergipe, 2016. https://ri.ufs.br/handle/riufs/3392.

Full text
Abstract:
The demand for speeding up software development inside corporations triggers a series of issues related to code organization. Software development teams have to meet business deadlines, so they adopt the bad practice of copying and pasting code. In this way, clones populate software repositories and hinder the improvement or maintenance of systems. Programming languages with object-oriented paradigm characteristics tend to ease code abstraction and reuse. However, a question arises: is the same team, working with several kinds of programming languages, influenced by those languages' paradigms with respect to the incidence of cloning? This work proposes an approach to identify, analyze, and compare clones inside heterogeneous software repositories without considering the development team's profile. The experimental evaluation of the approach was carried out through two controlled experiments aimed at detecting and evaluating clones, using and adapting tools available on the market. This evaluation was executed in an organizational environment that owned several closed-source applications available for analysis. The final results showed no direct relationship between clone incidence and the number of lines of code in an application. Procedural language systems had a lower clone incidence and, when comparing open- and closed-source systems, both had similar results regarding the manifestation of source code clones.
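A minimal line-window clone detector in the spirit of the tools adapted in these experiments can be sketched as follows. This is a generic textbook technique (normalize whitespace, hash sliding windows of lines, report windows occurring in more than one place), not the specific tooling the dissertation used:

```python
import hashlib
from collections import defaultdict

def find_clones(files: dict, window: int = 3) -> list:
    """Detect duplicated code fragments across files by hashing sliding
    windows of whitespace-normalized lines. `files` maps path -> source
    text. Returns sets of (path, start_line) locations that share an
    identical window of `window` lines."""
    index = defaultdict(set)
    for path, text in files.items():
        lines = [ln.strip() for ln in text.splitlines()]
        for i in range(len(lines) - window + 1):
            chunk = "\n".join(lines[i:i + window])
            if chunk.strip():  # skip all-blank windows
                key = hashlib.md5(chunk.encode()).hexdigest()
                index[key].add((path, i + 1))
    return [locs for locs in index.values() if len(locs) > 1]
```

Real clone detectors additionally tokenize and rename identifiers so that near-miss (Type-2) clones are also found; this sketch catches only exact (Type-1) clones modulo indentation.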
16

Moura, Marcello Henrique Dias de. « Comparação entre desenvolvedores de software a partir de dados obtidos em repositório de controle de versão ». Universidade Federal de Goiás, 2013. http://repositorio.bc.ufg.br/tede/handle/tede/7944.

Full text
Abstract:
Version control systems are repositories that store the source code changes made by software developers. Research that extracts data from these repositories for analysis can be classified into two groups: studies that focus on the development process and studies that focus on the developers. The present dissertation investigates the second case and contributes to the field by providing: (a) the definition of a history file that summarizes changes made to software at the line and file levels; (b) a set of metrics to evaluate the work of developers; and (c) two approaches for comparing developers based on their metrics. A computational system that implements these metrics and approaches was built and applied to two case studies of real software development projects. The results obtained in the studies were positive: they were consistent with the project managers' general perception of the work done by the developers, and they led to new ideas for improving the research. We believe these contributions are a step towards better understanding and characterizing how software developers work.
Version control repositories are systems that store the source code changes made by software developers. Research that extracts data from these repositories for analysis can be classified into two groups: studies focusing on the development process and studies focusing on the developer. The present work investigates the second aspect, contributing: (a) the definition of a file history that summarizes the changes made to the software at the line and file levels; (b) a set of metrics intended to evaluate the developers' work; and (c) two proposed approaches for comparing developers. A computational system implementing these metrics and approaches was built and applied in two case studies involving real software development projects. The results obtained were positive, generally coinciding with project managers' perception of the developers' work and pointing to new ideas for evolving the research. We consider this a step towards better understanding and characterizing how developers work.
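A rough sketch of contributions (a) and (b) above, aggregating line- and file-level changes into per-developer metrics. The commit representation and the metric names are illustrative assumptions, not the dissertation's actual schema.

```python
from collections import defaultdict

def developer_metrics(commits):
    """Aggregate simple per-developer metrics from a commit history.
    Each commit is a dict: {"author": str, "files": {path: (added, removed)}}."""
    stats = defaultdict(lambda: {"commits": 0, "files": set(),
                                 "lines_added": 0, "lines_removed": 0})
    for c in commits:
        s = stats[c["author"]]
        s["commits"] += 1
        for path, (added, removed) in c["files"].items():
            s["files"].add(path)
            s["lines_added"] += added
            s["lines_removed"] += removed
    # churn (total lines touched) is a crude proxy for activity
    return {dev: {**s, "files": len(s["files"]),
                  "churn": s["lines_added"] + s["lines_removed"]}
            for dev, s in stats.items()}

history = [
    {"author": "ana", "files": {"core.py": (120, 10), "util.py": (5, 0)}},
    {"author": "bob", "files": {"core.py": (3, 3)}},
    {"author": "ana", "files": {"core.py": (7, 2)}},
]
metrics = developer_metrics(history)
```

Comparing developers on raw churn alone is misleading (a large generated file inflates it), which is presumably why the dissertation uses a set of metrics and two comparison approaches rather than a single number.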
17

Fejzer, Mikołaj. « Mining software repositories for code quality ». Doctoral thesis, 2021. https://depotuw.ceon.pl/handle/item/3908.

Full text
Abstract:
This dissertation covers a series of code quality topics that rely on mining software repositories. We focus on code review support and software bug detection. Code review, the proof-reading of a proposed code change, is accepted as an industry standard: reviewers find common shortcomings such as lacking test coverage, misused design patterns, or logic errors, and provide feedback to the author of the changes. The quality of a review depends on the correct selection of reviewers. Most software ships with various kinds of defects, and fixing them is one of the most common activities in software development. Bug localization is the process of finding specific defects in project source code based on user-supplied reports. First, we analyze open source projects to gather insights on contributor activities and bug prevalence, with the goal of helping bug detection during code review. We use topic modeling on change comments to identify core developers to be involved in code review. Additionally, we examine both commit comments and issue topics per month to assess the popularity of bug fixing. Because over half of all comments are related to bugs, we then focus on improving code review by introducing a change classifier that flags potentially buggy changes during review. Second, we present a new method of recommending code reviewers that uses profiles of individual contributors. For each developer we maintain a profile based on a multiset of all file path segments from the commits they have reviewed; the profile is updated after each new review. We employ a similarity function between such profiles and the change proposals to be reviewed, and the contributor whose profile is most similar becomes the recommended reviewer. We performed an experimental comparison of our method against state-of-the-art techniques on four large open-source projects 
and obtained improved results in terms of classification metrics (precision, recall and F-measure) and performance (lower time and space complexity). Third, we propose an adaptive method to localize bugs based on bug reports. Upon receiving a new bug report, developers need to find its cause in the source code; bug localization can be supported by a tool that ranks all source files according to how likely they are to contain the bug. We introduce new feature-weighting approaches and an adaptive selection algorithm, and we evaluate the localization method on publicly available datasets, with competitive results and performance compared to the state of the art.
This dissertation covers a series of code quality topics based on mining software repositories. We focus on code review support and the detection of programming bugs. Code review is a technique used as a standard in industry: proposed source-code changes are examined by other developers. Reviewers find typical shortcomings and give feedback to the author of the changes. Review quality depends on the proper selection of reviewers. Most software ships with defects of various kinds, and repairing these defects is one of the most common activities in software development. Bug localization is the process of finding specific defects in a project's source code from user-submitted reports. The first topic considered is an analysis of open source projects to observe contributor behaviour and the occurrence of bugs. We apply topic modeling to change-set comments to find the core developers of each studied project. We also examine the topics of user-submitted issues and of change sets to assess the month-by-month popularity of bug fixing. Based on these analyses we focus on improving code review by adding a change classifier, since over half of all comments concern bugs; the classifier flags potentially buggy changes during inspection. As the second topic we present a new method of recommending code reviewers that uses profiles of individual contributors. For each developer we maintain a profile based on the multiset of all path segments of the files they have reviewed so far; the profile is updated after each new review. We apply a similarity function between such profiles and the change proposals awaiting review, and the contributor whose profile is most similar becomes the recommended reviewer. 
We ran experiments comparing our method with state-of-the-art techniques on four large open source projects and obtained better results in terms of classification quality measures and efficiency. The third topic is a new adaptive method of localizing bugs from submitted reports. On receiving a new report, developers must find the cause of the bug in the source code; we propose a tool that supports localization by estimating, for each file in the project, the likelihood that it contains the defect. Our method is based on new feature-weighting schemes and adaptive selection algorithms. We obtained competitive results and performance compared with state-of-the-art techniques on publicly available datasets.
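The profile-based reviewer recommendation described above can be sketched as follows, representing each reviewer's profile as a bag (multiset) of path segments and scoring candidates by a simple multiset-intersection similarity. The similarity function and the example paths are assumptions for illustration; the dissertation's exact formulation may differ.

```python
from collections import Counter

def build_profile(reviewed_paths):
    """Multiset of all path segments from files a contributor has reviewed."""
    prof = Counter()
    for p in reviewed_paths:
        prof.update(p.split("/"))
    return prof

def similarity(profile, change_paths):
    """Multiset-intersection overlap between a reviewer profile and
    the file paths of a proposed change, normalized by the change size."""
    change = Counter()
    for p in change_paths:
        change.update(p.split("/"))
    overlap = sum((profile & change).values())   # multiset intersection
    return overlap / max(1, sum(change.values()))

def recommend(profiles, change_paths):
    """The contributor whose profile is most similar to the change."""
    return max(profiles, key=lambda dev: similarity(profiles[dev], change_paths))

profiles = {
    "ana": build_profile(["src/net/http.c", "src/net/tcp.c"]),
    "bob": build_profile(["docs/guide.md", "docs/faq.md"]),
}
reviewer = recommend(profiles, ["src/net/udp.c"])   # "ana" shares src/net
```

Updating a profile after each review is just another `Counter.update`, which is what keeps the method's time and space costs low compared with retraining a model.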
18

Dhaliwal, Tejinder. « TECHNIQUES FOR IMPROVING SOFTWARE DEVELOPMENT PROCESSES BY MINING SOFTWARE REPOSITORIES ». Thesis, 2012. http://hdl.handle.net/1974/7438.

Full text
Abstract:
Software repositories such as source code repositories and bug repositories record information about the software development process. By analyzing the rich data available in software repositories, we can uncover interesting information that can be leveraged to guide software developers or to automate software development activities. In this thesis we investigate two activities of the development process, selective code integration and the grouping of field crash reports, and use the information available in software repositories to improve each of them.
Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2012-09-04 12:26:59.388
19

Ying, Annie Tsui Tsui. « Predicting source code changes by mining revision history ». Thesis, 2003. http://hdl.handle.net/2429/14491.

Full text
Abstract:
Software developers are often faced with modification tasks that involve source code spread across a code base. Some dependencies between source code, such as those between platform-dependent fragments, cannot be determined by existing static and dynamic analyses. To help developers identify relevant source code during a modification task, we have developed an approach that applies data mining techniques to determine change patterns, that is, files that were changed together frequently in the past, from the revision history of the code base. Our hypothesis is that change patterns can be used to recommend potentially relevant source code to a developer performing a modification task. We show that this approach can reveal valuable dependencies by applying it to the Eclipse and Mozilla open source projects and by evaluating the predictability and interestingness of the recommendations produced.
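The change-pattern idea above, files that frequently changed together in past commits, reduces to a small frequent-itemset computation over the revision history. The transaction format, the restriction to file pairs and the support threshold below are illustrative assumptions, not the thesis's actual mining configuration.

```python
from collections import Counter
from itertools import combinations

def change_patterns(transactions, min_support=2):
    """Find pairs of files changed together in at least `min_support`
    commits (frequent itemsets restricted to size-2 sets)."""
    pair_counts = Counter()
    for files in transactions:
        for pair in combinations(sorted(set(files)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}

def recommend(patterns, changed_file):
    """Given one file being modified, suggest co-changed files, most frequent first."""
    hits = []
    for (a, b), n in patterns.items():
        if changed_file == a:
            hits.append((b, n))
        elif changed_file == b:
            hits.append((a, n))
    return [f for f, _ in sorted(hits, key=lambda t: -t[1])]

history = [
    ["parser.c", "parser.h", "lexer.c"],
    ["parser.c", "parser.h"],
    ["lexer.c", "token.h"],
    ["parser.c", "parser.h", "ast.c"],
]
patterns = change_patterns(history, min_support=2)
suggestions = recommend(patterns, "parser.c")
```

A developer editing `parser.c` would be reminded of `parser.h`, a dependency a compiler sees but that static analysis would miss for, say, a co-changed configuration file.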
20

« Supporting Source Code Feature Analysis Using Execution Trace Mining ». Thesis, 2013. http://hdl.handle.net/10388/ETD-2013-10-1266.

Full text
Abstract:
Software maintenance is a significant phase of the software life cycle. Once a system is developed, the main focus shifts to maintenance to keep the system up to date. A system may be changed for various reasons, such as fulfilling customer requirements, fixing bugs or optimizing existing code. Code needs to be studied and understood before any modification is made to it. Understanding code is a time-intensive and often complicated part of software maintenance that is supported by documentation and various tools such as profilers, debuggers and source code analysis techniques. However, most of these tools fail to assist in locating the portions of the code that implement the functionality the software developer is focusing on. Mining execution traces can help developers identify the parts of the source code specific to the functionality of interest and, at the same time, help them understand the behaviour of the code. We propose a use-driven hybrid framework of static and dynamic analyses to mine and manage execution traces to support software developers in understanding how the system's functionality is implemented, through feature analysis. We express a system's use as a set of tests. In our approach, we develop a set of uses that represents how a system is used, or how a user uses some specific functionality; each use set describes a user's interaction with the system. To manage large and complex traces we organize them by system use and segment them by user interface events. The segmented traces are also clustered based on internal and external method types, and the clusters are further categorized into groups based on application programming interfaces and active clones. To further support comprehension we propose a taxonomy of metrics used to quantify the trace. To validate the framework we built a tool called TrAM that implements trace mining and provides visualization features. 
It can quantify trace method information, mine similar code fragments called active clones, cluster methods based on types, categorize them into groups and quantify their behavioural aspects using a set of metrics. The tool also lets users visualize the design and implementation of a system using images, filtering, grouping, and event and system use, and presents them with values calculated using trace, group, clone and method metrics. We also conducted a case study on five different subject systems using the tool to determine the dynamic properties of source code clones at runtime, answering three research questions with our findings. We compared our tool with trace mining tools and profilers in terms of features and scenarios. Finally, we evaluated TrAM through a user study of its effectiveness, usability and information management.
21

Acharya, Mithun Puthige. « Mining API specifications from source code for improving software reliability ». 2009. http://www.lib.ncsu.edu/theses/available/etd-03222009-100211/unrestricted/etd.pdf.

Full text
22

Li, Shu-ya, et 李書雅. « Using Data Mining Techniques to Analyze Students' Source-Code Data ». Thesis, 2009. http://ndltd.ncl.edu.tw/handle/18637007302327758690.

Full text
Abstract:
Master's thesis
National Yunlin University of Science and Technology
Master's Program, Department of Information Management
Academic year 97 (2008-2009)
In traditional programming language courses, the relationship between the teacher and students is one-to-many. A teacher learns about students' outcomes from the source code of their homework or tests, but neither source makes it easy to observe students' real learning situation. Although an Internet teaching system can record students' learning information, the teacher still has difficulty understanding students' learning characteristics because of the large amount of raw data, and cannot visually observe the characteristics of source code to analyze learning outcomes. It is also difficult to know whether a programming test or homework involves plagiarism. The purpose of this study is to explore how to analyze the characteristics of students' source code. We propose a framework for analyzing source-code data using data mining techniques, including decision trees, hierarchical clustering, self-organizing maps, and feature selection methods. We also use two points of view, source-code features and program structure, to help teachers identify implicit information in the learning process.
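Before any decision tree or clustering can run, such a framework must turn each student submission into a feature vector. A minimal sketch of that extraction step follows; the specific surface features counted here are assumptions for illustration, not the study's actual feature set.

```python
import re

def code_features(source):
    """Extract crude surface features from a C-like student submission."""
    lines = [l for l in source.splitlines() if l.strip()]
    return {
        "loc": len(lines),                                      # non-blank lines
        "loops": len(re.findall(r"\b(for|while)\b", source)),
        "branches": len(re.findall(r"\bif\b", source)),
        # a rough "return-type name(args) {" shape for function definitions
        "functions": len(re.findall(r"\w+\s+\w+\s*\([^;]*\)\s*{", source)),
        "comment_lines": sum(1 for l in lines if l.strip().startswith("//")),
    }

sample = """
// compute factorial
int fact(int n) {
    int r = 1;
    for (int i = 2; i <= n; i++) {
        r = r * i;
    }
    if (n < 0) { return 0; }
    return r;
}
""".strip()
features = code_features(sample)
```

Vectors like these, one per student, are what a decision tree or a self-organizing map would consume; near-identical vectors across students are also a cheap first flag for plagiarism checks.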
23

« In Pursuit of Optimal Workflow Within The Apache Software Foundation ». Doctoral diss., 2017. http://hdl.handle.net/2286/R.I.44309.

Full text
Abstract:
The following is a case study composed of three workflow investigations at the Apache Software Foundation (Apache), an organization based on open source software development (OSSD). I start with an examination of workload inequality within Apache, particularly with regard to requirements writing. I established that the stronger a participant's experience indicators are, the more likely they are to propose a requirement that is not a defect and the more likely the requirement is eventually implemented. Requirements at Apache are divided into work tickets (tickets). In the second investigation, I report many insights into the distribution patterns of these tickets. The participants who create the tickets often had the best track records for determining who should participate in them. Tickets that were at some point volunteered for (self-assigned) had a lower incidence of neglect but in some cases were also associated with severe delay: when a participant claims a ticket but postpones the work involved, the ticket can exist without a solution for five to ten times as long, depending on the circumstances. I make recommendations that may reduce the incidence of tickets that are claimed but not implemented in a timely manner. After explaining in depth how I obtained this data set through web crawlers, I describe the pattern mining platform I developed to make my data mining efforts highly scalable and repeatable. Lastly, I used process mining techniques to show that workflow patterns vary greatly within teams at Apache. I investigated a variety of process choices and how they might influence the outcomes of OSSD projects. I report a moderately negative association between how often a team updates the specifics of a requirement and how often requirements are completed. 
I also verified that the prevalence of volunteerism indicators is positively associated with work completion; surprisingly, this correlation is stronger when the very large projects are excluded. I suggest that the largest projects at Apache may benefit from some level of traditional delegation in addition to the volunteerism that OSSD is normally associated with.
Dissertation/Thesis
Doctoral Dissertation Industrial Engineering 2017
24

Lin, Hung-Ru, et 林鴻儒. « Applying Data Mining to Source Code Fault Prediction : An Example of Consumer Electronics Projects ». Thesis, 2011. http://ndltd.ncl.edu.tw/handle/79037030441481370393.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Management Science Group, Executive Master's Program, College of Management
Academic year 99 (2010-2011)
In recent years, as a result of rising demand for value-added software and the shortening life cycles of consumer electronics, software managers need to prioritize resources more accurately to achieve performance goals. Previous studies show that source code fault prediction can provide information for allocating resources accordingly. In this thesis, we implement a fault prediction system using several data mining algorithms. The experimental results show that a cross-company training set gives practical results for consumer electronics projects that lack enough within-company historical data. The logistic regression algorithm performs relatively best: 84.6% of the defects can be detected by inspecting 19.1% of the code, reaching approximately 77.4% cost-benefit. The mined models indicate that LOC metrics are more informative than Halstead and McCabe metrics. Our study suggests that the Naive Bayes algorithm is suitable for projects with both computing resource and failure cost considerations, while neural network and clustering algorithms fit industries with higher R&D costs. In addition, software managers must pay attention to modularization in derivative projects; otherwise, it will affect the precision of the prediction system.
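The cost-benefit figure quoted above (the share of defects caught for a given share of code inspected) can be computed from any ranked prediction as follows. The module data and the predicted probabilities here are invented for illustration and do not reproduce the thesis's models or numbers.

```python
def inspection_yield(modules, loc_budget):
    """Rank modules by predicted fault probability and report what share of
    known defects falls inside the top-ranked `loc_budget` fraction of LOC.
    Each module: (name, loc, predicted_fault_prob, actual_defects)."""
    ranked = sorted(modules, key=lambda m: -m[2])    # riskiest first
    total_loc = sum(m[1] for m in modules)
    total_defects = sum(m[3] for m in modules)
    inspected_loc = found = 0
    for name, loc, prob, defects in ranked:
        if (inspected_loc + loc) / total_loc > loc_budget:
            break                                    # budget exhausted
        inspected_loc += loc
        found += defects
    return found / total_defects, inspected_loc / total_loc

modules = [
    ("ui.c",   400, 0.10, 0),
    ("net.c",  150, 0.90, 4),
    ("core.c", 250, 0.70, 3),
    ("util.c", 200, 0.20, 1),
]
recall, effort = inspection_yield(modules, loc_budget=0.45)
```

Here inspecting the top 40% of the code by predicted risk catches 7 of the 8 seeded defects; the thesis's "84.6% of defects for 19.1% of code" is the same kind of curve read at a different operating point.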
25

Baysal, Olga. « Supporting Development Decisions with Software Analytics ». Thesis, 2014. http://hdl.handle.net/10012/8380.

Full text
Abstract:
Software practitioners make technical and business decisions based on the understanding they have of their software systems. This understanding is grounded in their own experiences, but can be augmented by studying various kinds of development artifacts, including source code, bug reports, version control meta-data, test cases, usage logs, etc. Unfortunately, the information contained in these artifacts is typically not organized in a way that is immediately useful to developers' everyday decision-making needs. To handle the large volumes of data, many practitioners and researchers have turned to analytics, that is, the use of analysis, data, and systematic reasoning for making decisions. The thesis of this dissertation is that by employing software analytics to various development tasks and activities, we can provide software practitioners better insights into their processes, systems, products, and users, to help them make more informed data-driven decisions. While quantitative analytics can help project managers understand the big picture of their systems, plan for its future, and monitor trends, qualitative analytics can enable developers to perform their daily tasks and activities more quickly by helping them better manage high volumes of information. To support this thesis, we provide three different examples of employing software analytics. First, we show how analysis of real-world usage data can be used to assess user dynamic behaviour and adoption trends of a software system by revealing valuable information on how software systems are used in practice. Second, we have created a lifecycle model that synthesizes knowledge from software development artifacts, such as reported issues, source code, discussions, community contributions, etc. Lifecycle models capture the dynamic nature of how various development artifacts change over time in an annotated graphical form that can be easily understood and communicated. 
We demonstrate how lifecycle models can be generated and present industrial case studies where we apply these models to assess the code review process of three different projects. Third, we present a developer-centric approach to issue tracking that aims to reduce information overload and improve developers' situational awareness. Our approach is motivated by a grounded theory study of developer interviews, which suggests that customized views of a project's repositories that are tailored to developer-specific tasks can help developers better track their progress and understand the surrounding technical context of their working environments. We have created a model of the kinds of information elements that developers feel are essential in completing their daily tasks, and from this model we have developed a prototype tool organized around developer-specific customized dashboards. The results of these three studies show that software analytics can inform evidence-based decisions related to user adoption of a software project, code review processes, and developers' awareness of their daily tasks and activities.
26

Cao, Yaxin. « Investigating the Impact of Personal, Temporal and Participation Factors on Code Review Quality ». Thèse, 2015. http://hdl.handle.net/1866/12570.

Full text
Abstract:
Code review is an essential process whatever a project's maturity; it seeks to evaluate the contribution made by the code developers submit. In principle, code review improves the quality of code changes (patches) before they are committed to the project's master repository. In practice, running this process does not rule out some bugs going unnoticed. In this document we present an empirical study investigating code review in a large open source project. We examine the relationships between reviewers' inspections and the personal and temporal factors that could affect the quality of those inspections. First, we report a quantitative study in which we use the SZZ algorithm to detect bug-inducing changes, which we linked with the code review information extracted from the issue tracking system. We found that the reasons why reviewers miss certain bugs correlate both with their personal characteristics and with the technical properties of the patches under review. We then report a qualitative study inviting Mozilla developers to give us their opinion on the attributes of a well-done code review. The results of our survey suggest that developers regard technical aspects (patch size, number of chunks and modules) as well as personal characteristics (experience and review queue) as factors that strongly influence the quality of code reviews.
Code review is an essential element of any mature software development project; it aims at evaluating code contributions submitted by developers. In principle, code review should improve the quality of code changes (patches) before they are committed to the project's master repository. In practice, the execution of this process can still allow bugs to get in unnoticed. In this thesis, we present an empirical study investigating code review in a large open source project. We explore the relationship between reviewers' code inspections and the personal, temporal and participation factors that might affect the quality of such inspections. We first report a quantitative study in which we applied the SZZ algorithm to detect bug-inducing changes that were then linked to the code review information extracted from the issue tracking system. We found that the reasons why reviewers miss bugs are related both to their personal characteristics and to the technical properties of the patches under review. We then report a qualitative study that aims at soliciting opinions from Mozilla developers on their perception of the attributes associated with a well-done code review. The results of our survey suggest that developers find both technical factors (patch size, number of chunks, and module) and personal factors (reviewer's experience and review queue) to be strong contributors to review quality.
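The SZZ linking step mentioned above can be sketched over a toy in-memory history: the commits that last touched the lines deleted by a bug-fixing patch are flagged as candidate bug-inducing changes. Real SZZ runs blame against a version control repository and filters cosmetic edits; the data structures below are assumptions made to keep the sketch self-contained.

```python
def szz_bug_inducing(fix_commit, blame):
    """Simplified SZZ core: for each line a bug fix deleted, the commit that
    last modified that line (per `blame`) is a candidate bug-inducing change.
    `blame` maps (file, line_number) -> id of the commit that last changed it."""
    inducing = set()
    for path, deleted_lines in fix_commit["deleted"].items():
        for ln in deleted_lines:
            if (path, ln) in blame:
                inducing.add(blame[(path, ln)])
    return inducing

# Toy data: the fix (commit c9) deleted lines 10 and 11 of core.c,
# both of which were last written by commit c3.
blame = {("core.c", 10): "c3", ("core.c", 11): "c3", ("core.c", 12): "c1"}
fix = {"id": "c9", "deleted": {"core.c": [10, 11]}}
culprits = szz_bug_inducing(fix, blame)
```

Once a bug-inducing commit like `c3` is identified, it can be joined with review records to ask, as the thesis does, which reviewer and patch characteristics were present when the bug slipped through.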
