Theses on the topic "Short read and long read sequencing"

Consult the top 22 theses for your research on the topic "Short read and long read sequencing".

1

Soundiramourtty, Abirami. "Exploring the transpositional landscape and recent transposable element activity in beech trees using long read mobilome and genome sequencing and with new computational tools". Electronic Thesis or Diss., Perpignan, 2024. http://www.theses.fr/2024PERP0043.

Abstract
The adaptation of organisms to environmental changes has become a fundamental research question, particularly in the context of climate change. A key area of this research is to identify underlying genetic elements, such as transposable elements (TEs), that contribute to this process. TEs are repetitive DNA sequences found across all eukaryotes, possessing the unique ability to move within the genome, a phenomenon known as active transposition. They can cause mutations by generating transposable element insertion polymorphisms (TIPs) between individuals, and even somatic insertions. Generally, TEs are kept inactive by epigenetic mechanisms that limit their uncontrolled proliferation. Although they can be reactivated by various environmental stimuli, active transposition remains relatively rare. TE mobility can be detected using extrachromosomal circular DNA (eccDNA) as a marker of transposition. The transpositional landscape of TEs and their recent activity have been documented in model organisms but remain underexplored in perennial species such as trees. This study aims to investigate recent transpositional activity and ongoing mobility of TEs in non-model perennial species, using European beech (Fagus sylvatica) as our model. We sought to study recent TE activity and ongoing mobility by identifying TE-induced variants within a population and in an individual (at the somatic scale) using whole-genome sequencing (WGS) and mobilome sequencing (eccDNA). We conducted WGS and mobilome sequencing of trees from the Verzy forest, known for its dwarf and tortuous beeches, also referred to as "mutants". These trees exhibit unstable phenotypic traits, with some trees developing new branches of normal shape. We identified two TEs belonging to the Miniature Inverted Repeat Transposable Elements (MITEs) type, named SQUIRREL1 and SQUIRREL2, which are actively mobilizing in these trees, producing large amounts of eccDNA and even causing somatic variations. SQUIRREL1 and SQUIRREL2 are also active in beech trees from the Massane forest. Furthermore, in all these trees, several other TEs, mainly MITEs, produce significant amounts of eccDNA, although their activity levels appear to vary depending on the tissue, suggesting that TE activity could be tissue-specific and indicating MITE-dominated transposition in beech. In parallel, we investigated TIPs in a population of beech trees from the Massane forest, an ancient forest classified as a UNESCO World Heritage site. By sequencing 150 trees, we aimed to understand how TEs contribute to the genetic diversity of the entire population by detecting TIPs generated by Long Terminal Repeat retrotransposons (LTR-RTs) and MITEs using WGS. We detected approximately 30,000 LTR-RT TIPs in each individual, compared to 70,000 MITE TIPs. While most of these TIPs remain at low frequency, many MITE-TIPs are located near functional genes and are more conserved within the population. Using these TIPs, we identified several hotspots of variation and conserved regions along the beech genome, providing insights into genome structure in this species. In conclusion, our study highlights the importance of TEs in shaping the genomic landscape of trees, particularly in understanding how these elements contribute to the evolution of long-lived species. Future research could expand this work to other tree species and explore whether the patterns observed in beeches are common in other types of trees.
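As an illustration of what "low frequency" means for TIPs in a sequenced population, the following minimal Python sketch computes the fraction of trees that carry each insertion. It is not the thesis's pipeline: the function name, the tree labels, the insertion coordinates and the 0.25 threshold are all hypothetical and chosen only to make the example run.

```python
from collections import Counter

def tip_frequencies(calls_per_tree):
    """Return, for each insertion site, the fraction of trees carrying it."""
    n_trees = len(calls_per_tree)
    # Count each site at most once per tree.
    counts = Counter(site for sites in calls_per_tree.values() for site in set(sites))
    return {site: count / n_trees for site, count in counts.items()}

# Hypothetical presence calls: (chromosome, position, TE type) per tree.
calls = {
    "tree_A": {("chr1", 1200, "MITE"), ("chr2", 560, "LTR-RT")},
    "tree_B": {("chr1", 1200, "MITE")},
    "tree_C": {("chr1", 1200, "MITE"), ("chr3", 90, "MITE")},
    "tree_D": {("chr1", 1200, "MITE")},
}

freqs = tip_frequencies(calls)
# Hypothetical cutoff for calling an insertion "low frequency".
low_frequency = {site for site, f in freqs.items() if f <= 0.25}
```

In this toy input the chr1 insertion is shared by every tree, while the other two insertions each occur in a single tree and fall under the assumed cutoff.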
2

Whiteford, Nava. "String matching in DNA sequences : implications for short read sequencing and repeat visualisation". Thesis, University of Southampton, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.438668.

3

Chacon, de San Baldomero Alejandro. "Read mapping on heterogeneous systems: scalability strategies for bioinformatic primitives". Doctoral thesis, Universitat Autònoma de Barcelona, 2021. http://hdl.handle.net/10803/671736.

Abstract
Genomic sequencing is a key component of new advances in medicine, and its democratization is an important step in improving accessibility for the patient. The benefits of discovering new genomic variants are vast and include everything from early cancer detection to personalized medicine, drug design and genome editing. All of these potential uses have greatly increased the interest of the scientific community in the field of bioinformatics in recent years. Moreover, the emergence of next-generation sequencing methods has contributed to the rapid reduction of sequencing costs, enabling new applications of genomics in precision medicine. The main goal of this thesis is to improve the state of the art in performance and accuracy for genome sequencing through the use of heterogeneous computing platforms and hybrid hardware systems. More specifically, the work focuses on accelerating short-read mapping, which is described as one of the most computationally expensive stages of the pipeline. Overall, we aim to reduce the processing time and cost of genome sequencing, thereby increasing the availability of this type of analysis. The main contribution of this thesis is the full GPU integration of the GEM3 mapper (GEM3-GPU), reporting significant improvements in performance and competitive accuracy results. The mapper reports the same output files for CPU and GPU and is one of the first GPU mappers to allow alignment of very long and variable-length reads. The proposals have been validated using real data, since the mapper has been running in production at a genomic sequencing center (Centro Nacional de Análisis Genómico (CNAG)). Together with the GEM3-GPU mapper, a complete bioinformatics CUDA library (GEM-cutter) has been created. The library provides the basic building blocks for genomic applications, which are highly optimised to run on GPUs.
Gem-cutter offers an API based on send and receive primitives (message passing) and incorporates a scheduler to balance the work. Furthermore, the library supports all GPU architectures and Multi-GPU execution.
Universitat Autònoma de Barcelona. Programa de Doctorat en Informàtica
4

Targon, Robin. "A novel method for the production of long DNA sequences from short reads". Doctoral thesis, Università degli studi di Padova, 2015. http://hdl.handle.net/11577/3424278.

Abstract
Next Generation Sequencing (NGS) has profoundly changed the way we study genome biology: in the last ten years this technology has produced an astonishing amount of evidence, ranging from transcriptome variability to the association patterns of specific proteins with DNA or RNA sequences, opening the way to new discoveries and perspectives. Unfortunately, the short length of the reads produced by second-generation sequencers limits the potential of this technology, and some very interesting studies have been hampered by it. High-quality long reads would permit much better approaches to full-length transcript analysis, alternative splicing, RNA editing, de novo whole-genome assembly, genomic structural variation and haplotype characterization. The study that I conducted for my doctorate focused on the possibility of producing high-quality long reads using NGS technology. The first motivation behind this project was to investigate full-length transcripts, and in particular to test the hypothesis that the pattern of alternative splicing is associated with transcription start sites. A further motivation was the application of this technology to de novo whole-genome assembly. Since the read-length limit cannot be changed at the instrument level, I directed my efforts towards developing a method to reconstruct the sequence of long DNA or RNA molecules by precise local assembly of short reads produced by second-generation sequencers. The idea that I wanted to exploit is based on "molecular barcoding". Typically, barcodes are short DNA sequence tags that are included in the adaptors used to prepare NGS libraries. Barcodes make it possible to associate each read with its corresponding library, allowing multiple samples to be analysed in the same sequencing run. In my project I used barcodes for a very different purpose. My objective was to label individual DNA or RNA molecules with unique barcodes, so that all the reads generated from the subfragments of each original molecule could be identified. For this purpose I used random barcodes, on the assumption that reads with the same barcode come from the same original DNA/RNA molecule. Compared with standard barcoding techniques, my approach therefore differs in two main ways: first, it barcodes single molecules; second, the barcodes are random sequences. A considerable part of my work was dedicated to developing reliable genetic engineering strategies to obtain mate-pair libraries in which one end of each pair is the barcoded end and the other is a random region of the original DNA or RNA molecule. Every step of the protocol was carefully optimized to make the method both simple and robust. Several trials were performed to test the method. Although in these trials we limited the analysis to low coverage, we found that mate-pair reads sharing the same barcode mostly mapped to clustered genomic positions, as expected. Our results, albeit preliminary, demonstrate that the method developed so far works. Although some steps of the protocol could be further optimized, the method is now being applied to produce long genomic reads with high coverage. Furthermore, some adaptations are now being implemented to apply the method to transcriptome samples as well.
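The core bookkeeping step of single-molecule barcoding, collecting all reads that share a barcode before local assembly, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the function name, the toy barcodes and the read labels are hypothetical, and real data would also need handling of sequencing errors in barcodes, which is omitted here.

```python
from collections import defaultdict

def group_reads_by_barcode(mate_pairs):
    """Group the genomic mate of each pair under the barcode of its source molecule."""
    groups = defaultdict(list)
    for barcode, genomic_read in mate_pairs:
        groups[barcode].append(genomic_read)
    return dict(groups)

# Hypothetical toy input: (random barcode, read from a random region of the molecule).
mate_pairs = [
    ("ACGTTGCA", "read_1"),
    ("TTGACCAG", "read_2"),
    ("ACGTTGCA", "read_3"),
]

groups = group_reads_by_barcode(mate_pairs)
```

Each resulting group would then be assembled on its own, which is what allows a long molecule to be reconstructed from short reads.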
5

Long, Evan Michael. "Genomic Structural Variation Across Five Continental Populations of Drosophila melanogaster". BYU ScholarsArchive, 2018. https://scholarsarchive.byu.edu/etd/7335.

Abstract
Chromosomal structural variations (SVs), including insertions, deletions, inversions, and translocations, occur within the genome and can have a significant effect on organismal phenotype. Some of these effects are caused by structural variations containing genes. Modern sequencing using short reads makes the detection of large structural variations (> 1 kb) very difficult. Large structural variations represent a significant amount of the genetic diversity within a population. We used a global sampling of Drosophila melanogaster (Ithaca, Zimbabwe, Beijing, Tasmania, and Netherlands) to represent diverse populations. We used long-read sequencing and optical mapping technologies to identify SVs in these genomes. Because the average read length of these approaches is much longer than that of traditional short-read sequencing, these maps facilitate the identification of chromosomal SVs of greater size and with more clarity. We found a wide diversity of structural variations in each of the five strains. These structural variations varied greatly in size and location, and significantly affected exonic regions of the genome. Structural variations accounted for a much larger difference in number of base pairs between strains than single nucleotide polymorphisms (SNPs).
6

Fuente, Lorente Lorena de la. "Development of a bioinformatics approach for the functional analysis of alternative splicing". Doctoral thesis, Universitat Politècnica de València, 2019. http://hdl.handle.net/10251/124974.

Abstract
[EN] One of the most exciting aspects of transcriptome biology is the contextual adaptability of eukaryotic transcriptomes and proteomes by post-transcriptional regulation (PTR). PTR mechanisms such as alternative splicing (AS) and alternative polyadenylation (APA) have emerged as tightly regulated processes playing a key role in generating transcriptome complexity and coordinating cell differentiation or tissue development. However, how these mechanisms imprint distinct functional characteristics on the resulting set of isoforms to define the observed phenotype remains poorly understood. The number of PTR variants and the range of their potential functional consequences make case-by-case functional validation impractical. Moreover, the lack of isoform-oriented functional profiling approaches has meant that much of the computational work on transcriptome-wide functional questions has either involved ad hoc computational pipelines applied to specific biological systems or has relied on simple GO-enrichment analyses that are not informative about the impact of PTR on isoform properties. Thus, despite more than 60,000 publications on AS, few of the existing isoforms have been associated with specific properties, while the number of novel AS/APA variants with unknown and even unexplored functions is increasing exponentially thanks to next-generation sequencing (NGS). Because of the technical limitations of NGS in reconstructing transcript structure, high-throughput sequencing of full-length transcripts using third-generation technologies (TGS) is opening up a new transcriptomics era that improves the definition of gene models and, for the first time, makes it possible to precisely associate functional events within the RNA molecule. This thesis addresses three major challenges to progress in the study of isoform function.
First, with the emergence and increasing popularity of TGS, the accurate definition and comprehensive characterisation of de novo transcriptomes are essential to ensure the quality of any conclusions on transcriptome diversity drawn from these data. The lack of long-read-oriented, quality-aware analysis motivated the development of SQANTI (https://bitbucket.org/ConesaLab/sqanti), an automated pipeline for the structural characterization and quality assessment of full-length transcriptomes. Secondly, the gene-centric nature of functional resources remained the major limitation to the extended study of functional isoform variability, especially for novel isoforms, which cannot be characterised by static databases. Thus, we designed IsoAnnot, which dynamically constructs a rich, isoform-resolved database of functional annotations by taking transcript sequences as input and integrating information scattered across several databases and prediction methods. Finally, because no methods to interrogate the functional impact of PTR were available, we developed novel approaches and user-friendly tools such as tappAS (http://tappas.org/), designed to facilitate the transcriptome-wide functional study of context-specific isoform regulation. This thesis thus describes the development of an analysis framework that tackles the fundamental challenges of isoform functional analysis by providing a set of novel methods and tools that offer a unique opportunity to explore how the phenotype is specified by altering the functional characteristics of expressed isoforms.
Applied to a murine neural differentiation system, our pipeline profiled the effect of isoform regulation on the inclusion of several functional elements within transcripts between motor-neuron and oligodendrocyte differentiation systems. Specifically, we discovered isoform-specific transmembrane regions whose modulation by PTR might contribute to controlling cell type-specific mitochondrial dynamics during neural fate determination.
This work was funded by the following grants: From 2014 to 2018. FPU: Training programme for Academic Staff. Spanish Ministry of Education, FPU2013/02348. From 2016 to 2019. NOVELSEQ: Novel methods for new challenges in the analysis of high-throughput sequencing data. MINECO, BIO2015-1658-R. From 2014 to 2017. DEANN: Developing a European American NGS Network. EU Marie Curie IRSES, GA-612583.
Fuente Lorente, LDL. (2019). Development of a bioinformatics approach for the functional analysis of alternative splicing [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/124974
7

Vogel, Alexander. "Long-read sequencing for de novo genome assembly in bioeconomic context". Academic advisors: Björn Usadel, Ingo Kurth, Ulrich Schaffrath. Aachen: Universitätsbibliothek der RWTH Aachen, 2020. http://d-nb.info/123506946X/34.

8

Zhang, Panpan. "Étude du paysage des éléments transposables sous forme d'ADN circulaire extrachromosomique et dans l'assemblage des génomes de plantes à l'aide du séquençage en lectures longues". Thesis, Université de Montpellier (2022-….), 2022. http://www.theses.fr/2022UMONG016.

Abstract
Les éléments transposables (TEs) sont des séquences d'ADN répétitives avec la capacité intrinsèque de se déplacer et de s’amplifier dans les génomes. La transposition active des TEs est liée à la formation d'ADN circulaire extrachromosomique (ADNecc). Cependant, le paysage complet de ce compartiment d’ADNecc ainsi que ces interactions avec le génome n’étaient pas bien définies. De plus, il n’existait au début de ma thèse aucun outil bioinformatique permettant d'identifier les ADNecc à partir de données de séquençage en lectures longues. Pour répondre à ces questions au cours de mon doctorat, nous avons tout d'abord développé un outil, appelé ecc_finder, pour automatiser la détection d'eccDNA à partir de séquences en lectures longues et optimisé la détection à partir de séquences de lecture courte pour caractériser la mobilité des TE. En appliquant ecc_finder aux données eccDNA-seq d'Arabidopsis, de l'homme et du blé (avec des tailles de génome allant de 120 Mb à 17 Gb), nous avons documenté l'applicabilité étendue d'ecc_finder ainsi que l’optimisation du temps de calcul, de la sensibilité et de la précision. Dans le deuxième projet, nous avons développé un outil de méta-assemblage appelé SASAR pour réconcilier les résultats de différents assemblages de génomes à partir de données de séquençage en lectures longues. Pour différentes espèces de plantes, SASAR a obtenu des assemblages de génome de haute qualité en un temps efficace et a résolu les variations structurales causées par les TE. Dans le dernier projet, nous avons utilisé le génome assemblé par SASAR et l'ADNecc détecté par ecc_finder pour caractériser les interactions entre les ADNecc et le génome. Dans les mutants épigénétiques hypométhylés d’Arabidopsis, nous avons mis en évidence le rôle de l'épigénome dans la protection de la stabilité du génome non seulement contre la mobilité des TE mais aussi envers les réarrangements génomiques et le chimérisme des gènes.
Globalement, nos découvertes sur l'ADNecc, l'assemblage du génome et leurs interactions, ainsi que le développement d'outils, offrent de nouvelles perspectives pour comprendre le rôle des TE dans l'évolution adaptative des plantes à un changement rapide de l’environnement.
Transposable elements (TEs) are repetitive DNA sequences with the intrinsic ability to move and amplify in genomes. Active transposition of TEs is linked to the formation of extrachromosomal circular DNA (eccDNA). However, the complete landscape of this eccDNA compartment and its interactions with the genome were not well defined. In addition, at the beginning of my thesis, there were no bioinformatics tools available to identify eccDNAs from long-read sequencing data. To address these questions during my PhD, we first developed a tool, called ecc_finder, to automate eccDNA detection from long-read sequencing data and optimized detection from short-read sequences to characterize TE mobility. By applying ecc_finder to Arabidopsis, human and wheat eccDNA-seq data (with genome sizes ranging from 120 Mb to 17 Gb), we documented the broad applicability of ecc_finder as well as its optimized computational time, sensitivity and accuracy. In the second project, we developed a meta-assembly tool called SASAR to reconcile the results of different genome assemblies from long-read sequencing data. For different plant species, SASAR produced high-quality genome assemblies efficiently and resolved structural variations caused by TEs. In the last project, we used the SASAR-assembled genome and ecc_finder-detected eccDNA to characterize eccDNA-genome interactions. In hypomethylated epigenetic mutants of Arabidopsis, we highlighted the role of the epigenome in protecting genome stability not only from TE mobility but also from genomic rearrangements and gene chimerism. Overall, our findings on eccDNA, genome assembly and their interactions, as well as the development of tools, offer new insights into the role of TEs in the adaptive evolution of plants to rapid environmental change.
9

Jaudou, Sandra. "Metadetect : detection of Shiga toxin-producing Escherichia coli with novel metagenomics approaches and its application on dairy farms in France and Germany". Electronic Thesis or Diss., Maisons-Alfort, École nationale vétérinaire d'Alfort, 2023. http://www.theses.fr/2023ENVA0004.

Abstract
Les méthodologies actuelles de caractérisation d'Escherichia coli producteur de toxine Shiga (STEC) nécessitent l'isolement de la souche, ce qui est compliqué par le fait qu'il n'existe pas de milieu d'isolement spécifique qui distingue clairement les STEC des E. coli commensaux non pathogènes. Par conséquent, obtenir des informations sur les souches en utilisant une approche métagénomique éviterait d'isoler les souches pour les caractériser complètement. Dans le cadre du projet, en collaboration avec le BfR en Allemagne, nous évaluerons si de nouvelles approches de métagénomique à lecture longue pourraient déterminer sans ambiguïté si des marqueurs spécifiques d'EHEC typiques (E. coli entérohémorragique) sont co-localisés dans une même souche. Les approches de séquençage hybrides de deuxième et troisième génération seront évaluées. Des pipelines bioinformatiques seront évalués pour analyser les résultats de l'analyse métagénomique. Ces méthodes seront appliquées dans une étude pilote pour étudier le microbiote du lait cru provenant d'exploitations laitières françaises et allemandes et pour identifier un microbiome commun associé aux STEC pathogènes. Nous essaierons de définir un système basé sur l'établissement d'un « score moléculaire » pour qualifier l'état des exploitations
Current methodologies for the characterization of Shiga toxin-producing Escherichia coli (STEC) require strain isolation, which is complicated by the fact that there is no specific isolation medium that clearly distinguishes STEC from non-pathogenic commensal E. coli. Therefore, obtaining strain information using a metagenomics approach would avoid the need to isolate a strain in order to fully characterize it. In the framework of the project, in collaboration with the BfR in Germany, we will evaluate whether new long-read metagenomics approaches could unambiguously determine whether specific markers of typical EHEC (enterohemorrhagic E. coli) are co-located in the same strain. Third-generation hybrid sequencing approaches will be evaluated. Appropriate bioinformatic pipelines developed in collaboration with the BfR will be evaluated for analyzing the metagenomic results. These methods will be applied in a pilot study to investigate the microbiota of raw milk from French and German dairy farms and to tentatively identify a common STEC-associated microbiome. We aim to define a system based on a 'molecular score' to identify the status of the farms, in line with the objective of better defining the notion of a 'STEC molecular risk assessment approach' at the farm level.
10

Šalanda, Vojtěch. "Optimalizace zarovnání dat z next-generation sekvenování". Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-236077.

Abstract
This thesis presents the optimization of alignment tools for short DNA reads. These short DNA reads are products of next-generation sequencing technologies. The results produced by existing alignment tools can be influenced by various parameters. For this purpose, an optimization framework that finds the optimal values of selected parameters was developed. This framework is based on the differential evolution algorithm, and its main goal is to maximize alignment accuracy. The functionality of the framework was tested on both real and generated data sets of short DNA reads. An accurate alignment is crucial for the correct prediction of various genetic characteristics.
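As a rough illustration of the approach described in this abstract, the sketch below runs a basic differential evolution loop (the DE/rand/1/bin variant) to maximise a score over two bounded parameters. It is not the framework developed in the thesis: the function `toy_accuracy`, the parameter names `mismatch_penalty` and `gap_penalty`, and all numeric settings are assumptions chosen for illustration. In a real setting the score would presumably come from running an aligner with the candidate parameters and measuring accuracy against reads of known origin.

```python
import random


def differential_evolution(score, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=100, seed=0):
    """Maximise score(params) inside box bounds using DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [score(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, none of them the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    value = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    value = min(max(value, lo), hi)  # clip to the bounds
                else:
                    value = pop[i][j]
                trial.append(value)
            trial_fitness = score(trial)
            # Greedy selection: keep the trial if it is at least as good.
            if trial_fitness >= fitness[i]:
                pop[i], fitness[i] = trial, trial_fitness
    best = max(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


def toy_accuracy(params):
    """Hypothetical stand-in for alignment accuracy; peaks at (2.0, 5.0)."""
    mismatch_penalty, gap_penalty = params
    return 1.0 / (1.0 + (mismatch_penalty - 2.0) ** 2 + (gap_penalty - 5.0) ** 2)


best_params, best_score = differential_evolution(
    toy_accuracy, [(0.0, 10.0), (0.0, 10.0)]
)
```

Because each evaluation of a real accuracy function would require a full alignment run, the population size and number of generations are the main cost drivers and would need tuning for any practical use.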
11

Herzel, Lydia. "Co-transcriptional splicing in two yeasts". Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2015. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-179274.

Abstract
Cellular function and physiology are largely established through regulated gene expression. The first step in gene expression, transcription of the genomic DNA into RNA, is a process that is highly aligned at the levels of initiation, elongation and termination. In eukaryotes, protein-coding genes are exclusively transcribed by RNA polymerase II (Pol II). Upon transcription of the first 15-20 nucleotides (nt), the emerging nascent RNA 5’ end is modified with a 7-methylguanosyl cap. This is one of several RNA modifications and processing steps that take place during transcription, i.e. co-transcriptionally. For example, protein-coding sequences (exons) are often disrupted by non-coding sequences (introns) that are removed by RNA splicing. The two transesterification reactions required for RNA splicing are catalyzed through the action of a large macromolecular machine, the spliceosome. Several non-coding small nuclear RNAs (snRNAs) and proteins form functional spliceosomal subcomplexes, termed snRNPs. Sequentially with intron synthesis, different snRNPs recognize sequence elements within introns: first the 5’ splice site (5’ SS) at the intron start, then the branchpoint and, at the end, the 3’ splice site (3’ SS). Multiple conformational changes and concerted assembly steps lead to formation of the active spliceosome, cleavage of the exon-intron junction, intron lariat formation and finally exon-exon ligation with cleavage of the 3’ intron-exon junction. Estimates of pre-mRNA splicing duration range from 15 sec to several minutes or, in terms of distance relative to the 3’ SS, the earliest detected splicing events were 500 nt downstream of the 3’ SS. However, the use of indirect assays, model genes and transcription induction/blocking leaves the question of when pre-mRNA splicing of endogenous transcripts occurs unanswered. In recent years, global studies concluded that the majority of introns are removed during the course of transcription.
In principle, co-transcriptional splicing reduces the need for post-transcriptional processing of the pre-mRNA. This could allow for quicker transcriptional responses to stimuli and optimal coordination between the different steps. In order to gain insight into how pre-mRNA splicing might be functionally linked to transcription, I wanted to determine when co-transcriptional splicing occurs, how transcripts with multiple introns are spliced and if and how the transcription termination process is influenced by pre-mRNA splicing. I chose two yeast species, S. cerevisiae and S. pombe, to study co-transcriptional splicing. Small genomes, short genes and introns, but very different numbers of intron-containing genes and multi-intron genes in S. pombe, made the combination of both model organisms a promising system to study by next-generation sequencing and to learn about co-transcriptional splicing in a broad context with applicability to other species. I used nascent RNA-Seq to characterize co-transcriptional splicing in S. pombe and developed two strategies to obtain single-molecule information on co-transcriptional splicing of endogenous genes: (1) with paired-end short read sequencing, I obtained the 3’ nascent transcript ends, which reflect the position of Pol II molecules during transcription, and the splicing status of the nascent RNAs. This is detected by sequencing the exon-intron or exon-exon junctions of the transcripts. Thus, this strategy links Pol II position with intron splicing of nascent RNA. The increase in the fraction of spliced transcripts with further distance from the intron end provides valuable information on when co-transcriptional splicing occurs. (2) with Pacific Biosciences sequencing (PacBio) of full-length nascent RNA, it is possible to determine the splicing pattern of transcripts with multiple introns, e.g. sequentially with transcription or also non-sequentially.
Part of transcription termination is cleavage of the nascent transcript at the polyA site. The splicing status of cleaved and non-cleaved transcripts can provide insights into links between splicing and transcription termination and can be obtained from PacBio data. I found that co-transcriptional splicing in S. pombe is similarly prevalent to other species and that most introns are removed co-transcriptionally. Co-transcriptional splicing levels are dependent on intron position, adjacent exon length, and GC-content, but not splice site sequence. A high level of co-transcriptional splicing is correlated with high gene expression. In addition, I identified low abundance circular RNAs in intron-containing, as well as intronless genes, which could be side-products of RNA transcription and splicing. The analysis of co-transcriptional splicing patterns of 88 endogenous S. cerevisiae genes showed that the majority of intron splicing occurs within 100 nt downstream of the 3’ SS. Saturation levels vary, and confirm results of a previous study. The onset of splicing is very close to the transcribing polymerase (within 27 nt) and implies that spliceosome assembly and conformational rearrangements must be completed immediately upon synthesis of the 3’ SS. For S. pombe genes with multiple introns, most detected transcripts were completely spliced or completely unspliced. A smaller fraction showed partial splicing, with the first intron being most often not spliced. Close to the polyA site, most transcripts were spliced; however, uncleaved transcripts were often completely unspliced. This suggests a beneficial influence of pre-mRNA splicing for efficient transcript termination. Overall, sequencing of nascent RNA with the two strategies developed in this work offers significant potential for the analysis of co-transcriptional splicing, transcription termination and also RNA polymerase pausing by profiling nascent 3’ ends.
I could define the position of pre-mRNA splicing during the process of transcription and provide evidence for fast and efficient co-transcriptional splicing in S. cerevisiae and S. pombe, which is associated with highly expressed genes in both organisms. Differences in S. pombe co-transcriptional splicing could be linked to gene architecture features, like intron position, GC-content and exon length.
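The first strategy in this abstract relates the position of Pol II (the nascent 3' end) to the splicing status of each transcript. A minimal sketch of that kind of summary is shown below, assuming each read has already been reduced to a pair of values: the distance in nt of the 3' end from the 3' splice site, and whether the intron was spliced. The function name, the bin size and the example reads are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def spliced_fraction_by_distance(reads, bin_size=25):
    """Return {bin_start: fraction of spliced reads} for distance bins.

    reads: iterable of (distance_from_3ss_nt, is_spliced) pairs.
    Reads with a negative distance are skipped, because Pol II has not
    yet transcribed past the 3' splice site.
    """
    counts = defaultdict(lambda: [0, 0])  # bin_start -> [spliced, total]
    for distance, is_spliced in reads:
        if distance < 0:
            continue
        bin_start = (distance // bin_size) * bin_size
        counts[bin_start][0] += int(is_spliced)
        counts[bin_start][1] += 1
    return {start: spliced / total
            for start, (spliced, total) in sorted(counts.items())}


# Hypothetical reads: (distance of the 3' end from the 3' SS, spliced?)
example_reads = [
    (10, False), (20, False),   # bin 0-24: none spliced
    (30, True), (40, False),    # bin 25-49: half spliced
    (60, True), (70, True),     # bin 50-74: all spliced
    (-5, False),                # before the 3' SS, ignored
]
profile = spliced_fraction_by_distance(example_reads)
```

Plotting such a profile against distance gives a curve whose rise indicates how soon after synthesis of the 3' splice site splicing is detected.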
12

Kuderna, Lukas. "Application of genome assembly methods to human and non-human primate genomics". Doctoral thesis, Universitat Pompeu Fabra, 2020. http://hdl.handle.net/10803/668648.

Abstract
Genomic analyses are at the center of contemporary biology. These studies heavily rely on reference genome assemblies, yet those are typically highly fragmented. Having accurate representations of complex genomes, or parts thereof, is crucial to study human and primate evolution and disease. Here, we develop and apply new sequencing strategies and technologies to improve reference assemblies. We first explore the combinatorial potential of different datasets to generate a highly improved reference for the chimpanzee, a crucial species for the study of human origins. We are able to close 77% of the over 159,000 remaining gaps in the previous iteration of this species’ assembly and increase continuity by more than 750%. We then go on to develop a workflow to assemble the first human Y chromosome of African ancestry, using native flow-sorted chromosomes sequenced on a Nanopore device. We are able to assemble the Y chromosome to a reference grade quality and achieve unprecedented sequence resolution across structurally complex regions. These results open new avenues for comparative studies including the chimpanzee genome or human Y chromosomes.
Els anàlisis genòmics són el centre de la biologia contemporània. Aquests estudis depenen molt de l’assemblatge de genomes de referència, tot i que aquets en general estan molt fragmentats. Tenir representacions precises de genomes complexos, o parts d’aquests, és crucial per estudiar les malalties i l’evolució en humans i primats. En els estudis següents, desenvolupem i apliquem noves estratègies i tecnologies de seqüenciació per millorar els assemblatges de referència. En primer lloc, explorem el potencial de combinar diferents conjunts de dades per generar una referència substancialment millorada per al ximpanzé, una espècie crucial per a l'estudi dels orígens humans. Som capaços de tancar el 77% dels més de 159,000 buits que hi havia a la iteració prèvia de l’assemblatge d'aquesta espècie, i augmentar la continuïtat en més del 750%. A continuació, desenvolupem un protocol per assemblar el primer cromosoma Y humà d’ascendència africana, utilitzant cromosomes nadius aïllats per citometria de flux i seqüenciats mitjançant un dispositiu Nanopore. D’aquesta manera, aconseguim assemblar el cromosoma Y a una qualitat de referència i una resolució de seqüències sense precedents en regions estructuralment complexes. Aquests resultats obren noves vies per a estudis comparatius que inclouen el genoma del ximpanzé o els cromosomes Y humans.
13

FORMENTI, GIULIO PAOLO. "THIRD-GENERATION SEQUENCING AND ASSEMBLY OF THE BARN SWALLOW GENOME AND A STUDY ON THE EVOLUTION OF THE HUNTINGTIN GENE". Doctoral thesis, Università degli Studi di Milano, 2019. http://hdl.handle.net/2434/611650.

Abstract
The present thesis is divided into two sections. The first section outlines the scientific work that I accomplished during the last year of my graduate studies. The goal was to generate a reference genome for the European barn swallow (Hirundo rustica rustica). The barn swallow (Hirundo rustica) is a migratory bird that has been the focus of a large number of ecological, behavioural and genetic studies. To facilitate further population genetics and genomic studies, I generated a high-quality genome for the European subspecies (Hirundo rustica rustica) using third-generation Single Molecule Real-Time (SMRT) DNA sequencing from Pacific Biosciences (Menlo Park, California, USA) and optical mapping from Bionano Genomics (San Diego, California, USA). For optical mapping, DNA molecules were labelled both with one of the original Nick, Label, Repair and Stain (NLRS) nickases (enzyme Nb.BssSI) and with the new Direct Label and Stain (DLS) approach (enzyme DLE-1). This made it possible to compare and integrate optical maps derived from both NLRS and DLS technologies. The latter was officially released in February 2018 and avoids nicking and subsequent cleavage of DNA molecules upon staining. To my knowledge, this is the first genome assembly to incorporate DLS data, and this approach more than doubled the assembly N50 with respect to the nickase system. Furthermore, the dual enzyme hybrid scaffold led to a marginal increase in scaffold N50 and an overall increase of confidence in scaffolds. After removal of haplotigs, the final assembly is approximately 1.21 Gbp in size, with an N50 value of over 25.95 Mbp. The high genome contiguity achieved represents a more than 650-fold improvement over a previously reported assembly based on paired-end short read data, and it is well in excess of the values specified for “Platinum genomes” by the Vertebrate Genomes Project.
It can therefore constitute a valuable resource for studies of the evolution of avian genomes in general, as well as for population genetics and genomics in the barn swallow, with the potential to boost research on barn swallow biology and ecology at unprecedented speed. This scientific endeavour culminated in a publication that I authored, entitled “SMRT long-read sequencing and Direct Label and Stain optical maps allow the generation of a high-quality genome assembly for the European barn swallow (Hirundo rustica rustica)”, published in the peer-reviewed journal GigaScience (IF 7.5, 2016). The second section of this thesis presents the methodological work and the conclusions drawn from my work, and that of my collaborators, on the evolutionary origins of Huntington’s Disease (HD), a genetic neurodegenerative disorder. The study was conducted in the Laboratory of Stem Cell Biology and Pharmacology of Neurodegenerative Diseases directed by Prof. Elena Cattaneo at the University of Milan, where I worked for the first two years of my PhD (and also during my Master's thesis work) and whose research focuses on the phylogenetic and biological investigation of the HD causative gene. The goal of this study, part of an ongoing effort in the host laboratory to trace the Huntington’s Disease-causing gene throughout evolution, was to reconstruct and understand the evolutionary origins of the CAG repeat embedded in exon 1 of the Htt gene. This goal could be achieved by collecting DNA sequences from orthologous genes to allow a comparative analysis of the differences and similarities between the human sequence and that of other animal species. More specifically, existing sequences could be retrieved from public databases and/or determined directly by sequencing biological samples. These samples could be obtained through existing or newly established collaborations.
Htt exon 1 sequences could then be aligned to each other in a multiple alignment, resulting in a detailed picture of Htt exon 1 CAG repeats along the tree of life. The multiple alignment, when subjected to a bioinformatics analysis of the selective pressures, could be used to elucidate the evolutionary features of this simple repeat. The study was also made possible thanks to a collaboration between Prof. Cattaneo and my Ph.D. thesis supervisor, Prof. Nicola Saino. At the time of writing, a manuscript is in preparation reporting part of the data from this work together with other data obtained in Prof. Cattaneo’s laboratory.
14

Fruchard, Cécile. "Étude des chromosomes sexuels et du déterminisme du sexe chez les plantes : comparaison des systèmes Silene et Coccinia". Thesis, Lyon, 2018. http://www.theses.fr/2018LYSE1108/document.

Abstract
Bien que les sexes séparés (dioecie) soient plus rares que chez les animaux, ∼15 600 espèces dioiques ont évolué chez les angiospermes (∼6% de l'ensemble des espèces). La manière dont le sexe de ces plantes est contrôlé est une question centrale de la biologie végétale, mais également de l'agronomie car de nombreuses plantes cultivées sont des plantes dioiques (∼20% des espèces cultivées) mais dont un seul sexe (généralement les femelles) présente un intérêt agronomique. Pourtant, seulement trois gènes du déterminisme du sexe ont été identifiés à ce jour chez les plantes dioiques, chez le kaki, l'asperge et la fraise. La dioecie a vraisemblablement évolué plusieurs fois chez les angiospermes et il est possible que les gènes du déterminisme du sexe soient divers. Deux voies principales d'évolution vers la dioecie ont été identifiées. Les deux partent d'une espèce dont les fleurs sont hermaphrodites, le régime de reproduction ancestral chez les angiospermes, puis passent soit par un intermédiaire monoique (espèce avec des fleurs unisexuées mâles et femelles sur le même individu), soit par un intermédiaire gynodioique (espèce avec des femelles et des individus avec des fleurs hermaphrodites). Cette thèse a pour objet la comparaison de deux systèmes de plantes représentant ces deux voies. Chez Coccinia grandis, une cucurbitacée ayant également des chromosomes XY, l'évolution de la dioecie est passée par la monoecie. Chez Silene latifolia, une plante dioique bien étudiée avec des chromosomes sexuels XY, l'évolution de la dioecie s'est faite à partir de la gynodioecie. Trois gènes contrôlant la monoecie ont été identifiés chez le melon et il a été proposé que ces gènes soient les gènes du déterminisme dans les espèces dioiques proches du melon comme C. grandis. Nous avons donc opté pour une approche gène candidat dans cette espèce.
Très peu de ressources génétiques et génomiques sont disponibles chez C. grandis, et nous avons choisi d'utiliser SEXDETector, une méthode probabiliste qui utilise des données RNA-seq pour génotyper des parents et leurs descendants, et qui infère les gènes liés au sexe sans génome de référence. Cette méthode m'a permis d'identifier 1 364 gènes présents sur les chromosomes sexuels de C. grandis. J'ai établi que les gènes différentiellement exprimés entre les sexes étaient plus abondants sur les chromosomes sexuels que sur les autosomes. J'ai également observé des marques de la dégénérescence du chromosome Y chez cette plante, comme des diminutions d'expression ou des pertes de gènes. Enfin, mes résultats démontrent la présence de compensation de dosage chez C. grandis. Le test des gènes candidats est en cours. Chez S. latifolia, 3 grandes régions liées au déterminisme ont déjà été identifiées sur le chromosome Y. Pour identifier les gènes du déterminisme, nous avons choisi de séquencer ce chromosome. Le séquençage des chromosomes Y est encore un défi pour la génomique. La phase d'assemblage est très difficile à cause des répétitions présentes en grand nombre sur ces chromosomes. En conséquence, les séquences complètes de chromosome Y sont très rares, et principalement disponibles chez les animaux. Afin de minimiser les problèmes d'assemblage dus aux répétitions, nous avons utilisé des techniques dites de 3e génération (avec de grandes lectures). J'ai moi-même généré des données MinION (Oxford Nanopore) à partir d'ADN de chromosome Y. L'assemblage a été réalisé en combinant des données Illumina, PacBio et MinION. Notre assemblage final fait une taille de 563 Mb pour un N50 de 6 114 pb, et contient 16 219 gènes annotés de novo.
Although rarer than in animals, separate sexes (dioecy) have evolved in ∼15,600 angiosperm species (∼6% of all angiosperm species). How sex is controlled is a central question in plant sciences and also in agronomy as many crops are dioecious (∼20% of crops) with only one useful sex (usually female). Only three master sex-determining genes have been identified in dioecious plants so far, namely in persimmons, asparagus and strawberry. Dioecy likely evolved several times independently in angiosperms, suggesting that sex-determining genes are of diverse origins. Hermaphroditism is the predicted ancestral state of the angiosperm flower. Two main pathways have been identified that explain the evolution of hermaphroditism towards dioecy: either through a monoecious state (with both unisexual male and female flowers on the same individual) or a gynodioecious state (with females and individuals having hermaphroditic flowers). My aim is to compare two plant systems representing each one of these two pathways. In Coccinia grandis, a Cucurbitaceae with an XY chromosome system, dioecy evolved through monoecy. In Silene latifolia, a well-studied dioecious plant with XY sex chromosomes, dioecy evolved through gynodioecy. Three genes controlling monoecy have been identified in melon, and it was suggested that these genes act as sex-determining genes in closely related dioecious species such as C. grandis. I therefore chose a candidate gene approach in this species. Very few genetic and genomic data are available in C. grandis, and we chose to use SEX-DETector, a probabilistic method that uses RNA-seq data to genotype parents and their offspring, and infers sex-linked genes with no need for a reference genome. This method allowed me to identify 1,364 genes that are present on the sex chromosomes of C. grandis. 
I found that the sex chromosomes are enriched in sex-biased genes compared to autosomes, and I characterized Y chromosome degeneration in terms of decreased expression and gene loss. Finally, I showed that dosage compensation occurs in C. grandis. Testing of the three candidate genes is ongoing. In S. latifolia, three regions involved in sex determination have already been identified on the Y chromosome. We chose to sequence this chromosome to identify sex-determining genes. The sequencing of Y chromosomes remains one of the greatest challenges of current genomics. The assembly step is very difficult because of their highly repetitive content. Consequently, fully sequenced Y chromosomes are rare and mainly available for animals. To overcome the difficulty of assembling reads with many repeats, I used third-generation sequencing (TGS, producing long reads). I produced a dataset using the Oxford Nanopore MinION sequencer with Y chromosome DNA. Assembly was performed using a combination of Illumina, MinION and PacBio sequencing data. The final assembly had a total length of 563 Mb with a scaffold N50 of 6,114 bp, and contained 16,219 de novo annotated genes.
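The abstract reports assembly contiguity as a scaffold N50. As a reminder of what that statistic measures, the following is a minimal Python sketch of the usual N50 definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the thesis.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy example (hypothetical lengths): total = 20, half = 10.
    # Running totals: 6, 11 -> the threshold is crossed at length 5.
    print(n50([2, 3, 4, 5, 6]))
```

A low N50 relative to the total assembly size, as in the figures quoted in the abstract, indicates that the assembly is split into many short pieces.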
15

Lehmann, Nathalie. "Development of bioinformatics tools for single-cell transcriptomics applied to the search for signatures of symmetric versus asymmetric division mode in neural progenitors". Electronic Thesis or Diss., Université Paris sciences et lettres, 2021. http://www.theses.fr/2021UPSLE070.

Abstract
In recent years, the emergence of single-cell approaches (scRNA-seq) has enabled the characterization of cellular heterogeneity with unmatched precision. Despite their widespread adoption, analysing these data remains complex, particularly for organisms with incomplete annotations. During my thesis, I observed that chicken genome annotations are incomplete, which causes the loss of a large number of sequencing reads. I assessed how far an improved annotation affects the biological results and the conclusions drawn from these analyses. We propose a new approach based on re-annotating the genome from scRNA-seq data and long-read bulk RNA-seq data. This computational biology project relies on a close collaboration with the experimental team of Xavier Morin (IBENS). The main biological objective is the search for signatures of symmetric and asymmetric division modes in neural progenitors. To identify the main transcriptional changes, I set up approaches dedicated to the search for gene signatures from scRNA-seq data.
In recent years, single-cell RNA-seq (scRNA-seq) has fostered the characterization of cell heterogeneity at a remarkably high resolution. Despite its democratization, the analysis of scRNA-seq data remains a challenge, particularly for organisms whose genomic annotations are partial. During my PhD, I observed that the chick genomic annotations are often incomplete, resulting in the loss of a large number of sequencing reads. I investigated how an enriched annotation affects the biological results and conclusions from these analyses. We developed a novel approach based on the re-annotation of the genome with scRNA-seq data and long-read bulk RNA-seq. This computational biology project capitalises on a tight collaboration with the experimental team of Xavier Morin (IBENS). The main biological focus is the search for signatures of symmetric versus asymmetric division mode in neural progenitors. In order to identify the key transcriptional switches that occur during the neurogenic transition, I have implemented bioanalysis approaches dedicated to the search for gene signatures from scRNA-seq data.
16

Hsieh, Yi-Te and 謝憶得. "Long Read Error Correction by Short-Read Alignment Using FM-Index". Thesis, 2015. http://ndltd.ncl.edu.tw/handle/k3c65a.

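Abstract
Master's degree
National Chung Cheng University
Graduate Institute of Computer Science and Information Engineering
Academic year 103 (ROC calendar)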
Third-generation sequencing can generate multi-kilobase sequences and has the potential to improve genome assembly. Nevertheless, it has higher error rates than second-generation sequencing, which have limited its use for improving assembly. In this thesis, we introduce a hybrid correction algorithm that corrects third-generation sequencing reads by finding overlapping high-quality short reads. We improved the efficiency of a previous method that aligns short reads onto long PacBio reads using the FM-index. The results indicate that the accuracy of corrected PacBio reads exceeds 93%, that memory consumption is lower, and that the running time is shorter than with the previous method.
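The abstract relies on an FM-index to locate short reads within long reads. The following Python sketch illustrates only the core lookup an FM-index supports, exact-match backward search, and is not the thesis's correction algorithm. The function names `build_fm_index` and `find_exact`, the naive suffix-array construction, and the toy sequence are all assumptions made for illustration; a practical implementation would use a compressed index and tolerate mismatches and indels.

```python
def build_fm_index(text):
    """Build a small, uncompressed FM-index for `text` (illustration only)."""
    text += "$"  # sentinel that sorts before A, C, G, T
    # Naive suffix array: acceptable for toy inputs, far too slow for genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[ch] = number of characters in text that sort before ch.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i] = number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def find_exact(pattern, index):
    """Return sorted start positions of exact matches of `pattern`."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)  # half-open interval of matching suffix-array rows
    for ch in reversed(pattern):  # backward search: last character first
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


if __name__ == "__main__":
    index = build_fm_index("ACGTACGT")  # stands in for a long read
    print(find_exact("ACG", index))     # a short-read seed; prints [0, 4]
```

Each pattern character costs a constant number of table lookups, so the search time depends on the seed length rather than on the length of the indexed sequence, which is the property that makes this kind of index attractive for read-to-read alignment.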
17

Tsai, Cheng-Wei and 蔡承洧. "Scaffolding Pre-Assembled Contigs Using Long-Read Sequencing". Thesis, 2013. http://ndltd.ncl.edu.tw/handle/97232567550873046854.

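Abstract
Master's degree
National Chung Cheng University
Graduate Institute of Computer Science and Information Engineering
Academic year 101 (ROC calendar)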
In recent years, third-generation sequencing platforms, which can sequence a single DNA molecule in real time and generate longer reads, have been applied to improve genome assembly. Unfortunately, these long reads often have higher error rates than reads from previous sequencing technologies, and most of the errors are indels. The high error rates greatly reduce the usability of long reads for improving genome assembly. In this thesis, we design and implement a program, called SACLR, for scaffolding pre-assembled contigs using long reads generated by the Pacific Biosciences platform. Given a set of pre-assembled contigs and long reads, SACLR determines the mapped boundaries of contigs using a novel clustering alignment approach that tolerates the various errors of the platform. The linkage between contigs across multiple long reads is established and integrated to further increase scaffold length. It is worth mentioning that the gaps within our scaffolds can be directly filled and that the two ends of each scaffold may be further extended by long reads. SACLR has been tested on a variety of real data sets. The experimental results showed that SACLR produced more contiguous and accurate sequences.
18

Bi, Chongwei. "Long Read Based Individual Molecule Sequencing and Real-time Pathogen Detection". Diss., 2021. http://hdl.handle.net/10754/672109.

Abstract
With the ability to produce reads hundreds of kilobases in length, long-read sequencing technology is emerging as a powerful tool to decode complex genetic sequences that were previously inaccessible to short reads. Though the sequencing chemistry and base-calling algorithms are being actively developed, the accuracy of current long-read sequencing is still low, which limits its applications. In this dissertation, I present three long-read-based DNA sequencing methods that overcome the limitation of read accuracy, contribute to a better understanding of Cas9 editing outcomes and mitochondrial DNA heterogeneity, and pave the way for real-time pathogen detection and mutation surveillance. The development of IDMseq enables the single-base-resolution, haplotype-resolved, quantitative characterization of diverse types of rare variants. IDMseq provides the first quantitative evidence of persistent nonrandom large structural variants following repair of double-strand breaks induced by CRISPR-Cas9 in human ESCs. The development of iMiGseq represents the first mitochondrial DNA sequencing method that provides ultra-sensitive variant detection, complete haplotyping, and unbiased evaluation of heteroplasmy levels, all at the individual mitochondrial DNA molecule level. iMiGseq uncovers unappreciated levels of heteroplasmic variants in single healthy human oocytes well below the current 1% detection limit, of which numerous variants are deleterious and associated with late-onset mitochondrial disease and cancer. It could comprehensively characterize and haplotype single-nucleotide and structural variants of mitochondrial DNA and their genetic linkage in NARP/Leigh syndrome patient-derived cells. The development of NIRVANA addresses the COVID-19 pandemic. NIRVANA can simultaneously detect SARS-CoV-2 and three co-infecting respiratory viruses, and monitor mutations for up to 96 samples in real time. It provides a promising solution for rapid field-deployable detection and mutation surveillance of pandemic viruses. Taken together, IDMseq, iMiGseq and NIRVANA exploit the advantages of long reads, overcome the limitation of low accuracy, and facilitate the application of long-read sequencing technologies in multidisciplinary fields.
19

TANG, YU-YU and 湯玉宇. "Hybrid error correction for long-read sequencing using adaptive seeding strategies". Thesis, 2019. http://ndltd.ncl.edu.tw/handle/qztabn.

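Abstract
Master's degree
National Chung Cheng University
Graduate Institute of Computer Science and Information Engineering
Academic year 107 (ROC calendar)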
Next-generation sequencing (NGS) and third-generation sequencing (TGS) technologies are popular choices in de novo assembly projects. Although NGS achieves the highest sequencing accuracy, the assembled genomes are often highly fragmented because of repeats longer than the short reads. On the other hand, long reads generated by TGS can span larger repeats and thus assemble a complete genome. To date, the acceptance of TGS is still limited by its high error rate and cost. Previously, we combined the advantages of both NGS and TGS by developing a hybrid correction strategy called PBHC. PBHC corrects error-prone long reads with highly accurate short reads using both alignment-based and alignment-free methods. However, the assembly contiguity of PBHC drops significantly on large-genome data sets. This thesis investigated the root cause of poorly assembled regions in large genomes and found that reads are largely uncorrected within repetitive regions. Further investigation revealed that the seeds within repeat regions are mostly error-prone. We devised an adaptive seeding strategy to improve seed accuracy. Each long read is partitioned into repeat and unique regions, and a different seeding strategy is applied to each to identify seeds. The experimental results indicated that the new seeding algorithm improved genome contiguity under lower sequencing coverage.
20

Bachmann, J. A., Andrew Tedder, B. Laenen, K. A. Steige and T. Slotte. "Targeted long-read sequencing of a locus under long-term balancing selection in Capsella". 2018. http://hdl.handle.net/10454/17277.

Abstract
Rapid advances in short-read DNA sequencing technologies have revolutionized population genomic studies, but there are genomic regions where this technology reaches its limits. Limitations mostly arise due to the difficulties in assembly or alignment to genomic regions of high sequence divergence and high repeat content, which are typical characteristics for loci under strong long-term balancing selection. Studying genetic diversity at such loci therefore remains challenging. Here, we investigate the feasibility and error rates associated with targeted long-read sequencing of a locus under balancing selection. For this purpose, we generated bacterial artificial chromosomes (BACs) containing the Brassicaceae S-locus, a region under strong negative frequency-dependent selection which has previously proven difficult to assemble in its entirety using short reads. We sequence S-locus BACs with single-molecule long-read sequencing technology and conduct de novo assembly of these S-locus haplotypes. By comparing repeated assemblies resulting from independent long-read sequencing runs on the same BAC clone we do not detect any structural errors, suggesting that reliable assemblies are generated, but we estimate an indel error rate of 5.7 × 10⁻⁵. A similar error rate was estimated based on comparison of Illumina short-read sequences and BAC assemblies. Our results show that, until de novo assembly of multiple individuals using long-read sequencing becomes feasible, targeted long-read sequencing of loci under balancing selection is a viable option with low error rates for single nucleotide polymorphisms or structural variation. We further find that short-read sequencing is a valuable complement, allowing correction of the relatively high rate of indel errors that result from this approach.
This study was supported by a grant from the Swedish Research Council to T.S.
21

Natarajan, Santhi. "Accelerated and Accurate Alignment of Short Reads in High Throughput Next Generation Sequencing [NGS] Platforms". Thesis, 2016. http://etd.iisc.ac.in/handle/2005/4073.

Abstract
The genome of an organism encompasses the unique set of genetic instructions for every individual in a species. The genome, in totality, guides the course of evolution, development, and the genetic and epigenetic growth factors of an individual. Genomics, the study of the genome, presents an interdisciplinary landscape with a multistage data analytics pipeline. Understanding the genome involves determining the order of the four constituent nucleotides, or bases, namely adenine (A), cytosine (C), guanine (G) and thymine (T), within the genome's DNA sequence; this process is widely known as sequencing. Next Generation Sequencing (NGS) involves massively parallel sequencing of genetic data with high throughput. NGS offers an unparalleled interrogation of the genome, giving deeper insight into the functional and structural investigation of genetic data. The deductions from such a study have a large impact across fields, including medical diagnostics, therapeutics and drug discovery, and also form the basis for genomic medicine. Data processing with NGS happens over an elaborate multi-stage data analytics pipeline. During the primary data analysis, the sequencing process produces billions of short fragments of the target genome, called short reads. This amounts to petabytes of unprocessed raw genomic data. Short read mapping (SRM) is the process of mapping these short reads to their respective positions in the target genome. Due to the sheer volume of data that needs to be handled, SRM is a major sequential bottleneck in the NGS data analytics pipeline in genomics, and presents profound technical and computing challenges. Classified as a complex big data engineering problem, SRM thus calls for innovative computational, scientific and statistical approaches to big data analysis. A strict validation of the various algorithms and software in an NGS pipeline is essential to ensure reliable and accurate results.
With the growing volume of NGS big data, SRM and the subsequent analytic steps demand a High Performance Computing (HPC) environment for data storage and analysis. Existing solutions for accelerating SRM provide notable performance, while leveraging heuristics and incurring significant error rates. Given the impact of the results of SRM on subsequent diagnostics and therapeutics, such heuristics and error rates are not affordable. In this context, we need precise, affordable, reliable and actionable results from SRM, to support any application, with uncompromised accuracy and performance. In this work, we present a massively parallel and scalable archetype for accurate alignment of short reads at a fine-grained single-nucleotide resolution. The significant contributions of this work are presented below:
1. We present a robust and efficient indexing scheme for the reference genome, which is devoid of heuristics. The scheme reports all possible regions of mapping for a short read, inclusive of repeat regions. The lookup scheme efficiently handles the redundancy in reads. Though this leaves the rest of the pipeline with more data for SRM than the heuristic solutions do, it provides the end user with reliable and actionable results.
2. We present an efficient parallel implementation of an accurate sequence alignment algorithm based on the Dynamic Programming (DP) method. Our alignment kernels can seamlessly perform the traceback process in hardware simultaneously with the forward scan, thus eliminating the computational and memory bottlenecks associated with such algorithms. These kernels thus report alignment in a minimum deterministic time, which forms the first level of acceleration for SRM.
3. We present AccuRA, a hardware accelerator targeting reconfigurable hardware platforms. The model scales well at multiple levels of granularity, precisely aligns short reads at a fine-grained single-nucleotide resolution, and offers full coverage of the genome.
4. We present GMAccS, a GPGPU-based solution for the SRM accelerator. This employs a platform-independent model, capable of targeting a heterogeneous set of GPU hardware.
5. We present a performance and scalability analysis model for both archetypes. The results from the prototypes substantiate the scalability of these architectures at multiple levels of granularity.
6. We present the generalization of our solution across applications that exhibit similar data patterns in terms of volume, variety, rate of production and analysis, and randomness and uncertainty in the data, and that use Approximate String Matching (ASM) as the fundamental operation for data analytics.
7. We present the various problems within the biological domain, in terms of complexity and quantity of data, to which our solution can be customized and scaled, at various levels of granularity.
We have presented the results from various prototype models of both AccuRA and GMAccS. The AccuRA prototype, hosting eight kernel units on a single reconfigurable device, aligns short reads with an alignment performance of 20.48 Giga Cell Updates Per Second (GCUPS). AccuRA can be ported onto devices as diverse as SoCs, ASICs or reconfigurable-platform-based hardware coprocessors or accelerators. The scalability analysis substantiated the parallel AccuRA architecture, making it a promising target to accelerate the SRM process in the NGS pipeline. The in-house supercomputing platform SahasraT, which is a Cray XC40 system, hosted the prototype for the GMAccS archetype. The GMAccS prototypes align with an optimal performance of 23.69 Million Maps Per Second (MMPS) to 528.69 MMPS, while scaling from a single GPU to 24 GPUs. The performance model for GMAccS, as well as the results from the prototypes, substantiates the scalability of the GMAccS archetype and the subsequent performance enhancement achieved by it.
Both AccuRA and GMAccS accommodate the big data of genomics with uncompromised accuracy, precision and performance, while aligning genomes ranging from the smaller archaeal, bacterial and fungal genomes to the much larger human genome. These models have successfully handled redundant reads and multiread alignments. The results from AccuRA and GMAccS are available in the Sequence Alignment/Map (SAM) format, making them compatible with the downstream applications in the NGS pipeline. With a basic parameterized SRM model, and the results from its various prototypes for small and large genome benchmarks, we have gained the confidence that this solution can serve the requirements of accurate and scalable alignment of NGS big data. We believe that our model can serve as a reliable candidate in the future of genomics, called the "genomic highway", where data belonging to multiple applications can be streamed in and aligned in real time, with minimal memory and storage requirements, on a generalized alignment engine.
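The abstract above describes dynamic-programming alignment with a forward scan followed by traceback, and it reports throughput in cell updates per second. As a rough software illustration only, the following Python sketch runs a plain Smith-Waterman local alignment with a linear gap penalty. The function name `smith_waterman`, the scoring values, and the toy sequences are assumptions for illustration; they do not reproduce the AccuRA or GMAccS kernels, which the abstract says perform traceback in hardware concurrently with the forward scan.

```python
def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    """Local alignment of `read` against `ref` with a linear gap penalty.

    Returns (best score, aligned read, aligned reference, 0-based start of
    the aligned region in `ref`). Scoring values are illustrative defaults.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Forward scan: one "cell update" per (i, j) pair.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # match or mismatch
                score[i - 1][j] + gap,      # gap in the reference
                score[i][j - 1] + gap,      # gap in the read
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Traceback from the best cell until a zero-score cell is reached.
    i, j = best_pos
    aligned_read, aligned_ref = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if read[i - 1] == ref[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            aligned_read.append(read[i - 1])
            aligned_ref.append(ref[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned_read.append(read[i - 1])
            aligned_ref.append("-")
            i -= 1
        else:
            aligned_read.append("-")
            aligned_ref.append(ref[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_read)), "".join(reversed(aligned_ref)), j


if __name__ == "__main__":
    print(smith_waterman("ACGT", "TTACGTTT"))  # (8, 'ACGT', 'ACGT', 2)
```

The forward scan touches len(read) × len(ref) cells, so a throughput figure in GCUPS corresponds to that cell count divided by the elapsed time in seconds and by 10⁹.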
22

Bachmann, J. A., Andrew Tedder, B. Laenen, M. Fracassetti, A. Désamoré, C. Lafon-Placette, K. A. Steige et al. "Genetic basis and timing of a major mating system shift in Capsella". 2019. http://hdl.handle.net/10454/17270.

Abstract
A crucial step in the transition from outcrossing to self-fertilization is the loss of genetic self-incompatibility (SI). In the Brassicaceae, SI involves the interaction of female and male specificity components, encoded by the genes SRK and SCR at the self-incompatibility locus (S-locus). Theory predicts that S-linked mutations, and especially dominant mutations in SCR, are likely to contribute to loss of SI. However, few studies have investigated the contribution of dominant mutations to loss of SI in wild plant species. Here, we investigate the genetic basis of loss of SI in the self-fertilizing crucifer species Capsella orientalis, by combining genetic mapping, long-read sequencing of complete S-haplotypes, gene expression analyses and controlled crosses. We show that loss of SI in C. orientalis occurred < 2.6 Mya and maps as a dominant trait to the S-locus. We identify a fixed frameshift deletion in the male specificity gene SCR and confirm loss of male SI specificity. We further identify an S-linked small RNA that is predicted to cause dominance of self-compatibility. Our results agree with predictions on the contribution of dominant S-linked mutations to loss of SI, and thus provide new insights into the molecular basis of mating system transitions.
Work at Uppsala Genome Center is funded by RFI / VR and Science for Life Laboratory, Sweden. The SNP&SEQ Platform is supported by the Swedish Research Council and the Knut and Alice Wallenberg Foundation. V.C. acknowledges support by a grant from the European Research Council (NOVEL project, grant #648321). The authors thank the French Ministère de l’Enseignement Supérieur et de la Recherche, the Hauts de France Region and the European Funds for Regional Economical Development for their financial support to this project. This work was supported by a grant from the Swedish Research Council (grant #D0432001) and by a grant from the Science for Life Laboratory, Swedish Biodiversity Program to T.S. The Swedish Biodiversity Program is supported by the Knut and Alice Wallenberg Foundation.