A selection of scholarly literature on the topic "Genomics bioinformatics variant discovery sequence analysis"

Browse lists of current articles, books, dissertations, conference abstracts, and other scholarly sources on the topic "Genomics bioinformatics variant discovery sequence analysis".

Journal articles on the topic "Genomics bioinformatics variant discovery sequence analysis"

1

Ahmed, Zeeshan, Eduard Gibert Renart, and Saman Zeeshan. "Genomics pipelines to investigate susceptibility in whole genome and exome sequenced data for variant discovery, annotation, prediction and genotyping." PeerJ 9 (July 26, 2021): e11724. http://dx.doi.org/10.7717/peerj.11724.

Abstract:
Over the last few decades, genomics has been leading toward an audacious future and has been changing our views about conducting biomedical research, studying diseases, and understanding diversity in our society across the human species. Whole genome and exome sequencing (WGS/WES) are two of the most popular next-generation sequencing (NGS) methodologies currently used to detect genetic variations of clinical significance. Investigating WGS/WES data for variant discovery and genotyping is based on the nexus of different data analytic applications. Although several bioinformatics applications have been developed, and many of those are freely available and published, timely finding and interpreting genetic variants are still challenging tasks among diagnostic laboratories and clinicians. In this study, we are interested in understanding, evaluating, and reporting the current state of solutions available to process NGS data of variable lengths and types for the identification of variants, alleles, and haplotypes. Within this scope, we consulted high-quality, peer-reviewed literature published in the last 10 years. We focused on the standalone and networked bioinformatics applications proposed to efficiently process WGS and WES data and to support downstream analysis for gene-variant discovery, annotation, prediction, and interpretation. We discuss our findings in this manuscript, which include but are not limited to the set of operations, workflow, data handling, involved tools, technologies and algorithms, and limitations of the assessed applications.
2

Wiggans, G. R., D. J. Null, J. B. Cole, and H. D. Norman. "256 GENOMIC EVALUATION OF FERTILITY TRAITS AND DISCOVERY OF HAPLOTYPES THAT AFFECT FERTILITY OF US DAIRY CATTLE." Reproduction, Fertility and Development 28, no. 2 (2016): 260. http://dx.doi.org/10.1071/rdv28n2ab256.

Abstract:
Genomic evaluations of dairy cattle became official in the United States in January 2009 for Holsteins and Jerseys, and later for Brown Swiss, Ayrshires, and Guernseys. Up to 33 yield, fitness, calving, and conformation traits are evaluated, and the fertility traits included daughter pregnancy rate and heifer and cow conception rates. Additional fertility traits, such as age at first calving and days from calving to first insemination, also are being studied. Male fertility (sire conception rate) is evaluated phenotypically rather than through genomics. Over 1 million animals have genotypes in the national database, which reflects collaboration with Canada and Europe. Most of the genotypes are from females and are from genotyping chips with <30 000 single nucleotide polymorphisms (SNP). To combine data across chips, genotypes are imputed to a set of >77 000 SNP. The imputation process involves dividing the chromosome into segments of approximately equal length and determining the paternal or maternal origin of the alleles. Because some segments were never homozygous, they were assumed to contain an abnormality that resulted in early embryonic death. If a decrease in sire conception rate could be associated with a bull that was a carrier of such a chromosomal segment, the haplotype was designated as affecting fertility. Once the region was identified, bioinformatic analysis was used to discover the causative variant for many of those haplotypes. Accuracy of genomic evaluations is determined by size of the reference population and heritability of the trait. The reference population for Holsteins includes >180 000 bulls and cows. Because fertility traits have low heritabilities, genomic information is particularly useful in improving evaluation accuracy. Accuracy of fertility evaluations is expected to increase further by discovering causative variants for various aspects of conception and gestation through investigation of sequence data.
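The homozygote-deficiency reasoning in this abstract can be illustrated with a small calculation. The following is a minimal sketch, assuming random mating (Hardy-Weinberg proportions), which is a simplification and not the procedure used by the authors; the function names and the example numbers are hypothetical.

```python
def expected_homozygotes(n_animals: int, haplotype_freq: float) -> float:
    """Expected number of animals homozygous for a haplotype under random mating."""
    return n_animals * haplotype_freq ** 2


def prob_no_homozygotes(n_animals: int, haplotype_freq: float) -> float:
    """Probability of observing zero homozygotes by chance if the haplotype were harmless."""
    return (1.0 - haplotype_freq ** 2) ** n_animals


# Hypothetical numbers: 100,000 genotyped animals, haplotype frequency of 2%.
expected = expected_homozygotes(100_000, 0.02)
p_zero = prob_no_homozygotes(100_000, 0.02)
print(f"expected homozygotes: {expected:.1f}, P(none observed): {p_zero:.2e}")
```

When many homozygotes are expected but none are observed, chance becomes an implausible explanation, which is the signal the abstract describes for flagging a segment as a candidate lethal haplotype.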
3

Smith, Frances, David Brawand, Laura Steedman, Matthew Oakley, Christopher Wall, Peter Rushton, Margaret Allchurch, et al. "A Comprehensive Next Generation Sequencing Gene Panel Focused on Unexplained Anemia." Blood 126, no. 23 (December 3, 2015): 946. http://dx.doi.org/10.1182/blood.v126.23.946.946.

Abstract:
Congenital anemia is difficult to diagnose once common causes have been excluded; for example, 80% of cases of congenital non-spherocytic hemolytic anemia are undiagnosed once pyruvate kinase and G6PD deficiencies have been excluded using phenotypic analysis. We describe a next generation sequencing strategy, targeting 147 genes, to facilitate the diagnosis of these conditions. The coding regions, splice sites and 200 bp into the untranslated regions were examined in each gene. All clinically significant variants were confirmed by Sanger sequencing, including confirmation in any appropriate family members. Illumina MiSeq data was analysed using a bespoke bioinformatics pipeline, which has been validated to a UK certified standard. The pipeline implements detection of genetic variants using multiple base callers and discovery of copy number variants based on sequencing depth. Variants are annotated with information from ClinVar, and population frequency data from ExAC and 1000 genomes project. All genes are sequenced in every individual but data analysis can easily be restricted to virtual subpanels, excluding analysis of genes not requested. Here we present three cases, highlighting the diagnostic utility of the panel as well as the underlying bioinformatics analysis. Case 1. A male Caucasian child of <1 year presented with haemolysis (LDH 539 IU/L, total bilirubin 39 µmol/L), haematology (Hb 92g/L, MCV 84.4, MCH 28.9, absolute retic count 313.8 × 10^9/L); his film showed marked anisopoikilocytes, microspherocytes and polychromasia. He had frontal bossing and a palpable spleen and had suffered several infections; the child was transfused once. His father's film showed elliptocytes, FBC (Hb 127g/L, MCV 89.6, MCH 30.6, absolute retic count 230.5 × 10^9/L) but he had never been transfused. The mother's FBC was normal (Hb 113g/L, MCV 87.0, MCH 29.2, absolute retic count 48.4 × 10^9/L) but her film also showed elliptocytes.
Analysis using the red cell panel found the child to be compound heterozygous for c.83G>A; p.Arg28His and c.[5572C>G; 6531-12C>T]; p.[Leu1858Val;?] in the SPTA1 gene, suggesting the diagnosis of hereditary pyropoikilocytosis. The c.83G>A; p.Arg28His mutation was inherited from the father and the c.[5572C>G; 6531-12C>T]; p.[Leu1858Val;?] low expression allele was inherited from the mother, who was homozygous. Case 2. The post mortem report from a hydropic still birth (36/40) showed extensive extramedullary hematopoiesis and severe anemia. A DNA sample was sent to the laboratory accompanied by blood samples from both parents whose hematology was normal. The DNA sample from the proband was relatively small so only the parental samples were analyzed using the red cell panel. Sequence analysis identified the mother to carry the c.3173dupG; p.Gln1659fs pathogenic variant and the father carried the c.2867_2868+1dupCCG pathogenic variant in the CDAN1 gene. Sanger Sequencing showed that the child had inherited both mutations from the parents. Variants in CDAN1 are associated with CDA type 1 which is documented to be a rare form of anemia which can be lethal. Case 3. An Italian girl carrying a paternally inherited c.118C>T β0 thalassemia variant presented with a severe form of microcytic anemia (FBC, Hb 86g/L, RBC 4.87 × 10^12/L, MCV 55.2, MCH 17.7 and HbA2=5%). The severity of her anemia (not transfused) and palpable spleen suggested she had an additional pathogenic variant that had not been identified. Her mother had normal hematology (FBC, Hb 133g/L, RBC 4.82 × 10^12/L, MCV 79.9, MCH 27.6). After sequencing, Exome Depth analysis of the proband's LCR identified a novel deletion which removed the 5' HS1 and HS2 sites but left HS3-5 intact (confirmed by MLPA in the mother and proband).
The combination of this mild down-regulation of the beta globin locus with the c.118C>T β0 thalassemia variant caused her phenotype to be more severe than that of a beta thalassemia carrier alone. Identifying pathogenic variants in these families is important as it facilitates prognosis and treatment, and allows prenatal diagnosis to be offered in future. To date the panel has assessed 10 cases of anemia with unknown cause and has made a definitive diagnosis in 8 (80%). Of the two undiagnosed, one was a child who died at 3 weeks, had severe anemia and received multiple intrauterine and neonatal transfusions, and the other was a suspected case of CDA with little associated phenotype. Disclosures: No relevant conflicts of interest to declare.
4

Bao, Riyue, Lei Huang, Jorge Andrade, Wei Tan, Warren A. Kibbe, Hongmei Jiang, and Gang Feng. "Review of Current Methods, Applications, and Data Management for the Bioinformatics Analysis of Whole Exome Sequencing." Cancer Informatics 13s2 (January 2014): CIN.S13779. http://dx.doi.org/10.4137/cin.s13779.

Abstract:
The advent of next-generation sequencing technologies has greatly promoted advances in the study of human diseases at the genomic, transcriptomic, and epigenetic levels. Exome sequencing, where the coding region of the genome is captured and sequenced at a deep level, has proven to be a cost-effective method to detect disease-causing variants and discover gene targets. In this review, we outline the general framework of whole exome sequence data analysis. We focus on established bioinformatics tools and applications that support five analytical steps: raw data quality assessment, preprocessing, alignment, post-processing, and variant analysis (detection, annotation, and prioritization). We evaluate the performance of open-source alignment programs and variant calling tools using simulated and benchmark datasets, and highlight the challenges posed by the lack of concordance among variant detection tools. Based on these results, we recommend adopting multiple tools and resources to reduce false positives and increase the sensitivity of variant calling. In addition, we briefly discuss the current status and solutions for big data management, analysis, and summarization in the field of bioinformatics.
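The recommendation to combine several variant callers can be sketched as a simple voting scheme. This is an illustrative sketch only, not the procedure evaluated in the review; the caller names, the variant tuples, and the `min_tools` parameter are hypothetical.

```python
from collections import Counter


def consensus_calls(calls_by_tool, min_tools=2):
    """Return variants reported by at least `min_tools` of the given callers.

    Each variant is a (chrom, pos, ref, alt) tuple.
    """
    support = Counter()
    for calls in calls_by_tool.values():
        support.update(set(calls))  # count each variant once per caller
    return {variant for variant, n in support.items() if n >= min_tools}


# Hypothetical call sets from three callers.
calls_by_tool = {
    "caller_a": {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T")},
    "caller_b": {("chr1", 1000, "A", "G"), ("chr2", 500, "G", "A")},
    "caller_c": {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T")},
}
strict = consensus_calls(calls_by_tool, min_tools=3)     # intersection: fewest false positives
majority = consensus_calls(calls_by_tool, min_tools=2)   # majority vote
sensitive = consensus_calls(calls_by_tool, min_tools=1)  # union: highest sensitivity
```

Raising `min_tools` trades sensitivity for specificity, which is one way to read the trade-off the abstract attributes to the lack of concordance among tools.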
5

Yang, Junmeng, Anna Liu, Isabella He, and Yongsheng Bai. "Bioinformatics Analysis Revealed Novel 3′UTR Variants Associated with Intellectual Disability." Genes 11, no. 9 (August 26, 2020): 998. http://dx.doi.org/10.3390/genes11090998.

Abstract:
MicroRNAs (or miRNAs) are short nucleotide sequences (~17–22 bp long) that play important roles in gene regulation through targeting genes in the 3′untranslated regions (UTRs). Variants located in genomic regions might have different biological consequences in changing gene expression. Exonic variants (e.g., coding variant and 3′UTR variant) are often causative of diseases due to their influence on gene product. Variants harbored in the 3′UTR region where miRNAs perform their targeting function could potentially alter the binding relationships for target pairs, which could relate to disease causation. We gathered miRNA–mRNA targeting pairs from published studies and then employed the database of microRNA Target Site single nucleotide variants (SNVs) (dbMTS) to discover novel SNVs within the selected pairs. We identified a total of 183 SNVs for the 114 pairs of accurate miRNA–mRNA targeting pairs selected. Detailed bioinformatics analysis of the three genes with identified variants that were exclusively located in the 3′UTR section indicated their association with intellectual disability (ID). Our result showed an exceptionally high expression of GPR88 in brain tissues based on GTEx gene expression data, while WNT7A expression data were relatively high in brain tissues when compared to other tissues. Motif analysis for the 3′UTR region of WNT7A showed that five identified variants were well-conserved across three species (human, mouse, and rat); the motif that contains the variant identified in GPR88 is significant at the level of the 3′UTR of the human genome. Studies of pathways, protein–protein interactions, and relations to diseases further suggest potential association with intellectual disability of our discovered SNVs. Our results demonstrated that 3′UTR variants could change target interactions of miRNA–mRNA pairs in the context of their association with ID. 
We plan to automate the methods through developing a bioinformatics pipeline for identifying novel 3′UTR SNVs harbored by miRNA-targeted genes in the future.
6

Tremblay, Olivier, Zachary Thow, and A. Rod Merrill. "Several New Putative Bacterial ADP-Ribosyltransferase Toxins Are Revealed from In Silico Data Mining, Including the Novel Toxin Vorin, Encoded by the Fire Blight Pathogen Erwinia amylovora." Toxins 12, no. 12 (December 11, 2020): 792. http://dx.doi.org/10.3390/toxins12120792.

Abstract:
Mono-ADP-ribosyltransferase (mART) toxins are secreted by several pathogenic bacteria that disrupt vital host cell processes in deadly diseases like cholera and whooping cough. In the last two decades, the discovery of mART toxins has helped uncover the mechanisms of disease employed by pathogens impacting agriculture, aquaculture, and human health. Due to the current abundance of mARTs in bacterial genomes, and an unprecedented availability of genomic sequence data, mART toxins are amenable to discovery using an in silico strategy involving a series of sequence pattern filters and structural predictions. In this work, a bioinformatics approach was used to discover six bacterial mART sequences, one of which was a functional mART toxin encoded by the plant pathogen, Erwinia amylovora, called Vorin. Using a yeast growth-deficiency assay, we show that wild-type Vorin inhibited yeast cell growth, while catalytic variants reversed the growth-defective phenotype. Quantitative mass spectrometry analysis revealed that Vorin may cause eukaryotic host cell death by suppressing the initiation of autophagic processes. The genomic neighbourhood of Vorin indicated that it is a Type-VI-secreted effector, and co-expression experiments showed that Vorin is neutralized by binding of a cognate immunity protein, VorinI. We demonstrate that Vorin may also act as an antibacterial effector, since bacterial expression of Vorin was not achieved in the absence of VorinI. Vorin is the newest member of the mART family; further characterization of the Vorin/VorinI complex may help refine inhibitor design for mART toxins from other deadly pathogens.
7

Alsamman, Alsamman M., Shafik D. Ibrahim, and Aladdin Hamwieh. "KASPspoon: an in vitro and in silico PCR analysis tool for high-throughput SNP genotyping." Bioinformatics 35, no. 17 (January 8, 2019): 3187–90. http://dx.doi.org/10.1093/bioinformatics/btz004.

Abstract:
Motivation: Fine mapping becomes a routine trial following quantitative trait loci (QTL) mapping studies to shrink the size of genomic segments underlying causal variants. The availability of whole genome sequences can facilitate the development of high marker density and predict gene content in genomic segments of interest. Correlations between genetic and physical positions of these loci require handling of different experimental genetic data types, and ultimately converting them into positioning markers using a routine and efficient tool. Results: To convert classical QTL markers into KASP assay primers, KASPspoon simulates a PCR by running an approximate-match searching analysis on user-entered primer pairs against the provided sequences, and then comparing in vitro and in silico PCR results. KASPspoon reports amplimers close to or adjoining genes/SNPs/simple sequence repeats and those that are shared between in vitro and in silico PCR results to select the most appropriate amplimers for gene discovery. KASPspoon compares physical and genetic maps, and reports the primer set genome coverage for PCR-walking. KASPspoon could be used to design KASP assay primers to convert QTL acquired by classical molecular markers into high-throughput genotyping assays and to provide major SNP resource for the dissection of genotypic and phenotypic variation. In addition to human-readable output files, KASPspoon creates Circos configurations that illustrate different in silico and in vitro results. Availability and implementation: Code available under GNU GPL at (http://www.ageri.sci.eg/index.php/facilities-services/ageri-softwares/kaspspoon). Supplementary information: Supplementary data are available at Bioinformatics online.
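The idea of simulating a PCR by approximate-match search of a primer pair can be sketched as follows. This is a minimal illustration using Hamming-distance matching on a single strand; it is not KASPspoon's implementation, and the function names, the mismatch threshold, and the toy template are assumptions made for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def approx_hits(template, primer, max_mismatches):
    """Start positions where `primer` matches `template` with few enough mismatches."""
    hits = []
    for i in range(len(template) - len(primer) + 1):
        window = template[i:i + len(primer)]
        if sum(a != b for a, b in zip(window, primer)) <= max_mismatches:
            hits.append(i)
    return hits


def in_silico_pcr(template, forward, reverse, max_mismatches=1, max_product=2000):
    """Return (start, end, sequence) for each predicted amplicon on the given strand."""
    reverse_site = reverse_complement(reverse)  # where the reverse primer anneals
    products = []
    for start in approx_hits(template, forward, max_mismatches):
        for rev_start in approx_hits(template, reverse_site, max_mismatches):
            end = rev_start + len(reverse_site)
            # The reverse site must lie downstream of the forward primer.
            if start + len(forward) <= rev_start and end - start <= max_product:
                products.append((start, end, template[start:end]))
    return products


# Toy template with one forward site (ACGTAC) and one reverse site (TTCAGA).
TEMPLATE = "TTTACGTACGGGGGTTCAGACCC"
exact = in_silico_pcr(TEMPLATE, "ACGTAC", "TCTGAA", max_mismatches=0)
```

A real tool would also search the opposite strand and use an indexed search rather than this quadratic scan; the sketch only shows how mismatch-tolerant primer hits are paired into candidate amplimers.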
8

BLAXTER, M., M. ASLETT, D. GUILIANO, J. DAUB, and THE FILARIAL GENOME PROJECT. "Parasitic helminth genomics." Parasitology 118, no. 7 (October 1999): 39–51. http://dx.doi.org/10.1017/s0031182099004060.

Abstract:
The initiation of genome projects on helminths of medical importance promises to yield new drug targets and vaccine candidates in unprecedented numbers. In order to exploit this emerging data it is essential that the user community is aware of the scope and quality of data available, and that the genome projects provide analyses of the raw data to highlight potential genes of interest. Core bioinformatics support for the parasite genome projects has promoted these approaches. In the Brugia genome project, a combination of expressed sequence tag sequencing from multiple cDNA libraries representing the complete filarial nematode lifecycle, and comparative analysis of the sequence dataset, particularly using the complete genome sequence of the model nematode C. elegans, has proved very effective in gene discovery.
9

Karabayev, Daniyar, Askhat Molkenov, Kaiyrgali Yerulanuly, Ilyas Kabimoldayev, Asset Daniyarov, Aigul Sharip, Ainur Seisenova, Zhaxybay Zhumadilov, and Ulykbek Kairov. "re-Searcher: GUI-based bioinformatics tool for simplified genomics data mining of VCF files." PeerJ 9 (May 3, 2021): e11333. http://dx.doi.org/10.7717/peerj.11333.

Abstract:
Background: High-throughput sequencing platforms generate a massive amount of high-dimensional genomic datasets that are available for analysis. Modern and user-friendly bioinformatics tools for analysis and interpretation of genomics data become essential during the analysis of sequencing data. Different standard data types and file formats have been developed to store and analyze sequence and genomics data. Variant Call Format (VCF) is the most widespread genomics file type and standard format containing genomic information and variants of sequenced samples. Results: Existing tools for processing VCF files don’t usually have an intuitive graphical interface, but instead have just a command-line interface that may be challenging to use for the broader biomedical community interested in genomics data analysis. re-Searcher solves this problem by pre-processing VCF files in chunks so as not to exhaust the computer's RAM. The tool can be used as a standalone, user-friendly, multiplatform GUI application as well as a web application (https://nla-lbsb.nu.edu.kz). The software, including source code, tested VCF files and additional information, is publicly available in the GitHub repository (https://github.com/LabBandSB/re-Searcher).
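Chunked processing of a VCF file, as mentioned in this abstract, can be sketched with the standard library alone. This is an assumption-laden illustration rather than re-Searcher's code: the function names, the chunk size, and the quality threshold are hypothetical, and the only format detail relied on is that QUAL is the sixth tab-separated VCF column.

```python
import io
from itertools import islice


def iter_vcf_chunks(handle, chunk_size=1000):
    """Yield lists of at most `chunk_size` records, skipping header and blank lines."""
    records = (
        line.rstrip("\n").split("\t")
        for line in handle
        if line.strip() and not line.startswith("#")
    )
    while True:
        chunk = list(islice(records, chunk_size))
        if not chunk:
            return
        yield chunk


def count_passing(handle, min_qual=30.0, chunk_size=1000):
    """Count records with QUAL >= min_qual while holding one chunk in memory at a time."""
    total = 0
    for chunk in iter_vcf_chunks(handle, chunk_size):
        total += sum(1 for f in chunk if f[5] != "." and float(f[5]) >= min_qual)
    return total


# Tiny hypothetical VCF held in memory for demonstration.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n"
    "chr1\t200\t.\tC\tT\t10\tPASS\t.\n"
    "chr2\t300\t.\tG\tA\t.\tPASS\t.\n"
    "chr2\t400\t.\tT\tC\t99.5\tPASS\t.\n"
)
n_passing = count_passing(io.StringIO(EXAMPLE_VCF), min_qual=30.0, chunk_size=2)
```

Because the generator reads lazily from the file handle, peak memory depends on `chunk_size` rather than on the size of the file.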
10

Knight, Samantha JL, Ruth Clifford, Pauline Robbe, Sara DC Ramos, Adam Burns, Adele T. Timbs, Reem Alsolami, et al. "The Identification of Further Minimal Regions of Overlap in Chronic Lymphocytic Leukemia Using High-Resolution SNP Arrays." Blood 124, no. 21 (December 6, 2014): 3315. http://dx.doi.org/10.1182/blood.v124.21.3315.3315.

Abstract:
Background: Historically, the identification of minimal deleted regions (MDRs) has been a useful approach for pinpointing genes involved in the pathogenesis of human malignancies and constitutional disorders. Microarray technology has offered increased capability for newly identifying or refining existing MDRs and minimal overlapping regions (MORs) in cancer. Despite this, in chronic lymphocytic leukemia (CLL), published MORs that pinpoint only a few candidate genes have been limited and with the advent of NGS, the utility of high resolution array work as a discovery tool has become uncertain. Here, we show that profiling copy number abnormalities (CNAs) and cnLOH using arrays in a large patient series can still be a valuable approach for the identification of genes that are disrupted or mutated in CLL and have a role in CLL development and/or progression. Methods: 250 CLL patient DNAs from individuals enrolled in two UK-based Phase II randomised controlled trials (AdMIRe and ARCTIC trials) were tested using Infinium HumanOmni2.5-8 v1.1 according to manufacturer’s guidelines (Illumina Inc, San Diego, CA). Data were processed using GenomeStudioV2009.2 (Illumina Inc.) and analysed using Nexus Discovery Edition v6.1 (BioDiscovery, Hawthorne, CA). All Nexus plots were inspected visually to verify calls made, identify uncalled events and exclude likely false positives. To exclude common germline CNVs, the Database of Genomic Variants (DGV), a comprehensive catalog of structural variation in control data, was used. Copy number (CN) changes that encompassed fully changes noted in the DGV were excluded from further analysis. Regions of copy neutral loss of heterozygosity (cnLOH) were recorded if >1Mb in size, but were not used to define or refine MORs. Data from 1275 age-appropriate control samples minimised the reporting of common cnLOH events. All genomic coordinates were noted with reference to the GRCh37, hg19 assembly.
MORs were investigated using Microsoft Excel filtering functions. A subset of genes (n=91) selected from MORs mainly on the basis of event frequency and/or number of genes within the MOR and/or literature interest were taken forward for targeted sequencing (exons only) of appropriate samples with/without CN Losses or cnLOH (Set 1 n=124; Set 2 n=126). These were tested using custom designed TruSeq Custom Amplicon panels (Illumina Inc) and processed according to manufacturer’s instructions. SAMHD1 was excluded from these panels since it had been studied separately within our laboratory. The data were analysed using an in-house bioinformatics pipeline that uses the sequence aligners MSR and Stampy and the variant callers GATK and Platypus, followed by stringent filtering. Results: Using our datasets we have identified >50 MORs previously unreported in the literature. Six of these showed copy number (CN) losses in >3% of patients studied. Furthermore, we have refined 14 MORs that overlapped with regions described previously and that had also a CN loss frequency of >3%. Thirteen MORs involved only a single reference gene, often a gene implicated previously in cancer (eg. SAMHD1, MTSS1, DCC and RFC1). Of the 91 genes taken forward for targeted sequencing, stringent data filtering led to a subset of 19 genes of interest harbouring exonic mutations. Genes with mutations identified include DCC, BAP1 and FBXW7, also implicated previously in cancer. Conclusion: We have generated high resolution CNA and cnLOH profiles for 250 first-line chemo-immunotherapy treated CLL patients and used this information to document newly identified MORs, to refine MORs reported previously and to identify mutation harbouring genes using targeted NGS. Functional knowledge supports our hypothesis that these genes may have a contributory role in CLL. For two genes, SAMHD1 and FBXW7, relevance in CLL has been established already. 
Taken together, our data validate the utility of high resolution array studies for the identification of candidate genes that may be involved in CLL development or progression when disrupted. Further studies are required to confirm a role for these genes in CLL and to elucidate the nature of the underlying biological mechanisms. Disclosures: No relevant conflicts of interest to declare.

Dissertations on the topic "Genomics bioinformatics variant discovery sequence analysis"

1

Bolognini, Davide. "Unraveling tandem repeat variation in personal genomes with long reads." Doctoral thesis, Università di Siena, 2021. http://hdl.handle.net/11365/1141832.

Abstract:
Tandem repeats are repeated sequences that occur adjacent to each other in the human genome. Due to their prevalence and their association with a number of genetic diseases, there is a rising interest in developing tools for tandem repeat profiling. Genome-wide discovery approaches are needed to fully understand their roles in health and disease but resolving tandem repeat variation accurately remains a very challenging task. Indeed, while traditional mapping-based and assembly-based approaches using short-read data have severe limitations in the size and type of tandem repeats they can resolve, recent third-generation sequencing technologies provide the long reads required to broaden the scope of detectable tandem repeats but exhibit substantially higher sequencing error rates that complicate repeat resolution. In order to overcome limitations of prior methods, we developed TRiCoLOR, a freely-available tool for tandem repeat profiling using error-prone long reads from third-generation sequencing technologies. The method can identify repetitive regions in long-read sequencing data de novo and resolve their motif and multiplicity in a haplotype-specific manner. The tool further includes methods to interactively visualize the identified repeats and to trace their Mendelian consistency in pedigrees. Tested on synthetic data harboring tandem repeat contractions and expansions, TRiCoLOR demonstrates excellent performance and improved precision and recall compared to alternative tools. For real human whole-genome sequencing data, TRiCoLOR achieves high validation rates, suggesting its suitability to identify tandem repeat variation in personal genomes. Compared to assembly-based approaches for structural variant detection, TRiCoLOR proves capable of resolving tandem repeats in difficult-to-assemble regions that are prone to mis-assemblies or incorrect repeat assignments. TRiCoLOR is open-source and implemented in Python 3, with supporting C++ code and bash scripts.
The tool is released through GitHub https://github.com/davidebolo1993/TRiCoLOR and as a docker image https://hub.docker.com/r/davidebolo1993/tricolor, with accompanying documentation.
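What "motif and multiplicity" mean for a tandem repeat can be shown on an error-free toy sequence. The sketch below is not TRiCoLOR's algorithm, which has to cope with error-prone long reads; it only handles perfect repeats, and the function name is hypothetical.

```python
def smallest_repeat_unit(seq):
    """Return (motif, copies) such that motif * copies == seq, with the shortest motif."""
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    for size in range(1, n + 1):
        # A candidate motif length must divide the sequence length exactly.
        if n % size == 0 and seq[:size] * (n // size) == seq:
            return seq[:size], n // size


motif, copies = smallest_repeat_unit("CAGCAGCAGCAG")
```

A sequence with no internal periodicity is returned unchanged with a multiplicity of one, so the function always yields a result for non-empty input.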
2

Highnam, Gareth Wei An. "Optimizing analysis pipelines for improved variant discovery." Diss., Virginia Tech, 2014. http://hdl.handle.net/10919/47451.

3

Verzotto, Davide. "Advanced Computational Methods for Massive Biological Sequence Analysis." Doctoral thesis, Università degli studi di Padova, 2011. http://hdl.handle.net/11577/3426282.

Abstract:
With the advent of modern sequencing technologies massive amounts of biological data, from protein sequences to entire genomes, are becoming increasingly available. This poses the need for the automatic analysis and classification of such a huge collection of data, in order to enhance knowledge in the Life Sciences. Although many research efforts have been made to mathematically model this information, for example finding patterns and similarities among protein or genome sequences, these approaches often lack structures that address specific biological issues. In this thesis, we present novel computational methods for three fundamental problems in molecular biology: the detection of remote evolutionary relationships among protein sequences, the identification of subtle biological signals in related genome or protein functional sites, and the phylogeny reconstruction by means of whole-genome comparisons. The main contribution is given by a systematic analysis of patterns that may affect these tasks, leading to the design of practical and efficient new pattern discovery tools. We thus introduce two advanced paradigms of pattern discovery and filtering based on the insight that functional and conserved biological motifs, or patterns, should lie in different sites of sequences. This makes it possible to carry out space-conscious approaches that avoid counting the same patterns multiple times. The first paradigm considered, namely irredundant common motifs, concerns the discovery of common patterns, for two sequences, that have occurrences not covered by other patterns, whose coverage is defined by means of specificity and extension. The second paradigm, namely underlying motifs, concerns the filtering of patterns, from a given set, that have occurrences not overlapping other patterns with higher priority, where priority is defined by lexicographic properties of patterns on the boundary between pattern matching and statistical analysis.
We develop three practical methods directly based on these advanced paradigms. Experimental results indicate that we are able to identify subtle similarities among biological sequences, using the same type of information only once. In particular, we employ the irredundant common motifs and the statistics based on these patterns to solve the remote protein homology detection problem. Results show that our approach, called Irredundant Class, outperforms the state-of-the-art methods in a challenging benchmark for protein analysis. Afterwards, we establish how to compare and filter a large number of complex motifs (e.g., degenerate motifs) obtained from modern motif discovery tools, in order to identify subtle signals in different biological contexts. In this case we employ the notion of underlying motifs. Tests on large protein families indicate that we drastically reduce the number of motifs that scientists should manually inspect, further highlighting the actual functional motifs. Finally, we combine the two proposed paradigms to allow the comparison of whole genomes, and thus the construction of a novel and practical distance function. With our method, called Unic Subword Approach, we relate to each other the regions of two genome sequences by selecting conserved motifs during evolution. Experimental results show that our approach achieves better performance than other state-of-the-art methods in the whole-genome phylogeny reconstruction of viruses, prokaryotes, and unicellular eukaryotes, further identifying the major clades of these organisms.
With the advent of modern sequencing technologies, massive quantities of biological data, from protein sequences to entire genomes, are available for research. This progress calls for the automatic analysis and classification of such data collections, in order to improve knowledge in the Life Sciences. Although many approaches have been proposed so far to model biological sequences mathematically, for example by searching for patterns and similarities among genomic or protein sequences, these methods often lack structures able to address specific biological questions. In this thesis, we present new computational methods for three fundamental problems in molecular biology: the discovery of remote evolutionary relationships among protein sequences, the identification of complex biological signals in related functional sites, and the reconstruction of the phylogeny of a set of organisms through the comparison of whole genomes. The main contribution is the systematic analysis of the patterns that may bear on these problems, leading to the design of new effective and efficient computational tools. Two advanced paradigms for pattern discovery and filtering are thus introduced, based on the observation that functional biological motifs, or patterns, are located in different regions of the sequences under examination. This observation makes it possible to build parsimonious approaches that avoid counting the same patterns multiple times. The first paradigm, irredundant common motifs, concerns the discovery of patterns common to pairs of sequences that have occurrences not covered by other patterns, where coverage is defined by greater specificity and/or possible extension of the patterns. 
The second paradigm, underlying motifs, concerns the filtering of patterns that have occurrences not overlapping those of other patterns with higher priority, where priority is defined by lexicographic properties of the patterns on the boundary between pattern matching and statistical analysis. Three computational methods based on these advanced paradigms have been developed. Experimental results indicate that our methods are able to identify the main similarities among biological sequences, using the available information in a non-redundant way. In particular, by employing the irredundant common motifs and the statistics based on these patterns, we solve the problem of detecting remote homologies among proteins. The results show that our approach, called Irredundant Class, performs very well on a challenging benchmark and improves on state-of-the-art methods. In addition, to identify complex biological signals we use the notion of underlying motifs, thereby defining ways to compare and filter degenerate motifs obtained from modern pattern discovery tools. Experiments on large protein families show that our method drastically reduces the number of motifs that scientists would otherwise have to inspect manually, while also bringing out the functional motifs identified in the literature. Finally, by combining the two proposed paradigms, we present a new and practical distance function between whole genomes. With our method, called Unic Subword Approach, we relate the different regions of two genomic sequences to each other by selecting the motifs conserved during evolution. 
Experimental results show that our approach performs better than other state-of-the-art methods in reconstructing the phylogeny of organisms such as viruses, prokaryotes, and unicellular eukaryotes, and also identifies the main subclasses of these species.
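The priority-based filtering idea behind the underlying motifs paradigm described in the abstract above can be illustrated with a small sketch. This is not the thesis's algorithm: the function names are hypothetical, patterns are plain exact-match strings rather than degenerate motifs, and the priority rule (longer patterns first, ties broken lexicographically) is an assumption chosen only to make the example concrete.

```python
from typing import Dict, List


def find_occurrences(pattern: str, sequence: str) -> List[int]:
    """Return every start index where pattern occurs in sequence (overlaps allowed)."""
    starts = []
    pos = sequence.find(pattern)
    while pos != -1:
        starts.append(pos)
        pos = sequence.find(pattern, pos + 1)
    return starts


def filter_by_priority(patterns: List[str], sequence: str) -> Dict[str, List[int]]:
    """Keep only occurrences that do not overlap a higher-priority pattern.

    Assumed priority (illustrative only): longer patterns first,
    ties broken lexicographically.
    """
    ranked = sorted({p for p in patterns if p}, key=lambda p: (-len(p), p))
    covered = [False] * len(sequence)  # positions claimed by kept occurrences
    kept: Dict[str, List[int]] = {}
    for pattern in ranked:
        # An occurrence survives if none of its positions is already claimed.
        free = [s for s in find_occurrences(pattern, sequence)
                if not any(covered[s:s + len(pattern)])]
        if free:
            kept[pattern] = free
            for s in free:
                for i in range(s, s + len(pattern)):
                    covered[i] = True
    return kept


if __name__ == "__main__":
    result = filter_by_priority(["ACGT", "ACG", "CGT", "TT"], "ACGTACGTTTACG")
    print(result)  # {'ACGT': [0, 4], 'ACG': [10], 'TT': [8]}
```

In this toy run, "CGT" is discarded because both of its occurrences lie inside occurrences of the higher-priority "ACGT", so each region of the sequence is counted only once.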

Books on the topic "Genomics bioinformatics variant discovery sequence analysis"

1

ebrary, Inc., ed. Advances in genomic sequence analysis and pattern discovery. Hackensack, N.J.: World Scientific, 2011.


Book chapters on the topic "Genomics bioinformatics variant discovery sequence analysis"

1

Flint, Jonathan. "Molecular genetics." In New Oxford Textbook of Psychiatry, 222–33. Oxford University Press, 2012. http://dx.doi.org/10.1093/med/9780199696758.003.0029.

Abstract:
The transformation of the LOD score (an acronym for log of the odds ratio), from obscurity as a footnote in medical genetics, to celebrity as a multiple choice test item in professional examinations in psychiatry, epitomizes the invasion of genetics, and particularly molecular genetics, into psychiatric research. Moreover, like other celebrities caught up in fast moving fields, LOD scores are likely to return to their humble origins within a few years. As molecular genetic approaches to mental health move away from simply identifying genes and DNA sequence variants towards functional studies of increasing complexity, newcomers to the field have to master an expanding literature that covers diverse fields: from quantitative genetics to cell biology, from LOD scores to epigenetics. This chapter takes on the task of making the reader sufficiently familiar with the broad range of subjects now required to follow the progress of psychiatric genetics in the primary literature. A number of achievements have to be highlighted. Foremost among these is the completion of the human genome project. Announced annually from 2001 and thereby begging the question as to what constitutes completion, the human genome project is now an essential biological resource. As expected, the ability to sequence whole genomes has transformed the way genetics is carried out, perhaps most egregiously with the rise of bioinformatics as a core discipline: discovery now takes place using the internet rather than the laboratory. Anyone with an interest in human biology should look at the frequently updated information at http://www.ensembl.org or http://genome.ucsc.edu. Without the human genome two other critical developments would have been impossible: the ability to analyse the expression of every gene in the genome and the ability to analyse (theoretically at least) every sequence variant. 
Both developments also depend on miniaturization technologies that enable the manufacture and interrogation of initially thousands and then millions of segments of DNA. In addition, results from the International Haplotype Map (HapMap) project, which catalogues common variation in the human genome, have been crucial in making it possible to take apart the genetic basis of common, complex disorders such as depression, schizophrenia, and anxiety. Few disciplines are more burdened with jargon than molecular genetics. This is partly due to the proliferation of molecular techniques, but it is also partly intrinsic to the subject; the only unifying principle is evolution, which often operates in a very ad hoc fashion. Biological solutions to the problems posed by selection result in the adaptation of existing structures to new uses, rather than in the invention of purpose-built systems. Consequently there are few general lessons to be learnt and the novice simply has to become adept at recognizing the acronyms and neologisms that decorate the literature. The material in this chapter aims to equip the reader with the necessary terminology. It begins with the structure and function of DNA, an essential starting place for a number of reasons.
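As a brief illustration of the LOD score mentioned in the abstract above, the following sketch computes a two-point LOD score as the base-10 logarithm of the likelihood of linkage at a recombination fraction theta relative to the likelihood under no linkage (theta = 0.5). It assumes the simplest textbook setting of phase-known, fully informative meioses; the function name and parameters are hypothetical and do not come from the chapter.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known meioses (simplifying assumption).

    LOD(theta) = log10( L(theta) / L(0.5) ), where
    L(theta) = theta**k * (1 - theta)**(n - k)
    for k recombinants observed among n informative meioses.
    """
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    linked = theta ** recombinants * (1.0 - theta) ** (meioses - recombinants)
    unlinked = 0.5 ** meioses
    if linked == 0.0:
        # theta = 0 is impossible once any recombinant has been observed.
        return float("-inf")
    return math.log10(linked / unlinked)


if __name__ == "__main__":
    # One recombinant in ten meioses, evaluated at theta = 0.1.
    print(lod_score(1, 10, 0.1))
```

By convention in linkage analysis, a LOD score of 3 or more (odds of at least 1000 to 1 in favour of linkage) is usually taken as significant evidence of linkage.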
