Contents

Journal articles
Dissertations / Theses
Books
Book chapters
Conference papers
Reports

Academic literature on the topic 'Legal dataset'

Author: Grafiati

Published: 6 September 2023

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Legal dataset.'

Journal articles on the topic "Legal dataset"

1

KUNČIČ, ALJAŽ. "Institutional quality dataset." Journal of Institutional Economics 10, no. 1 (July 1, 2013): 135–61. http://dx.doi.org/10.1017/s1744137413000192.

Abstract:

In this paper, we emphasize the role of institutions as the underlying basis for economic and social activity. We describe and compare different institutional classification systems, which is rarely done in the literature, and show how to empirically operationalize institutional concepts. More than 30 established institutional indicators can be clustered into three homogeneous groups of formal institutions: legal, political and economic, which capture to a large extent the complete formal institutional environment of a country. We compute the latent quality of legal, political and economic institutions for every country in the world and for every year. On this basis, we propose a legal, political and economic World Institutional Quality Ranking, through which we can follow whether a country is improving or worsening its relative institutional environment. The calculated latent institutional quality measures can be especially useful in further panel data applications and add to the usual practice of using simply one or another index of institutional quality to capture the institutional environment. We make the Institutional Quality Dataset, covering up to 197 countries and territories from 1990 to 2010, freely available online.

2

Zhong, Haoxi, Chaojun Xiao, Cunchao Tu, Tianyang Zhang, Zhiyuan Liu, and Maosong Sun. "JEC-QA: A Legal-Domain Question Answering Dataset." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 9701–8. http://dx.doi.org/10.1609/aaai.v34i05.6519.

Abstract:

We present JEC-QA, the largest question answering dataset in the legal domain, collected from the National Judicial Examination of China. The examination is a comprehensive evaluation of professional skills for legal practitioners. College students are required to pass the examination to be certified as a lawyer or a judge. The dataset is challenging for existing question answering methods, because both retrieving relevant materials and answering questions require the ability of logic reasoning. Due to the high demand of multiple reasoning abilities to answer legal questions, the state-of-the-art models can only achieve about 28% accuracy on JEC-QA, while skilled humans and unskilled humans can reach 81% and 64% accuracy respectively, which indicates a huge gap between humans and machines on this task. We will release JEC-QA and our baselines to help improve the reasoning ability of machine comprehension models. You can access the dataset from http://jecqa.thunlp.org/.

3

Ratnayaka, Gathika, Nisansa de Silva, Amal Shehan Perera, Gayan Kavirathne, Thirasara Ariyarathna, and Anjana Wijesinghe. "Context Sensitive Verb Similarity Dataset for Legal Information Extraction." Data 7, no. 7 (June 28, 2022): 87. http://dx.doi.org/10.3390/data7070087.

Abstract:

Existing literature demonstrates that verbs are pivotal in legal information extraction tasks due to their semantic and argumentative properties. However, granting computers the ability to interpret the meaning of a verb and its semantic properties in relation to a given context can be considered as a challenging task, mainly due to the polysemic and domain specific behaviours of verbs. Therefore, developing mechanisms to identify behaviors of verbs and evaluate how artificial models detect the domain specific and polysemic behaviours of verbs can be considered as tasks with significant importance. In this regard, a comprehensive dataset that can be used as an evaluation resource, as well as a training data set, can be considered as a major requirement. In this paper, we introduce LeCoVe, which is a verb similarity dataset intended towards facilitating the process of identifying verbs with similar meanings in a legal domain specific context. Using the dataset, we evaluated both domain specific and domain generic embedding models, which were developed using state-of-the-art word representation and language modelling techniques. As a part of the experiments carried out using the announced dataset, Sense2Vec and BERT models were trained using a corpus of legal opinion texts in order to capture domain specific behaviours. In addition to LeCoVe, we demonstrate that a neural network model, which was developed by combining semantic, syntactic, and contextual features that can be obtained from the outputs of embedding models, can perform comparatively well, even in a low resource scenario.

4

Lin, Chun-Hsien, and Pu-Jen Cheng. "LARQS: An Analogical Reasoning Evaluation Dataset for Legal Word Embedding." International Journal on Natural Language Computing 11, no. 3 (June 30, 2022): 1–16. http://dx.doi.org/10.5121/ijnlc.2022.11301.

Abstract:

Applying natural language processing-related algorithms is currently a popular project in legal applications, for instance, document classification of legal documents, contract review and machine translation. Using the above machine learning algorithms, all need to encode the words in the document in the form of vectors. The word embedding model is a modern distributed word representation approach and the most common unsupervised word encoding method. It facilitates subjecting other algorithms and subsequently performing the downstream tasks of natural language processing vis-à-vis. The most common and practical approach of accuracy evaluation with the word embedding model uses a benchmark set with linguistic rules or the relationship between words to perform analogy reasoning via algebraic calculation. This paper proposes establishing a 1,256 Legal Analogical Reasoning Questions Set (LARQS) from the 2,388 Chinese Codex corpus using five kinds of legal relations, which are then used to evaluate the accuracy of the Chinese word embedding model. Moreover, we discovered that legal relations might be ubiquitous in the word embedding model.

5

Shaheen, Z., D. I. Mouromtsev, and I. Postny. "RuLegalNER: a new dataset for Russian legal named entities recognition." Scientific and Technical Journal of Information Technologies, Mechanics and Optics 23, no. 4 (August 1, 2023): 854–57. http://dx.doi.org/10.17586/2226-1494-2023-23-4-854-857.

6

Owsiak, Andrew P., Allison K. Cuttner, and Brent Buck. "The International Border Agreements Dataset." Conflict Management and Peace Science 35, no. 5 (July 8, 2016): 559–76. http://dx.doi.org/10.1177/0738894216646978.

Abstract:

We introduce a dataset that focuses on the delimitation of interstate borders under international law—the International Border Agreements Dataset (IBAD). This dataset contains information on the agents involved in (e.g. states, third-parties, and colonial powers), methods used during (e.g. negotiation, mediation, arbitration, adjudication, administrative decrees, post-war conferences, and plebiscites), and outcomes of (e.g. full and intermediate agreements) the border settlement process during the period 1816–2001. Our focus on international legal agreements and the process that produces them makes the IBAD valuable for those that study not only territorial conflict, but also international conflict, cooperation, law, and conflict management.

7

Crossfield, Samantha S. R., Kieran Zucker, Paul Baxter, Penny Wright, Jon Fistein, Alex F. Markham, Mark Birkin, Adam W. Glaser, and Geoff Hall. "A data flow process for confidential data and its application in a health research project." PLOS ONE 17, no. 1 (January 21, 2022): e0262609. http://dx.doi.org/10.1371/journal.pone.0262609.

Abstract:

Background: The use of linked healthcare data in research has the potential to make major contributions to knowledge generation and service improvement. However, using healthcare data for secondary purposes raises legal and ethical concerns relating to confidentiality, privacy and data protection rights. Using a linkage and anonymisation approach that processes data lawfully and in line with ethical best practice to create an anonymous (non-personal) dataset can address these concerns, yet there is no set approach for defining all of the steps involved in such data flow end-to-end. We aimed to define such an approach with clear steps for dataset creation, and to describe its utilisation in a case study linking healthcare data. Methods: We developed a data flow protocol that generates pseudonymous datasets that can be reversibly linked, or irreversibly linked to form an anonymous research dataset. It was designed and implemented by the Comprehensive Patient Records (CPR) study in Leeds, UK. Results: We defined a clear approach that received ethico-legal approval for use in creating an anonymous research dataset. Our approach used individual-level linkage through a mechanism that is not computer-intensive and was rendered irreversible to both data providers and processors. We successfully applied it in the CPR study to hospital and general practice and community electronic health record data from two providers, along with patient reported outcomes, for 365,193 patients. The resultant anonymous research dataset is available via DATA-CAN, the Health Data Research Hub for Cancer in the UK. Conclusions: Through ethical, legal and academic review, we believe that we contribute a defined approach that represents a framework that exceeds current minimum standards for effective pseudonymisation and anonymisation. This paper describes our methods and provides supporting information to facilitate the use of this approach in research.

8

Baviskar, Dipali, Swati Ahirrao, and Ketan Kotecha. "Multi-Layout Invoice Document Dataset (MIDD): A Dataset for Named Entity Recognition." Data 6, no. 7 (July 20, 2021): 78. http://dx.doi.org/10.3390/data6070078.

Abstract:

The day-to-day working of an organization produces a massive volume of unstructured data in the form of invoices, legal contracts, mortgage processing forms, and many more. Organizations can utilize the insights concealed in such unstructured documents for their operational benefit. However, analyzing and extracting insights from such numerous and complex unstructured documents is a tedious task. Hence, the research in this area is encouraging the development of novel frameworks and tools that can automate the key information extraction from unstructured documents. However, the availability of standard, best-quality, and annotated unstructured document datasets is a serious challenge for accomplishing the goal of extracting key information from unstructured documents. This work expedites the researcher’s task by providing a high-quality, highly diverse, multi-layout, and annotated invoice documents dataset for extracting key information from unstructured documents. Researchers can use the proposed dataset for layout-independent unstructured invoice document processing and to develop an artificial intelligence (AI)-based tool to identify and extract named entities in the invoice documents. Our dataset includes 630 invoice document PDFs with four different layouts collected from diverse suppliers. As far as we know, our invoice dataset is the only openly available dataset comprising high-quality, highly diverse, multi-layout, and annotated invoice documents.

9

Hidayat, Fahrul, and Rakyan Paksi Nagara. "DATASET BATAS WILAYAH ADMINISTRASI UNTUK PENATAAN RUANG WILAYAH." Seminar Nasional Geomatika 3 (February 15, 2019): 441. http://dx.doi.org/10.24895/sng.2018.3-0.984.

Abstract:

Indonesia's era of political decentralization has been under way for 20 years, yet boundary problems remain a burden for government at both the central and regional levels. Ministry of Home Affairs data from January 2018 show that 48.47% of regional administrative boundaries, or 475 segments, already have a legal basis. The percentages of segments still in the process of affirmation and not yet affirmed are 34.59% and 16.94%, respectively. Boundaries should be clear and legal before they are used in the administrative processes of a region, including spatial planning. The aim of this study is to assess the existing condition of spatial planning in several Indonesian provinces in the context of the use of regional administrative boundary datasets. The methods used are (1) visual interpretation of the boundary dataset (vector) against the map annexed to the provincial RTRW (regional spatial plan) regulation (raster); and (2) a topology check of the boundary dataset (vector) against the provincial RTRW map (vector). The results show that several regions do not use the legal regional administrative boundary dataset when preparing spatial plan maps, as indicated by gaps and overlaps between the information layers. The conclusion drawn from the results is that coordination among stakeholders in spatial planning is still not optimal.

10

Paul, Shounak, Pawan Goyal, and Saptarshi Ghosh. "LeSICiN: A Heterogeneous Graph-Based Approach for Automatic Legal Statute Identification from Indian Legal Documents." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 11139–46. http://dx.doi.org/10.1609/aaai.v36i10.21363.

Abstract:

The task of Legal Statute Identification (LSI) aims to identify the legal statutes that are relevant to a given description of facts or evidence of a legal case. Existing methods only utilize the textual content of facts and legal articles to guide such a task. However, the citation network among case documents and legal statutes is a rich source of additional information, which is not considered by existing models. In this work, we take the first step towards utilising both the text and the legal citation network for the LSI task. We curate a large novel dataset for this task, including facts of cases from several major Indian Courts of Law, and statutes from the Indian Penal Code (IPC). Modeling the statutes and training documents as a heterogeneous graph, our proposed model LeSICiN can learn rich textual and graphical features, and can also tune itself to correlate these features. Thereafter, the model can be used to inductively predict links between test documents (new nodes whose graphical features are not available to the model) and statutes (existing nodes). Extensive experiments on the dataset show that our model comfortably outperforms several state-of-the-art baselines, by exploiting the graphical structure along with textual features.

Dissertations / Theses on the topic "Legal dataset"

1

Bensoussan, Jean-Claude. "Proposition d'une methodologie d'identification reconstructive anthropologique et odontologique : application a l'etude d'une serie de mandibules datant du 18eme siecle, la chapelle saint-esprit, 06 antibes." Lyon 1, 1990. http://www.theses.fr/1990LYO1DS02.

2

Raj, Rohit. "Towards Robustness of Neural Legal Judgement System." Thesis, 2023. https://etd.iisc.ac.in/handle/2005/6145.

Abstract:

Legal Judgment Prediction (LJP) implements Natural Language Processing (NLP) techniques to predict judgment results based on fact description. It can play a vital role as a legal assistant and benefit legal practitioners and regular citizens. Recently, the rapid advances in transformer-based pre-trained language models led to considerable improvement in this area. However, empirical results show that existing LJP systems are not robust to adversaries and noise. Also, they cannot handle large-length legal documents. In this work, we explore the robustness and efficiency of LJP systems even in a low data regime. In the first part, we empirically verify that existing state-of-the-art LJP systems are not robust. We further provide our novel architecture for LJP tasks which can handle extensive text lengths and adversarial examples. Our model performs better than state-of-the-art models, even in the presence of adversarial examples of the legal domain. In the second part, we investigate the approach for the LJP system in a low data regime. We further divide our second work into two scenarios depending on the number of unseen classes in the dataset which is being used for the LJP system. In the first scenario, we propose a few-shot approach with only two labels for the Judgement prediction task. In the second scenario, we propose an approach where we have an excessive number of labels for judgment prediction. For both approaches, we provide novel architectures using few-shot learning that are also robust to adversaries. We conducted extensive experiments on American, European, and Indian legal datasets in the few-shot scenario. Though trained using the few-shot approach, our models perform comparably to state-of-the-art models that are trained using large datasets in the legal domain.

Books on the topic "Legal dataset"

1

Escobar-Lemmon, Maria C., Valerie J. Hoekstra, Alice J. Kang, and Miki Caul Kittilson. Reimagining the Judiciary. Oxford University Press, 2021. http://dx.doi.org/10.1093/oso/9780198861577.001.0001.

Abstract:

This book examines the factors that facilitate women’s representation on high courts worldwide. Diverse courts improve collective decision-making, strengthen public confidence in the judiciary and judicial decisions, and broaden access to the judicial process. Taken together, domestic and international factors explain women’s representation. These influences include judicial pipelines, domestic institutions including selection processes, and international expectations about gender equity. These explanations are evaluated using an original dataset, which includes both men and women appointed to high courts in all regions of the world. Pathways and processes are examined in-depth through five case studies: Canada, Colombia, Ireland, South Africa, and the United States. Taking a multi-method approach, the book combines insights from a cross-national, time-serial dataset with case studies drawing on fieldwork. Women are being appointed to high courts in greater numbers across every region of the world, and political and legal institutions provide context for where the gains are earliest and strongest. The findings suggest a chain of favorable promoters for women’s representation on high courts: new norms of gender equality encourage the reimagining of the judiciary; advocacy organizations challenge the status quo; and windows of opportunity enable change.

2

Berlin, Mark S. Criminalizing Atrocity. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780198850441.001.0001.

Abstract:

Why do countries adopt criminal legislation making it possible to prosecute government and military officials for human rights violations? Over the past thirty years, dozens of countries have prosecuted their own or other states’ officials for past atrocities. Criminalizing Atrocity tells the story of the global spread of national criminal laws against atrocity crimes (genocide, war crimes, and crimes against humanity), laws that have helped pave the way for this remarkable trend toward greater accountability. It traces the early-twentieth-century origins of national atrocity laws to a group of influential European criminal law scholars and explains the global patterns by which they have since spread. The book shows that understanding why countries criminalize atrocities requires understanding how they do so. In many cases, criminalization has not been the result of concerted government initiative, but of inconspicuous choices made by technocratic legal experts who have been delegated authority to draft large-scale reforms to countries’ criminal codes. Drawing on research in comparative law and norm diffusion, Criminalizing Atrocity explains how such reform projects prompt technocratic drafters to select legal ideas, like atrocity laws, that have been endorsed by their professional communities and deemed by drafters to be important features of a “modern” criminal code. To test this argument, Criminalizing Atrocity draws on a range of original quantitative and qualitative data, including in-depth case studies of Guatemala, Colombia, Poland, and the Maldives, and a new, comprehensive dataset tracking the global spread of atrocity laws since World War II. The book’s findings highlight the importance of professional communities in the modern renaissance of atrocity justice and the domestication of international legal norms.

3

Ovodenko, Alexander. Producers, Trade Groups, and the Design of Global Environmental Regimes. Oxford University Press, 2017. http://dx.doi.org/10.1093/oso/9780190677725.003.0006.

Abstract:

The chapter provides a macro-level analysis of the legalization, standardization, and integration of global environmental rules. The statistical tests rely on two new datasets on global treaty regimes and business stakeholders in those regimes. The results demonstrate that treaty regimes that regulate oligopolistic industries tend to become integrated over time with protocols, amendments, and similar agreements that add new rules or institutions to the international regime. They also consist of legally binding agreements, not soft law commitments by parties, and standardized rules applicable to all member states or categories of member states. By contrast, treaty regimes that regulate competitive markets tend to become more disintegrated (or unintegrated) over time. These international regimes are also legal hybrids because they consist of hard and soft law, and often give countries the responsibility to make nationally specific commitments. Producer-level concentrations significantly constrain the design of global environmental treaty regimes.

Book chapters on the topic "Legal dataset"

1

de Vargas Feijó, Diego, and Viviane Pereira Moreira. "RulingBR: A Summarization Dataset for Legal Texts." In Lecture Notes in Computer Science, 255–64. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-99722-3_26.

2

Stellato, Armando, Manuel Fiorelli, Andrea Turbati, Tiziano Lorenzetti, Peter Schmitz, Enrico Francesconi, Najeh Hajlaoui, and Brahim Batouche. "Dataset Alignment and Lexicalization to Support Multilingual Analysis of Legal Documents." In Lecture Notes in Computer Science, 257–71. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-00178-0_17.

3

Luz de Araujo, Pedro Henrique, Teófilo E. de Campos, Renato R. R. de Oliveira, Matheus Stauffer, Samuel Couto, and Paulo Bermejo. "LeNER-Br: A Dataset for Named Entity Recognition in Brazilian Legal Text." In Lecture Notes in Computer Science, 313–23. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-99722-3_32.

4

Ates, Leyla, Moran Harari, and Markus Meinzer. "Negative Spillovers in International Corporate Taxation and the European Union." In Taxation, International Cooperation and the 2030 Sustainable Development Agenda, 195–217. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-64857-2_10.

Abstract:

Jurisdictions can engage in different types of aggressive tax policies to varying degrees. These policies can have negative spillover effects on other jurisdictions. In the realm of corporate taxation, these effects consist of base erosion and profit shifting and perceived pressures to reduce corporate taxes. Both direct and indirect effects undermine the efforts especially of developing countries at mobilising domestic resources to achieve the Sustainable Development Goals. We analyse the intensity of corrosive tax policies by exploiting a new legal dataset compiled for the Corporate Tax Haven Index (CTHI). Relying on rigorously defined indicators, the dataset allows comparative analyses of negative and positive spillover pathways in the corporate income tax systems of 64 jurisdictions. Tax policies under review comprise, for example, preferential tax regimes, extremely low tax rates agreed through secretive tax rulings, economic zones and tax holidays. Comparing the 27 European Union (EU) member states with five African developing countries, we find important differences. Except for two indicators (loss utilisation and economic zones/tax holidays), the European Union members are found to consistently engage in more aggressive corporate tax policies than the African countries. These heightened risks for negative spillovers emanating from the EU27 corporate tax rules stand in conflict with the stated intentions by the European Union to support good governance in tax matters and its commitment to ensure policy coherence for development. The chapter provides recommendations on how to reduce the risks for negative spillovers in corporate taxation and to exit the race to the bottom in corporate taxation.

5

Baltes, Sebastian. "Software Developers’ Work Habits and Expertise: Empirical Studies on Sketching, Code Plagiarism, and Expertise Development." In Ernst Denert Award for Software Engineering 2019, 47–60. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58617-1_4.

Abstract:

Analyzing and understanding software developers’ work habits and resulting needs is an essential prerequisite to improve software development practice. In our research, we utilize different qualitative and quantitative research methods to empirically investigate three underexplored aspects of software development: First, we analyze how software developers use sketches and diagrams in their daily work and derive requirements for better tool support. Then, we explore to what degree developers copy code from the popular online platform Stack Overflow without adhering to license requirements and motivate why this behavior may lead to legal issues for affected open source software projects. Finally, we describe a novel theory of software development expertise and identify factors fostering or hindering the formation of such expertise. Besides, we report on methodological implications of our research and present the open dataset SOTorrent, which supports researchers in analyzing the origin, evolution, and usage of content on Stack Overflow. The common goal for all studies we conducted was to better understand software developers’ work practices. Our findings support researchers and practitioners in making data-informed decisions when developing new tools or improving processes related to either the specific work habits we studied or expertise development in general.

6

Lima, João Pedro, and José Alfredo Costa. "Comparing Clustering Techniques on Brazilian Legal Document Datasets." In Lecture Notes in Computer Science, 98–110. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-15471-3_9.

7

Zhang, Gechuan, Paul Nulty, and David Lillis. "A Decade of Legal Argumentation Mining: Datasets and Approaches." In Natural Language Processing and Information Systems, 240–52. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-08473-7_22.

8

Arranz, Victoria, Khalid Choukri, Valérie Mapelli, Mickaël Rigault, Penny Labropoulou, Miltos Deligiannis, Leon Voukoutis, and Stelios Piperidis. "Datasets, Corpora and other Language Resources." In European Language Grid, 151–69. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-17258-8_8.

Abstract:

This chapter provides an overview of what is available in ELG in terms of datasets, corpora and other language resources (LRs) and how this has been achieved. We look at the procedures and steps that have been followed to complete the full resource ingestion cycle, which goes from repository and LR identification to metadata description and ingestion. We explain the approaches, priorities and methodology. The chapter also outlines the repositories that have been integrated into ELG, discussing the different procedures followed (metadata conversion, extraction, and completion, as well as harvesting) and the reasons behind these choices. Furthermore, the ELG catalogue content is described, with details on key elements and features as well as accomplishments. The last two sections are devoted to the crucial legal issues behind such a complex platform and its data management plan, respectively.

9

Parkhimovich, Olga, and Daria Gritsenko. "Open Government Data in Russia." In The Palgrave Handbook of Digital Russia Studies, 389–407. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-42855-6_22.

Abstract:

This chapter provides a brief overview of the history and current state of open government data in Russia. First, it discusses the concept of “open data” and defines the basic principles of open government data. It further describes the institutional, legal, and infrastructural frameworks for the development of open government data in Russia. The chapter discusses the main sources of open data, the availability of key datasets, and the current situation around future development of the open data agenda in Russia. Finally, it provides examples of projects and cases of interaction with government agencies based on open data.

10

Bonura, Susanna, Davide dalle Carbonare, Roberto Díaz-Morales, Ángel Navia-Vázquez, Mark Purcell, and Stephanie Rossello. "Increasing Trust for Data Spaces with Federated Learning." In Data Spaces, 89–106. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-98636-0_5.

Abstract:

Despite the need for data in a time of general digitization of organizations, many challenges are still hampering its shared use. Technical, organizational, legal, and commercial issues remain to leverage data satisfactorily, especially when the data is distributed among different locations and confidentiality must be preserved. Data platforms can offer “ad hoc” solutions to tackle specific matters within a data space. MUSKETEER develops an Industrial Data Platform (IDP) including algorithms for federated and privacy-preserving machine learning techniques on a distributed setup, detection and mitigation of adversarial attacks, and a rewarding model capable of monetizing datasets according to the real data value. The platform can offer an adequate response for organizations in demand of high security standards such as industrial companies with sensitive data or hospitals with personal data. From the architectural point of view, trust is enforced in such a way that data never has to leave its provider’s premises, thanks to federated learning. This approach can help to better comply with the European regulation as confirmed from a legal perspective. Besides, MUSKETEER explores several rewarding models based on the availability of objective and quantitative data value estimations, which further increases the trust of the participants in the data space as a whole.

Conference papers on the topic "Legal dataset"

1

Wrzalik, Marco, and Dirk Krechel. "GerDaLIR: A German Dataset for Legal Information Retrieval." In Proceedings of the Natural Legal Language Processing Workshop 2021. Stroudsburg, PA, USA: Association for Computational Linguistics, 2021. http://dx.doi.org/10.18653/v1/2021.nllp-1.13.

2

Quevedo, Ernesto, Mushfika Rahman, Tomas Cerny, Pablo Rivas, and Gissella Bejarano. "Study of Question Answering on Legal Software Document using BERT based models." In LatinX in AI at North American Chapter of the Association for Computational Linguistics Conference 2022. Journal of LatinX in AI Research, 2022. http://dx.doi.org/10.52591/lxai202207103.

Abstract:

Transformer-based architectures have achieved remarkable success in several Natural Language Processing tasks, such as the Question Answering domain. Our research focuses on different transformer-based language models’ performance in software development legal domain specialized datasets for the Question Answering task. It compares the performance with the general-purpose Question Answering task. We have experimented with the PolicyQA dataset and conformed to documents regarding users’ data handling policies, which fall into the software legal domain. We used BERT, ALBERT, RoBERTa, DistilBERT and LEGAL-BERT as base encoders and compared their performance on the Question Answering benchmark dataset SQuAD V2.0 and on PolicyQA. Our results indicate that the performance of these models as contextual embedding encoders on the PolicyQA dataset is significantly lower than on SQuAD V2.0. Furthermore, we showed that, surprisingly, general-domain BERT-based models like ALBERT and BERT obtain better performance than a more domain-specific trained model like LEGAL-BERT.

3

Yao, Feng, Chaojun Xiao, Xiaozhi Wang, Zhiyuan Liu, Lei Hou, Cunchao Tu, Juanzi Li, Yun Liu, Weixing Shen, and Maosong Sun. "LEVEN: A Large-Scale Chinese Legal Event Detection Dataset." In Findings of the Association for Computational Linguistics: ACL 2022. Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.findings-acl.17.

4

Tonguz, Ozan, Yiwei Qin, Yimeng Gu, and Hyun Hannah Moon. "Automating Claim Construction in Patent Applications: The CMUmine Dataset." In Proceedings of the Natural Legal Language Processing Workshop 2021. Stroudsburg, PA, USA: Association for Computational Linguistics, 2021. http://dx.doi.org/10.18653/v1/2021.nllp-1.21.

5

Ma, Yixiao, Yunqiu Shao, Yueyue Wu, Yiqun Liu, Ruizhe Zhang, Min Zhang, and Shaoping Ma. "LeCaRD: A Legal Case Retrieval Dataset for Chinese Law System." In SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3404835.3463250.

6

Chalkidis, Ilias, Abhik Jana, Dirk Hartung, Michael Bommarito, Ion Androutsopoulos, Daniel Katz, and Nikolaos Aletras. "LexGLUE: A Benchmark Dataset for Legal Language Understanding in English." In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.acl-long.297.

7

Benatti, Raysa M., Camila M. L. Villarroel, Sandra Avila, Esther L. Colombini, and Fabiana Severi. "Should I disclose my dataset? Caveats between reproducibility and individual data rights." In Proceedings of the Natural Legal Language Processing Workshop 2022. Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.nllp-1.20.

8

Sovrano, Francesco, Monica Palmirani, Biagio Distefano, Salvatore Sapienza, and Fabio Vitali. "A dataset for evaluating legal question answering on private international law." In ICAIL '21: Eighteenth International Conference for Artificial Intelligence and Law. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3462757.3466094.

9

Pal, Ankit. "DeepParliament: A Legal domain Benchmark & Dataset for Parliament Bills Prediction." In Proceedings of the Workshop on Unimodal and Multimodal Induction of Linguistic Structures (UM-IoS). Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.umios-1.8.

10

Lin, Chun-Hsien, and Pu-Jen Cheng. "An Evaluation Dataset for Legal Word Embedding: A Case Study on Chinese Codex." In 11th International Conference on Embedded Systems and Applications (EMSA 2022). Academy and Industry Research Collaboration Center (AIRCC), 2022. http://dx.doi.org/10.5121/csit.2022.120614.

Abstract:

Word embedding is a modern distributed word representation approach that is widely used in many natural language processing tasks. Converting the vocabulary in a legal document into a word embedding model facilitates subjecting legal documents to machine learning, deep learning, and other algorithms and subsequently performing the downstream tasks of natural language processing vis-à-vis, for instance, document classification, contract review, and machine translation. The most common and practical approach to evaluating the accuracy of a word embedding model uses a benchmark set with linguistic rules or the relationship between words to perform analogy reasoning via algebraic calculation. This paper proposes establishing a 1,134 Legal Analogical Reasoning Questions Set (LARQS) from the 2,388 Chinese Codex corpus using five kinds of legal relations, which are then used to evaluate the accuracy of the Chinese word embedding model. Moreover, we discovered that legal relations might be ubiquitous in the word embedding model.

Reports on the topic "Legal dataset"

1

Gwartney, James, Robert Lawson, Joshua Hall, and Ryan Murphy. Economic Freedom of the World: 2022 Dataset for Researchers. Fraser Institute, 2022. http://dx.doi.org/10.53095/88975003.

Abstract:

Dataset for Researchers of the Economic Freedom of the World Annual Report that measures the degree to which the policies and institutions of countries are supportive of economic freedom. The cornerstones of economic freedom are personal choice, voluntary exchange, freedom to enter markets and compete, and security of the person and privately owned property. Forty-two data points are used to construct a summary index, along with a Gender Legal Rights Adjustment to measure the extent to which women have the same level of economic freedom as men. The degree of economic freedom is measured in five broad areas: Size of Government, Legal System and Property Rights, Sound Money, Freedom to Trade Internationally, and Regulation.

2

Gwartney, James, Robert Lawson, Joshua Hall, and Ryan Murphy. Economic Freedom of the World: 2022 Dataset by Country. Fraser Institute, 2022. http://dx.doi.org/10.53095/88975002.

Abstract:

Dataset by Country of the Economic Freedom of the World Annual Report that measures the degree to which the policies and institutions of countries are supportive of economic freedom. The cornerstones of economic freedom are personal choice, voluntary exchange, freedom to enter markets and compete, and security of the person and privately owned property. Forty-two data points are used to construct a summary index, along with a Gender Legal Rights Adjustment to measure the extent to which women have the same level of economic freedom as men. The degree of economic freedom is measured in five broad areas: Size of Government, Legal System and Property Rights, Sound Money, Freedom to Trade Internationally, and Regulation.

3

Stansel, Dean, José Torra, Fred McMahon, and Ángel Carrión-Tavárez. Economic Freedom of North America 2022 Full Dataset. Fraser Institute, 2022. http://dx.doi.org/10.53095/88975008.

Abstract:

Full dataset of the Economic Freedom of North America that measures the extent to which the policies of individual provinces and states are supportive of economic freedom—the ability of individuals to act in the economic sphere free of undue restrictions. It includes a subnational index for comparison of individual jurisdictions (provincial/state and municipal/local governments) within the same country, and an all-government index for comparison of jurisdictions (federal governments) in different countries. For the subnational index, Economic Freedom of North America employs 10 variables for the 92 provincial/state governments in Canada, the United States, and Mexico in three areas: (1) Government Spending, (2) Taxes, and (3) Regulation. In the case of the all-government index, we incorporate three additional areas at the federal level from Economic Freedom of the World Annual Report: (4) Legal Systems and Property Rights, (5) Sound Money, and (6) Freedom to Trade Internationally. In addition, we expand area 1 to include government investment, area 2 to include top marginal income and payroll tax rates, and area 3 to include credit market regulation and business regulations. These additions help capture restrictions on economic freedom that are difficult to measure at the provincial/state and municipal/local level.

4

Stansel, Dean, José Torra, Fred McMahon, and Ángel Carrión-Tavárez. Economic Freedom of North America 2022 Dataset-All Government. Fraser Institute, 2022. http://dx.doi.org/10.53095/88975007.

Abstract:

Dataset of the all-government index of the Economic Freedom of North America for comparison of jurisdictions (federal governments) in different countries. The Economic Freedom of North America measures the extent to which the policies of individual provinces and states are supportive of economic freedom—the ability of individuals to act in the economic sphere free of undue restrictions. The all-government index employs 10 variables for the 92 provincial/state governments in Canada, the United States, and Mexico in three areas: (1) Government Spending, (2) Taxes, and (3) Regulation. Also, we incorporate three additional areas at the federal level from Economic Freedom of the World Annual Report: (4) Legal Systems and Property Rights, (5) Sound Money, and (6) Freedom to Trade Internationally. In addition, we expand area 1 to include government investment, area 2 to include top marginal income and payroll tax rates, and area 3 to include credit market regulation and business regulations. These additions help capture restrictions on economic freedom that are difficult to measure at the provincial/state and municipal/local level.

5

Hudgens, Bian, Jene Michaud, Megan Ross, Pamela Scheffler, Anne Brasher, Megan Donahue, Alan Friedlander, et al. Natural resource condition assessment: Puʻuhonua o Hōnaunau National Historical Park. National Park Service, September 2022. http://dx.doi.org/10.36967/2293943.

Abstract:

Natural Resource Condition Assessments (NRCAs) evaluate current conditions of natural resources and resource indicators in national park units (parks). NRCAs are meant to complement—not replace—traditional issue- and threat-based resource assessments. NRCAs employ a multi-disciplinary, hierarchical framework within which reference conditions for natural resource indicators are developed for comparison against current conditions. NRCAs do not set management targets for study indicators, and reference conditions are not necessarily ideal or target conditions. The goal of an NRCA is to deliver science-based information that will assist park managers in their efforts to describe and quantify a park’s desired resource conditions and management targets, and inform management practices related to natural resource stewardship. The resources and indicators emphasized in a given NRCA depend on the park’s resource setting, status of resource stewardship planning and science in identifying high-priority indicators, and availability of data and expertise to assess current conditions for a variety of potential study resources and indicators. Puʻuhonua o Hōnaunau National Historical Park (hereafter Puʻuhonua o Hōnaunau NHP) encompasses 1.7 km² (0.7 mi²) at the base of the Mauna Loa Volcano on the Kona coast of the island of Hawaiʻi. The Kona coast of Hawaiʻi Island is characterized by calm winds that increase in the late morning to evening hours, especially in the summer when there is also a high frequency of late afternoon or early evening showers. The climate is mild, with a mean high temperature of 26.2 °C (79.2 °F), a mean low temperature of 16.6 °C (61.9 °F), and on average 66 cm (26 in) of rainfall per year. The Kona coast is the only region in Hawaiʻi where more precipitation falls in the summer than in the winter.
There is limited surface water runoff or stream development at Puʻuhonua o Hōnaunau NHP due to the relatively recent lava flows (less than 1,500 years old) overlaying much of the park. Kiʻilae Stream is the only watercourse within the park. Kiʻilae Stream is ephemeral, with occasional flows and a poorly characterized channel within the park. A stream gauge was located uphill from the park, but no measurements have been taken since 1982. Floods in Kiʻilae Stream do occur, resulting in transport of fluvial sediment to the ocean, but there are no data documenting this phenomenon. There are a small number of naturally occurring anchialine pools occupying cracks and small depressions in the lava flows, including the Royal Fishponds, an anchialine pool modified for the purpose of holding fish. Although the park’s legal boundaries end at the high tide mark, the sense of place, story, and visitor experience would be completely different without the marine waters adjacent to the park. Six resource elements were chosen for evaluation: air and night sky, water-related processes, terrestrial vegetation, vertebrates, anchialine pools, and marine resources. Resource conditions were determined through reviewing existing literature, meta-analysis, and where appropriate, analysis of unpublished short- and long-term datasets. However, in a number of cases, data were unavailable or insufficient to either establish a quantitative reference condition or conduct a formal statistical comparison of the status of a resource within the park to a quantitative reference condition. In those cases, data gaps are noted, and comparisons were made based on qualitative descriptions. Overall, the condition of natural resources within Puʻuhonua o Hōnaunau NHP reflects the surrounding landscape. The coastal lands immediately surrounding Puʻuhonua o Hōnaunau NHP are zoned for conservation, while adjacent lands away from the coast are agricultural.
The condition of most natural resources at Puʻuhonua o Hōnaunau NHP reflects the overall condition of ecological communities on the west Hawaiʻi coast. Although little of the park’s vegetation...
