Legal Clause Classification Dataset

The Legal Clause Classification Dataset is built from a variety of sources, such as contracts and online contract texts, and each sentence is labeled with one of 24 clause categories. The purpose of the dataset is to classify any given contract text into one of these clause labels. Both the pre-defined categories and the contract text in the dataset can be customized to suit your requirements.



Save Time

Instead of labeling your data manually, use Predictly’s customizable legal clause identification dataset to train a model which can identify legal clauses in a contract quickly.

Increase model accuracy

All contract sentences have been annotated by professional lawyers, which makes this a highly accurate and balanced dataset for contract analysis. With the right machine learning model, you can achieve an accuracy of 90% or more.

Quick Integration

The dataset is fully customizable: you can fuse (merge) clause labels to match your requirements and have the text annotated accordingly. It is delivered in a form that is quick and easy to integrate into your own pipeline.
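To make the idea of fusing labels concrete, here is a minimal sketch using only the Python standard library. It assumes a CSV export with `text` and `label` columns; those column names, the merged label "Intellectual Property", and the three sample sentences are all invented for illustration and are not taken from the dataset itself.

```python
import csv
import io
from collections import Counter

# Invented sample rows standing in for a CSV export.
# The column names "text" and "label" are an assumption.
SAMPLE_CSV = """text,label
"The contractor shall not infringe any third-party patent.",Intellectual Property Infringement
"All deliverables shall be the property of the company.",Intellectual Property Ownership
"Invoices are payable within thirty days.",Payment Terms
"""

# Hypothetical merge map: two catalog labels fused into one custom label.
LABEL_MERGES = {
    "Intellectual Property Infringement": "Intellectual Property",
    "Intellectual Property Ownership": "Intellectual Property",
}


def fuse_labels(rows, merges):
    """Return new rows whose labels are remapped; unmapped labels are kept."""
    return [{**row, "label": merges.get(row["label"], row["label"])} for row in rows]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
fused = fuse_labels(rows, LABEL_MERGES)

# Count sentences per label after fusing, to check the class balance.
label_counts = Counter(row["label"] for row in fused)
print(label_counts)
```

Because `fuse_labels` returns new rows rather than editing them in place, the original labels stay available if a different grouping is needed later.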

We often see hundreds of contracts or tenders coming up for review. In law firms, the first thing we do with each tender is identify which clause a given passage belongs to, using the prescribed playbook to find the suitable clause for that passage or sentence. This comes at a cost: we spend a lot of time looking through just one contract or tender, so imagine the time it takes to analyze or review hundreds of them.

That is why this work needs to be automated, and the way to automate it is to build an automated contract review system. Such a system contains several machine learning models, for example for clause identification, for risk assessment (the level of risk associated with a clause), and for checking whether a clause complies with the prescribed playbook. We will not cover all of them in this story; we will only go through clause identification.
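As a rough illustration of the clause identification step, here is a dependency-free sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is a simple baseline under stated assumptions, not the model behind the dataset: the class name `NaiveBayesClauseClassifier` is hypothetical, and the six training sentences are invented examples that only borrow label names from the dataset's category list.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClauseClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.label_counts = Counter()            # sentences seen per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.label_counts[label] += 1
            for token in tokenize(text):
                self.word_counts[label][token] += 1
                self.vocab.add(token)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        # Tokens never seen in training carry no evidence and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokens:
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training sentences; a real run would load the exported dataset.
TRAIN = [
    ("Either party may terminate this agreement with thirty days written notice.", "Termination"),
    ("The company may terminate the contract if the contractor breaches its obligations.", "Termination"),
    ("The contractor shall keep all information confidential and not disclose it to third parties.", "Confidentiality"),
    ("Confidential information shall not be disclosed without prior written consent.", "Confidentiality"),
    ("Invoices shall be paid within thirty days of receipt.", "Payment Terms"),
    ("The company shall pay the contractor the fees set out in the invoice.", "Payment Terms"),
]

clf = NaiveBayesClauseClassifier().fit(*zip(*TRAIN))
print(clf.predict("The company may terminate this agreement upon written notice."))
```

With only six sentences this says nothing about achievable accuracy; it only shows the shape of the task, which is mapping a sentence to one clause label.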


Usage Methods

  • E-mail
  • Cloud Storage
  • Cloud Bucket

Data Update Frequency

  • On-demand
  • Monthly
  • Quarterly

Dataset Export Formats

  • CSV
  • JSON
  • TXT
  • Excel sheet
  • XML

Use Cases


Contract Risk Analysis

  1. This dataset can be used as one part of a complete automation solution for contract review. We start by splitting the input contract into sentences and categorizing each one into its respective clause. The clause labels can then be used to calculate the risk level and to find the playbook rules that apply to the selected contract text.
  2. For more information about how automated contract risk analysis works, read our case study, in which we explain how we saved lawyers a great deal of time by automating the contract review process.
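The flow described above (classify each sentence, then look up a risk level and the matching rules) can be sketched as follows. Everything in the sketch is a placeholder for illustration: the `PLAYBOOK` entries, the risk levels, the rule texts, and the `keyword_classifier` stub are all hypothetical, and a trained clause classifier would take the place of the stub.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReviewResult:
    sentence: str
    clause: str
    risk_level: str
    rules: List[str]


# Hypothetical playbook: risk levels and rules are invented placeholders.
PLAYBOOK = {
    "Limitation of Liability": {
        "risk": "high",
        "rules": ["Liability cap must be stated explicitly."],
    },
    "Payment Terms": {
        "risk": "medium",
        "rules": ["Payment period must not exceed the agreed maximum."],
    },
}
DEFAULT_ENTRY = {"risk": "unknown", "rules": []}


def keyword_classifier(sentence: str) -> str:
    """Stand-in for a trained clause classifier (assumption, keyword based)."""
    lowered = sentence.lower()
    if "liab" in lowered:
        return "Limitation of Liability"
    if "invoice" in lowered or "pay" in lowered:
        return "Payment Terms"
    return "General Conditions"


def review_contract(sentences: List[str], classify: Callable[[str], str]) -> List[ReviewResult]:
    """Label each sentence, then attach the playbook risk level and rules."""
    results = []
    for sentence in sentences:
        clause = classify(sentence)
        entry = PLAYBOOK.get(clause, DEFAULT_ENTRY)
        results.append(ReviewResult(sentence, clause, entry["risk"], list(entry["rules"])))
    return results


results = review_contract(
    [
        "The contractor's total liability shall not exceed the contract price.",
        "Invoices shall be paid within thirty days.",
        "This agreement is governed by the laws of the stated jurisdiction.",
    ],
    keyword_classifier,
)
for result in results:
    print(result.clause, result.risk_level, result.rules)
```

Passing the classifier in as a function keeps the routing logic independent of whichever model is trained on the dataset.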



Legal Clause Classification

  • Interested in a model trained on this dataset? Explore our catalog for its trained model and many more like it.


    Additional information
    Number of sentences


    Number of labels


    Input Type


    Dataset Type



    Audit, Business Conduct, Compliance with HSS and Environment, Confidentiality, Data Protection, Force Majeure, General Conditions, Import Export Law, Insurance, Intellectual Property Infringement, Intellectual Property Ownership, Liability and Indemnity, Limitation of Liability, Obligations of Company, Obligations of Contractor, Payment Terms, Suspension, Termination, Taxes and Duties, Warranties, Word Order and Change Order.

