You have maybe have heard of known software licenses such as MIT, Apache & GPL, but there are especially for software purposes, and data have their own licensing conventions. But they are not that hard to remember, actually, there even are a few tools to help you select the right license for your data, no worries!
A data license is a legal arrangement between the provider and the consumer of the dataset, mentioning the rights that the consumer can or cannot do with the provided data.
Licenses answer to all kinds of needs, from the public domain data that is the most permissive licensing status to comme commercial and confidential ones. Let's dive into that.
Many licensing providers exist but do not worry, there is a limited amount of licenses. First of all, you need to know the data criteria in order to be able to choose the right license.
Consumers can copy the data in order to use it or should they leave no use of it within the projets. Meaning the data must only be used for one-shot purposes and never saved, or through an API which no storage of any kind, mainly using the streamed API data on-the-go.
Consumers can share the source material and reproduce and share adapted material, with the exceptions and limitations depending on the presence of the other criteria.
Providers want their name and copyrights to be mentioned within the project if the data is used.
Consumers must provide the name of the creator and parties, and - if provided - notices (copyright, license, disclaimer) and a link to the material. Consumers also must mention any changes to the original dataset if changes have been made. See the Adaptation part to better understand what goes into "changes".
Consumers can use the data for commercial purposes and monetizing upon it.
Consumers can't change, rework or transform the dataset. Modifications on a dataset are understood as changes when the dataset has enough modifications to create a new intellectual copyright. For example, changing the date format will never be understood as an adaptation. However, creating data corresponding to your needs, adds an intellectual property upon the used dataset and it considered as an adaptation.
If consumers perform any modification considered like an adaptation on the dataset, they can only redistribute their adaptations under the same source material's licensing. Consumers can also adapt the license to a compatible one.
With all of the above criteria you are now fully equipped to select which licence is made for your data.
Here is the table of the actual data licenses that should allow you to pick the right one:
For example, a dataset licensed as CC BY can be distributed, remixed, adapted and built upon, commercially or not, as long as the consumers give credit to the providers for the original creation.
Alternatively, a dataset licensed as CC BY-NC-ND is the most restrictive licensing, only allowing consumers to download the data and share it with others, alongside the credits given to the providers, but they consumers can't change the data in any way nor use it commercially.