Introduction

AI-Tools are fundamentally based on an AI-model: an algorithm that learns from data to answer the question it is designed for. This sounds complicated, but it can be summarized as a pre-built piece of software (a neural network or another type of machine-learning model, often readily available on the internet), commonly referred to as an “algorithm”, which is trained or calibrated with available data in order to answer the question it has been trained on. Depending on the available data, it is possible to gain insight into hidden patterns, predict situations, create an intelligent environment able to sense and act upon its surroundings, or support workflows by automating standard tasks. All of this requires some prior consideration of the data that will be used for the implementation of AI-Tools, in particular the following questions (a short code sketch follows the list):

- Is personal information collected legally?
- Is personal information used for an automated decision or profiling?
- Do you have a strategy regarding the ownership of the data?
- Do you have the data you need?
- Is the data representative of your use case (quality of the data)?
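
To make the idea of “training an algorithm on data” concrete, here is a minimal sketch in Python using the open-source scikit-learn library. The algorithm (a decision tree) is pre-built; it is merely calibrated with example data and then answers the question for a new case. The churn question and all numbers are invented for illustration only.

    from sklearn.tree import DecisionTreeClassifier

    # Toy training data: [monthly_orders, account_age_months] -> did the customer churn?
    X = [[2, 24], [0, 3], [5, 36], [1, 2], [4, 18], [0, 1]]
    y = [0, 1, 0, 1, 0, 1]  # 1 = churned (invented labels)

    model = DecisionTreeClassifier().fit(X, y)  # "train" the pre-built algorithm
    print(model.predict([[3, 12]]))             # answer the question for a new case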

Legal considerations

Data Protection

AI-Tools and “Big Data” (the vast amount of information that can be collected today) have prompted a regulatory response aimed at protecting personal information. The collection and use of personal information is strictly regulated in, among other jurisdictions, Europe and Switzerland, and these standards have implications around the world.

Whether information is considered personal varies from situation to situation. When a company holds data that allows it, without unreasonable effort, to associate the data with a person, or when the data is collected or processed in order to associate it with a specific person, this constitutes personal information.[1]

When collecting data that constitutes personal information, it must be ensured that this information is collected legally. This usually means that the person from whom the data is collected should give their explicit consent before the data is collected. In specific cases, personal data may be collected and used without explicit informed consent if a legal provision allows it (such as the collection and use of a contracting party’s data for the purpose of fulfilling a contract). Because such data is usually also used for marketing purposes, however, explicit consent is often needed for this additional purpose. Even in situations where explicit consent is not required, obtaining it constitutes best practice.
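
On the implementation side, consent is easier to demonstrate when it is recorded in a structured, auditable way. The following is a minimal sketch in Python; the ConsentRecord structure and its field names are illustrative assumptions, not a prescribed format:

    from dataclasses import dataclass, field
    from datetime import datetime, timezone
    from typing import Optional

    @dataclass
    class ConsentRecord:
        """Illustrative record of a data subject's consent (field names are assumptions)."""
        subject_id: str  # pseudonymous identifier of the person
        purposes: list   # e.g. ["service_personalization", "marketing"]
        granted_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
        withdrawn_at: Optional[datetime] = None

        def covers(self, purpose: str) -> bool:
            """Consent is valid for a purpose only if granted and not withdrawn."""
            return self.withdrawn_at is None and purpose in self.purposes

    consent = ConsentRecord("user-4711", ["service_personalization"])
    assert consent.covers("service_personalization")
    assert not consent.covers("marketing")  # marketing needs its own explicit consent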

The use of the collected information must also be legal. This means that the required explicit consent must also be informed. Informed consent means that the person must agree both to the type of information collected and to the use that will be made of it. How the personal information will be used must therefore be described in advance in reasonably detailed terms. Even if the data is later anonymized, and therefore no longer constitutes personal information, the person must be informed so that they can verify that the correct data has been collected and check whether the data has been handled correctly (e.g., deleted when no longer needed).
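
As an illustration of the anonymization step mentioned above, a common first measure is to drop direct identifiers and replace stable keys with salted one-way hashes. The sketch below is only a starting point, not a complete anonymization scheme (true anonymization also requires assessing the re-identification risk of the remaining attributes); all field names are assumptions:

    import hashlib

    SALT = b"replace-with-a-secret-salt"  # assumption: a secret salt, stored separately

    def pseudonymize(identifier: str) -> str:
        """Replace a direct identifier with a salted one-way hash."""
        return hashlib.sha256(SALT + identifier.encode("utf-8")).hexdigest()[:16]

    record = {"name": "Jane Doe", "email": "jane@example.com", "age": 34, "zip": "8001"}

    # Drop direct identifiers; keep a pseudonymous key so records stay linkable.
    anonymized = {
        "pid": pseudonymize(record["email"]),
        "age": record["age"],
        "zip": record["zip"],  # caution: quasi-identifiers may still allow re-identification
    }
    print(anonymized)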

When personal data is used to personalize a service or to make personalized suggestions, the result of this processing, which constitutes “subjective information” about a person, must be made accessible to the data subject, since it still constitutes personal information.[2]

Furthermore, if personal data is used for an automated decision or for profiling with significant effects on the person concerned (e.g., credit scoring, bonuses, hiring and promotion, access to groups or locations, etc.), it must be remembered that this person has the right to object and to request a review by a human.
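
In practice, this right can be supported by building the escalation path into the decision pipeline itself, so that objections and borderline cases are routed to a human reviewer. A minimal sketch in Python, in which the scoring function and the thresholds are purely illustrative assumptions:

    from dataclasses import dataclass

    def score_applicant(applicant: dict) -> float:
        """Placeholder for a trained model; returns a fixed score for illustration."""
        return 0.55

    @dataclass
    class Decision:
        outcome: str  # "approved", "rejected", or "needs_human_review"
        score: float
        automated: bool

    def decide(applicant: dict, objection_raised: bool = False) -> Decision:
        score = score_applicant(applicant)
        # Any objection, or a borderline score, escalates to a human reviewer.
        if objection_raised or 0.4 < score < 0.6:  # illustrative thresholds
            return Decision("needs_human_review", score, automated=False)
        outcome = "approved" if score >= 0.6 else "rejected"
        return Decision(outcome, score, automated=True)

    print(decide({"income": 52000}, objection_raised=True).outcome)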

Intellectual Property

In principle, intellectual property in the algorithms used for the creation of models is granted internationally by the Berne Convention (as software constitutes a “literary” work), and their use is regulated by the licenses attached to them.

The raw data supplied to train the model, on the other hand, is usually not protected by intellectual property, unless it already enjoys protection as a literary or artistic work (such as literary texts, news articles, and artistic pictures). There is currently a debate as to whether some specific protection should be created for data collections that can be used to train algorithms. The prevailing view, for the moment, is that other existing legal protections, such as rules protecting confidential information, competition rules, or contractual agreements, grant sufficient legal protection to this kind of data.[3]

The output of an AI-algorithm is likewise excluded from intellectual property protection, which is controversial for products such as pictures, audio, and texts produced by AI that are comparable with works created by humans. This is because protection is usually granted to “original” (non-obvious) works, and originality is deemed to derive only from human activity.[4]

Technical considerations

First of all: prepare the training data. This involves several steps that make it possible to obtain the desired results. Crucially, it must be checked that data is available: the data must exist in a format that the algorithm can process (such as electronic text, images, and numbers), and in sufficient quantity. If this is not yet the case, existing information can be bulk-digitized with the help of specialized providers. It is important that this digitization of the data is coupled with the digitization of the workflow, so that the same exercise does not have to be repeated in the future. Thanks to the wave of home-office arrangements, at least the most basic digitization of work has already taken place in most organizations.
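
A first survey of the available raw material can itself be automated. The sketch below, assuming a simple directory layout and an illustrative whitelist of machine-readable formats, counts how much of the data is already usable and how much may still need digitization:

    from collections import Counter
    from pathlib import Path

    MACHINE_READABLE = {".txt", ".csv", ".json", ".xml", ".png", ".jpg"}  # assumption

    def survey(data_dir: str) -> Counter:
        """Tally file extensions so gaps in machine-readable data become visible."""
        return Counter(p.suffix.lower() for p in Path(data_dir).rglob("*") if p.is_file())

    counts = survey("training_data")  # hypothetical folder of collected documents
    readable = sum(n for ext, n in counts.items() if ext in MACHINE_READABLE)
    total = sum(counts.values())
    print(f"{readable}/{total} files machine-readable; the rest may need digitization")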

The amount of information necessary for a useful implementation of AI-Tools varies from case to case. Many pre-made solutions have been created, so there are already pre-trained models that can be deployed with little or no data of one’s own. This is especially the case for basic automation tasks such as information extraction from standard documents and forms. Many companies providing algorithms also offer consultation on the available options and tools based on the data at hand.
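
As one open-source example of such a pre-made solution, the Hugging Face transformers library ships pipelines built around pre-trained models, so entity extraction works without any training data of one’s own (the library downloads a default model on first use; the invoice sentence is invented):

    from transformers import pipeline

    # Off-the-shelf named-entity recognition: no own training data required.
    extractor = pipeline("ner", aggregation_strategy="simple")

    text = "Invoice issued by Acme AG, Zurich, on 12 March 2021."
    for entity in extractor(text):
        print(entity["entity_group"], "->", entity["word"])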

Another aspect to consider is the quality of the data: it must be ensured that the training data is diverse enough to reflect the envisaged use. If only a particular subset of the use cases is represented in the training data, the results will be biased toward what holds for that subset. It can be useful, for example, to structure the algorithm so that it provides a transparent view of the criteria used to reach a conclusion. This helps in identifying and addressing any flaws in the data provided.
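
A simple way to review this kind of representativeness is to compare the composition of the training data with the expected composition in the envisaged use. A sketch using the open-source pandas library; the file name, the “region” column, and the expected shares are illustrative assumptions:

    import pandas as pd

    df = pd.read_csv("training_data.csv")  # hypothetical training set

    # Share of each group in the training data vs. the envisaged use.
    observed = df["region"].value_counts(normalize=True)
    expected = pd.Series({"north": 0.25, "south": 0.25, "east": 0.25, "west": 0.25})

    gap = (observed.reindex(expected.index).fillna(0) - expected).abs()
    print(gap[gap > 0.10])  # groups over- or under-represented by >10 points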

Finally, independent providers (such as IBM and LatticeFlow) have emerged that offer services for the review and auditing of AI-Tools, allowing for greater transparency and reliability. Thanks to improvements in the technology, it is possible to shine more and more light into what was once a “black box”.
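
The commercial auditing services named above are proprietary, but the underlying idea can be illustrated with open-source tooling: permutation importance in scikit-learn measures how much each input feature actually drives a model’s predictions, giving a first look inside the “black box”. A sketch on a public demo dataset, not a substitute for a full audit:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    # Train a simple model on a public dataset, then measure feature influence.
    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
    ranked = sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1])
    for name, importance in ranked[:5]:
        print(f"{name}: {importance:.3f}")  # the five most influential inputs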

[1] David Rosenthal, Das neue Datenschutzgesetz, in: Jusletter 16. November 2020, N. 19; Recital 26 of Regulation (EU) 2016/679 (GDPR).

[2] Article 29 Working Party, Opinion 4/2007 on the concept of personal data, p. 9 et seq.

[3] At the European level, note the existence of the Database Directive, which grants copyright protection to natural persons for the database structures they create (e.g., the structure of a relational database as an intellectual creation), but not for their content.

[4] Daniel Gervais, Exploring the Interfaces Between Big Data and Intellectual Property Law, 10 (2019), JIPITEC 3 para 21.