Training AI – Scraping, TDM and the Law
The development and use of AI in a variety of areas, particularly in the commercial context, continues apace. AI has many uses, including data modelling, decision-making, and generating content, and it is fast becoming an important business tool in many settings. As useful and convenient as AI can be, however, its use is not without risks or harms.
Many AI models are trained using large amounts of data or content, including text, images, and video. The problem lies in the source of much of this content, which is collected en masse, or “scraped”, from the internet. Such content is, in many cases, protected by intellectual property (“IP”) rights, and those rights are infringed by such unauthorised use. Text and data mining (“TDM”) is a technique that is often used in such training, although it should be noted that TDM and AI model training are separate activities, with the former often preceding the latter.
Content owners face having their valuable IP used without their consent and without payment. AI developers, in turn, risk having IP infringement actions brought against them. In addition, end users of AI (particularly generative AI) may also risk infringing the IP rights of the original content owners whose material has been used for training.
In our latest AI blog post, we look at the key issues involved and the potential solutions, including licence agreements, exceptions enshrined in law, and technical and other measures that content owners and rightsholders can take to help reduce the impact on their content and business.
The contents of this Newsletter are for reference purposes only and do not constitute legal advice. Independent legal advice should be sought in relation to any specific legal matter.