
AI Matters 1: Technology introduction

While it makes sense to have a basic understanding of the technology we are discussing here, not only from the user or system perspective but also from a backend or algorithmic point of view, there are already many excellent resources that outline these technologies at various levels of abstraction. And though I worked with artificial intelligence as a patent attorney back in the late twenty-teens, I also worked with enough brilliant minds in the area to know that many understand it far better than I do.[1] If you have worked in this area or already have a basic level of awareness, feel free to skip to the next section.

One of those brilliant minds who know the area much better than I do is Michael Copeland, quoted here from the Nvidia blog, offering a helpful lens for understanding the distinctions: “The easiest way to think of their relationship is to visualize them as concentric circles with AI — the idea that came first — the largest, then machine learning — which blossomed later, and finally deep learning — which is driving today’s AI explosion — fitting inside both.”[2]

As indicated above, machine learning is a subset of artificial intelligence. More specifically, machine learning uses algorithms that enable a model to learn from data and improve over time. Machine learning models can be broken down in a variety of ways, including supervised learning (using labeled data), unsupervised learning (using unlabeled, raw data), semi-supervised learning (using a mix of labeled and unlabeled data), and reinforcement learning.
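For readers who like to see the idea in code, here is a toy sketch of supervised learning using a one-nearest-neighbor rule. Everything here, including the data, labels, and function name, is invented for illustration and bears no resemblance to how production models are built; the point is simply that the model's "knowledge" comes entirely from labeled examples.

```python
# Toy illustration of supervised learning: the "model" predicts by
# consulting labeled training examples. All data here is invented.

def nearest_neighbor_predict(training_data, x):
    """Predict the label of x using the closest labeled example."""
    closest = min(training_data, key=lambda pair: abs(pair[0] - x))
    return closest[1]

# Labeled training data: (feature, label) pairs,
# e.g., hours of daylight -> season.
training_data = [(6.0, "winter"), (7.0, "winter"),
                 (14.0, "summer"), (15.0, "summer")]

print(nearest_neighbor_predict(training_data, 6.5))   # winter
print(nearest_neighbor_predict(training_data, 14.5))  # summer
```

Give the model more (or better) labeled data and its predictions improve; give it bad data and they degrade, which previews the point about data quality below.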

Crucially, data is the “source” of the learning for the model. In other words, a model is only as good as its data. This simple fact has led to the bulk of the copyright issues we will see arising from generative AI, because the best models use vast and diverse amounts of data. A quick and easy way to access huge quantities of data is through data scraping (“scraping”), a process that uses automated tools or scripts to extract data from various online sources, including websites.
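To make the mechanics of scraping concrete, here is a minimal, hypothetical sketch of the extraction step using only Python's standard library. A real scraper would first fetch pages over HTTP (e.g., with `urllib` or a third-party client) and handle far messier markup; the HTML snippet and class name below are invented for illustration.

```python
# Sketch of the extraction step of web scraping: pull the text of
# every <p> element out of (here, hard-coded) HTML.
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collect the text content of every <p> element."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs.append(data)

html = "<html><body><p>First article.</p><div>nav</div><p>Second article.</p></body></html>"
extractor = ParagraphExtractor()
extractor.feed(html)
print(extractor.paragraphs)  # ['First article.', 'Second article.']
```

Run at scale across many sites, this kind of extraction is how vast text corpora are assembled, and it is precisely this harvesting of others' content that fuels the copyright disputes discussed later.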

What we refer to as “generative AI” is often a combination of multiple models, combined with or embedded in a larger system that also includes a user interface and often a web-based application. The user typically interacts with a generative AI system by providing an input, or prompt. That input is often text-based, but it can take one or more of many different modalities (text, image, audio, etc.), and likewise the output can be in one or more formats, including text, image, audio, and video (e.g., in text-to-text models, the input and output are both text; in text-to-image models, the input is text and the output is an image). For anyone who has used ChatGPT, none of this is new. However, it is important to clarify that the inputs to the generative AI system just described (e.g., from the user) differ from the training inputs provided to the particular model or models (e.g., for training each model). Both raise legal questions, but the latter are the subject of much of the litigation that will be discussed herein.
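The modality framing can be sketched very loosely as follows. The `Request` class and `describe` function are hypothetical illustrations of how model families get their "X-to-Y" names, not any real API.

```python
# Illustrative-only sketch of input/output modalities. The naming
# convention "text-to-text", "text-to-image", etc. simply pairs the
# input modality with the output modality.
from dataclasses import dataclass

@dataclass
class Request:
    input_modality: str   # "text", "image", "audio", ...
    output_modality: str  # "text", "image", "audio", "video", ...
    content: str          # the user's prompt or other input

def describe(req: Request) -> str:
    """Label the request the way model families are often named."""
    return f"{req.input_modality}-to-{req.output_modality}"

print(describe(Request("text", "text", "Summarize this contract.")))  # text-to-text
print(describe(Request("text", "image", "A cat in a courtroom.")))    # text-to-image
```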

Each model needs to be trained, even when many models together comprise a single system or “generative AI” tool. One way to conceptualize the model through a more functional lens, one that brings continuity to the diversity of generative AI models, is described in James Grimmelmann’s forthcoming piece[3]:

“To create a generative-AI model, its creator picks a technical architecture, assembles training datasets (ed. note: not just “data”, but carefully constructed datasets), and then runs a training algorithm to encode features of the training data in the model.” This base model must then be fine-tuned and deployed as part of a service.
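As a rough sketch of those steps (pick an architecture, assemble a dataset, run a training algorithm, then fine-tune), consider a "model" with a single parameter trained by gradient descent. Everything here, including the data, is invented and vastly simpler than any real generative model; the point is only that training encodes features of the data into the model's parameters.

```python
# Highly simplified sketch of the training pipeline: a one-parameter
# "model" y = w * x, trained by gradient descent on invented data.

def train(data, steps=200, lr=0.01, w=0.0):
    """Gradient descent on mean squared error for y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# Step 1, "architecture": the form y = w * x (chosen above).
# Step 2, assemble a (carefully constructed) training dataset.
base_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# Step 3, training encodes features of the data into the parameter:
w = train(base_data)  # w converges toward 2.0

# Step 4, fine-tuning: further training on a smaller,
# task-specific dataset nudges the base model.
fine_tune_data = [(1.0, 2.1), (2.0, 4.2)]
w = train(fine_tune_data, steps=50, w=w)
print(round(w, 1))  # 2.1
```

Deployment, the final step in Grimmelmann's framing, would then wrap the trained parameter in a service that accepts user prompts.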

With these touchpoints in mind, we will pivot to survey the plethora of legal and IP issues that the broad application and adoption of this technology cluster called generative AI has begun to generate.

One note: this article series, for better or worse, has not been written with the help of generative AI. Whether that proves to be a wise or good use of time remains an open question.

[1] Some excellent and accessible resources include:

  1. For a quick read: “What’s the Difference Between Artificial Intelligence, Machine Learning, and Deep Learning?” https://blogs.nvidia.com/blog/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
  2. For a deeper dive, IBM has created a portal of free courses on the basics: https://www.ibm.com/training/collection/generative-ai-with-ibm-687

[2] Michael Copeland, “What’s the Difference Between Artificial Intelligence, Machine Learning, and Deep Learning?” July 29, 2016. https://blogs.nvidia.com/blog/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/

[3] If you are interested in these issues, in particular the intersections of generative AI and copyright, I can recommend James Grimmelmann’s piece, written with Katherine Lee and A. Feder Cooper, “Talkin’ ’Bout AI Generation: Copyright and the Generative-AI Supply Chain,” Journal of the Copyright Society of the U.S.A. (forthcoming), accessed here: https://james.grimmelmann.net/files/articles/talkin-bout-ai-generation.pdf

However, it is over 150 pages, so in case you have other ways to spend your evening, we’ve read it for you.