Innovation Hub
An initiative by Al Bayat Mitwahid in the UAE

Course 2
Using and Training AI
Unit 1 – Working with a Large Language Model (LLM) chatbot
Lesson 1: Where LLMs come from
We will be using AI programs that generate text in response to what you type. These are often called chatbots. From Course 1 we know that AI programs can be trained on huge datasets. LLMs, or Large Language Models, are trained on huge amounts of written text, including articles, news sources, forum posts, code and essays, so that they can respond to many different types of questions.
Most of the time we don’t know the details of the data used to train AI programs.
If we ask one of the most successful chatbot programs, ChatGPT, "where does your training data come from?", it responds with:
“As an AI language model, I don’t have access to my training data, but I was trained on a mixture of licensed data, data created by human trainers, and publicly available data. OpenAI, the organization behind my development, has not publicly disclosed the specifics of the training datasets used.”
If you ask some more questions, a chatbot or a search engine can tell you that common datasets are things like:
- IMDb Movie Reviews
- Wikipedia Articles
- Reuters News Dataset
…and also datasets that have been constructed by large teams of researchers, such as the Stanford Question Answering Dataset (SQuAD). This dataset was made by taking 536 popular Wikipedia articles, splitting them into paragraphs, and then asking humans to write questions and answers based on those paragraphs. This produced organised (labelled) data that AI programs could use to build a model for answering questions.
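To get a feel for what this kind of organised (labelled) data looks like, here is a minimal sketch in Python of a single SQuAD-style question-and-answer record. The paragraph, question and answer are invented for illustration, and the field names simply follow the general SQuAD layout: a context paragraph, a question, and an answer with its starting position in the paragraph.

# One labelled record, similar in shape to an entry in the SQuAD dataset.
# The context, question and answer here are made up for illustration.
record = {
    "context": "The Nile is a major river in northeastern Africa. "
               "It is about 6,650 km long.",
    "question": "Roughly how long is the Nile?",
    "answers": [
        {
            "text": "about 6,650 km",
            "answer_start": 56,  # character position of the answer in the context
        }
    ],
}

# Because the data is labelled, the answer can be checked against the paragraph:
start = record["answers"][0]["answer_start"]
answer = record["answers"][0]["text"]
print(record["context"][start:start + len(answer)])  # prints: about 6,650 km
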
Before we dive into chatting with AI, let’s do some quick jargon-busting on a few AI terms.
LLM: A Large Language Model (LLM) is an AI system designed to understand, process, and generate human-like text. It is trained on a very large dataset of written language. These models use deep learning techniques and neural networks to learn the patterns and structures within language, and they can perform tasks such as language translation, text generation and question answering.
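To see text generation in action on a small scale, here is a short Python sketch that asks a small open model (GPT-2, a much smaller relative of today’s chatbot models) to continue a sentence. It assumes the Hugging Face transformers library and a backend such as PyTorch are installed; the prompt is just an example.

# A minimal text-generation sketch using a small open model (GPT-2).
# Assumes the 'transformers' library and a backend such as PyTorch are installed.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Large Language Models are trained on"
results = generator(prompt, max_new_tokens=25)

# The pipeline returns a list of possible continuations; print the first one.
print(results[0]["generated_text"])
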
Attention: A way for AI programs to focus on specific parts of the input data when making predictions or generating output. It lets the model give different amounts of importance (different weights) to different parts of the input sequence.
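To make “giving importance to different parts of the input” more concrete, here is a small numerical sketch in Python (using NumPy) of the scaled dot-product attention calculation used inside many LLMs. The three word vectors are made up purely for illustration.

# A sketch of scaled dot-product attention: for each word, compute a set of
# weights (summing to 1) that say how much it should "look at" every word.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(queries, keys, values):
    # Compare every query with every key, scale the scores, then turn them
    # into attention weights; the output is a weighted mix of the values.
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    weights = softmax(scores)
    return weights @ values, weights

# Three "words", each represented here by a made-up 4-number vector.
x = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 2.0, 0.0, 2.0],
              [1.0, 1.0, 1.0, 1.0]])

outputs, weights = attention(x, x, x)
print(weights)  # each row shows how strongly one word attends to the others
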
Context window: This is used by AI programs to understand the meaning of a word based on the surrounding words. For instance, with a context window of size 5, the word "dog" in a sentence would be read together with the two words before it and the two words after it, and those neighbouring words help the model work out what "dog" means within the sentence.
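Here is a small Python sketch of that idea: given a sentence and a target word, it collects the words that fall inside a window of a chosen size centred on the target. The sentence is just an example.

# A sketch of a context window: the target word plus a few words on each side.
def context_window(words, target_index, window_size=5):
    # With window_size 5 we keep the target word and the 2 words on each side.
    half = (window_size - 1) // 2
    start = max(0, target_index - half)
    end = min(len(words), target_index + half + 1)
    return words[start:end]

sentence = "the quick brown dog jumped over the lazy fox".split()
print(context_window(sentence, sentence.index("dog")))
# prints: ['quick', 'brown', 'dog', 'jumped', 'over']
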
DEMONSTRATION
Chat with our Course 1 chatbot, which we have trained with all the knowledge from Course 1!
