How does ChatGPT communicate so effortlessly, and what can we learn from it in terms of language acquisition?
Many of us have been using ChatGPT for language learning for just over half a year now, and either it has completely revolutionised your work life or, at the other end of the spectrum, maybe you got it to say something funny in a Shakespearian style for a goof. No matter what, we have all heard of it, but even at this late stage, a lot of people seem to be confused as to what it is (and isn’t!), what it can do, and, as one colleague of mine put it: “Why can’t it write songs with actual soul?”.
One thing to clear up immediately is that ChatGPT is not an artificial intelligence (A.I.), or at least not what you or I would consider to be A.I. It falls under the category of generative A.I., which is basically software trained to recognise patterns in language.
ChatGPT cannot reliably solve mathematical problems. It cannot reason things out or reach conclusions based on information; at best, it can take a guess. It is a large language model (LLM) which has been trained on the Internet and a variety of other texts, so if you ask it a question it draws on the patterns in what has already been written and generates language to fit its model.
Despite all of these caveats, what is clear is that ChatGPT is a master of language, or at least of imitating language, so what can we learn from it, and how can we apply that to second-language acquisition (SLA)?
ChatGPT is fascinating when it comes to language, because this is the area in which it truly excels.
The way it interprets language is not through meaning but through pattern recognition. You’ll hear a lot of experts refer to NLP, but it’s not the neurolinguistic programming that you might have come across in ELT. It’s natural language processing, a field of computer science that focuses on how people and computers can communicate with each other.
The process by which ChatGPT learns words mirrors our own learning process. Through a process called subword tokenisation, words are broken down into smaller units. So, for example, a word like university might be broken down into the subwords uni, vers and ity, and these subwords can form parts of other words such as universal, versatile or diversity. It’s similar to the idea of a morpheme, but a subword need not carry any meaning whatsoever. It is simply a string of characters that tends to cluster or repeat.
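If you would like to see the idea in action, here is a tiny Python sketch of subword tokenisation. The subword vocabulary below is invented purely for illustration; the real systems behind ChatGPT learn their vocabulary automatically (through methods such as byte-pair encoding) from vast amounts of text.

```python
# A toy illustration of subword tokenisation.
# The vocabulary below is invented for demonstration; real models learn
# theirs automatically (e.g. via byte-pair encoding) from huge text corpora.

SUBWORDS = {"uni", "vers", "ity", "di", "at", "ile"}

def tokenise(word):
    """Greedily split a word into the longest known subwords."""
    tokens, i = [], 0
    while i < len(word):
        # Try the longest possible match first; fall back to single letters.
        for length in range(len(word) - i, 0, -1):
            piece = word[i:i + length]
            if piece in SUBWORDS or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens

for word in ["university", "diversity", "versatile"]:
    print(word, "->", tokenise(word))
# university -> ['uni', 'vers', 'ity']
# diversity  -> ['di', 'vers', 'ity']
# versatile  -> ['vers', 'at', 'ile']
```

Notice how the same little pieces turn up again and again across different words, which is exactly the kind of repetition the model feeds on.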
What you might have noticed is how similar this is to the way children first approach reading and writing, by breaking words down into sounds and other patterns. What’s more, it mimics how we as children start to recognise and use lexis over time through exposure, practice, correction, and correct production.
Breaking things down to their simplest: ChatGPT examines the prompt it is given and predicts which words, phrases, sentences or paragraphs are most typically and most likely associated with that prompt.
It does this by comparing possible answers to the prompt against the enormous dataset it has been trained on. So ChatGPT does not really understand the language it presents. Instead, it has learnt language by analysing patterns and relationships in the text it was trained on.
ChatGPT is able to do this because it has been trained to compare its input to language written in millions of websites, books, articles, and other forms of text-based interaction. The size of the dataset that ChatGPT 3.5 was trained on is truly astounding: around 45 terabytes of data, which roughly equals 83,000,000 pages of information! If you have a subscription with OpenAI, you are probably using GPT-4, which has been trained on even more data and can produce more nuanced answers to prompts.
Some people have compared this method to the predictive text on your smartphone, but that comparison is a little misleading. Your phone’s predictive text is closer to guessing the next word from the few letters or words you have just typed, rather than from any sense of how all of those words fit together.
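To make the contrast concrete, here is a toy Python sketch of phone-style prediction. It only looks at the last word typed, using a tiny hand-made frequency table (all of the words and counts are invented for illustration), whereas an LLM weighs the whole prompt at once.

```python
# A toy phone-style predictor: it only looks at the LAST word typed,
# using a tiny hand-made frequency table (invented for illustration).
# An LLM, by contrast, takes the whole prompt into account.

AFTER = {
    "second":   {"language": 20, "time": 7, "hand": 2},
    "language": {"learning": 12, "acquisition": 9, "school": 3},
}

def suggest(text):
    """Offer the most frequent follow-ups to the last word typed."""
    last_word = text.lower().split()[-1]
    counts = AFTER.get(last_word, {})
    return sorted(counts, key=counts.get, reverse=True)[:3]

print(suggest("I am studying a second"))           # ['language', 'time', 'hand']
print(suggest("I am studying a second language"))  # ['learning', 'acquisition', 'school']
```

The toy predictor gives the same suggestions no matter what came earlier in the sentence, which is precisely the limitation an LLM is designed to overcome.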
At the very last stage, ChatGPT slightly randomises the output so that users do not get the same response every time.
So, we’ve looked at the basic steps of how ChatGPT produces language, but how does it get to that point? How does it process language to produce meaningful output?
Well, if you are interested in language or SLA, some of this might come as a surprise to you.
ChatGPT, like other LLMs, uses its enormous dataset of authentic language to form a deep-learning neural network. Without getting into the specifics and the scary maths, it is essentially a multi-layered algorithm of weighted connections, which is loosely similar to how we think the human brain works when learning and processing language. Still with me? No?
Okay, let’s break it down.
Imagine you’re learning a new word, let’s say “peregrination,” which means a long journey. You want to understand it well, so you gather information from various sources: a dictionary, a teacher, and your friend who loves to read.
Here’s how this connects to the algorithm and to how the brain processes language (there’s a small worked example after the list):
Information Sources = Neurons: Each source of information (dictionary, teacher, friend) is like a neuron in the brain or a unit in the algorithm.
Source Reliability = Weight: The weight is like how much you trust each source’s explanation. You might trust the teacher more because they’re an expert.
Understanding the Word = Output: Your final understanding of the word “peregrination” is like the output of the algorithm. It’s shaped by explanations from different sources with different levels of trust (weights).
Steps in Learning = Layers: Just as the algorithm has different layers, your learning process has different steps. In this case, it’s the input of information from various sources, processing that information, and then forming your understanding.
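To put some rough numbers on the analogy, here is a tiny Python sketch. The sources, trust weights and scores are all made up for illustration; a real network has millions or billions of these weighted connections arranged in many layers.

```python
# A toy "neuron" for learning the word "peregrination".
# Sources, trust weights and scores are invented purely for illustration.

sources = {
    # source: (how helpful its explanation is, how much we trust it)
    "dictionary": (0.9, 0.5),
    "teacher":    (0.8, 0.4),
    "friend":     (0.6, 0.1),
}

# The "output" is a weighted sum: each explanation counts in proportion
# to how much we trust the source, just like weights in a neural network.
understanding = sum(score * weight for score, weight in sources.values())

print(f"Understanding of 'peregrination': {understanding:.2f}")  # 0.83
```

In a real network, training consists of nudging those trust weights up or down, over and over, until the outputs start matching the language in the dataset.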
Obviously, it’s important to note that while these similarities exist, artificial deep-learning neural networks are vastly simplified models of the human brain’s complexity. The brain’s functioning involves intricate biological processes, including neurotransmitters, plasticity, and distributed processing, that go way beyond the scope of current artificial neural networks.
We’ve established how ChatGPT processes your input in general, but let’s look at what is happening behind the scenes with a sample prompt, say “Second language acquisition is …”.
The moment you hit enter to send the prompt, ChatGPT starts calculating the likelihood and frequency of the word that comes next. It does not look for existing text that happens to contain a similar phrase; it looks for matches in context and meaning that best fit the input from the user.
After this, it produces a ranked list of words that typically follow, together with their probabilities.
1. Second language acquisition is complex (5.6%)
2. Second language acquisition is beneficial (4.2%)
3. Second language acquisition is multifaceted (3.7%)
4. Second language acquisition is evolving (2.4%)
It continues to build the answer by adding words according to the probability its neural network assigns to each possible next word. It also has a built-in stop mechanism, learned from the dataset, that indicates when the sentence or paragraph should end and what format the text should take.
Having chosen “complex”, it then ranks the next word:

1. Second language acquisition is complex, involving (7.1%)
2. Second language acquisition is complex for (5.8%)
3. Second language acquisition is complex with (4.6%)
4. Second language acquisition is complex due (3.6%)

Having chosen “involving”, it ranks again:

1. Second language acquisition is complex, involving a (12.3%)
2. Second language acquisition is complex, involving cognitive (6.4%)
3. Second language acquisition is complex, involving differences (4.4%)
4. Second language acquisition is complex, involving several (2.7%)
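For the curious, here is a highly simplified Python sketch of that word-by-word loop. The candidate words and probabilities are the made-up ones from the lists above, hard-coded into a lookup table, and "<stop>" stands in for the model's real end-of-text marker; a genuine LLM computes fresh probabilities from its neural network at every step.

```python
# A highly simplified sketch of how an LLM builds a sentence word by word.
# The "model" below is a hand-written lookup table with made-up probabilities;
# a real model recalculates these from its neural network at every step.

NEXT_WORD = {
    "Second language acquisition is": [("complex", 0.056), ("beneficial", 0.042)],
    "Second language acquisition is complex": [(", involving", 0.071), ("for", 0.058)],
    "Second language acquisition is complex, involving": [("a", 0.123), ("cognitive", 0.064)],
    "Second language acquisition is complex, involving a": [("<stop>", 1.0)],
}

text = "Second language acquisition is"
while True:
    candidates = NEXT_WORD.get(text, [("<stop>", 1.0)])
    word, prob = max(candidates, key=lambda pair: pair[1])  # pick the most likely word
    if word == "<stop>":                                    # built-in stop mechanism
        break
    text = text + word if word.startswith(",") else text + " " + word
    print(f"{text}   ({prob:.1%})")
```

Run it and you can watch the sentence grow one step at a time, exactly as in the ranked lists above.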
Now, you might have noticed from using ChatGPT that it does not reproduce the same result every time. This is because randomness has been programmed into the model: it will sometimes pick less likely options so that it appears more creative. However, this is also what can drastically increase the chance of ChatGPT hallucinating.
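Here is a rough Python sketch of how that built-in randomness could work, using the same invented probabilities as before. The “temperature” knob is a real idea from language models, but everything else here is simplified for illustration.

```python
import random

# Made-up next-word probabilities for "Second language acquisition is".
candidates = ["complex", "beneficial", "multifaceted", "evolving"]
probs = [0.056, 0.042, 0.037, 0.024]

def sample_next_word(temperature=1.0):
    """Pick a word at random, weighted by probability.

    Higher temperatures flatten the distribution, so less likely words
    get chosen more often."""
    weights = [p ** (1.0 / temperature) for p in probs]
    return random.choices(candidates, weights=weights, k=1)[0]

print([sample_next_word(temperature=0.5) for _ in range(5)])  # mostly "complex"
print([sample_next_word(temperature=2.0) for _ in range(5)])  # far more varied
```

At a low temperature the model nearly always picks “complex”; turn the temperature up and the rarer words start to appear, which is where both the apparent creativity and the hallucinations come from.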
Hallucinations are when LLMs create information that seems real but is not based on actual data or facts. They arise because LLMs prioritise connecting words that probably go together, regardless of whether they actually should go together.
All of this is, of course, a gross simplification, and there were many more stages in which ChatGPT was trained and improved through human interaction, especially around contextual clues, but its approach to language is essentially what is described above.
If we can teach an A.I. to build up a reserve of language consisting of tens of millions of pages, how could we harness this new teaching method for future language learners? Is there anything to be gleaned from it?
Here are some activities you could try using with your learners that mimic the way an LLM learns language.
Sentence Completion Relay: Similar to how LLMs predict the next word in a sentence (as we saw above), students can take turns completing sentences. One student starts a sentence, and the next student has to predict and complete the next part.
Word Association Chain: Just like LLMs recognise connections between words, students can play a word association game. One student says a word, and the next has to say a word related to it. This helps to reinforce vocabulary and word relationships.
Contextual Storytelling: LLMs use context to generate relevant text. Students can be given a sentence and asked to build a story around it, ensuring their sentences fit the given context.
Dialogue Building Puzzles: LLMs have learnt contextual clues and dialogue markers by studying human conversations. To simulate a similar learning process, give students jumbled dialogue pieces (these could be written on Jenga blocks, pieces of Lego or cards) and ask them to arrange them in the correct order to form meaningful conversations.
Language Mutation: LLMs can generate creative text. Give students a sentence and ask them to create alternative sentences while maintaining the original meaning but using different words.
Contextual Song Lyrics: This is similar to the word association chain but with a nice twist. Play a song with missing words, and students need to predict and fill in the blanks based on the context. If necessary you can help learners by showing the music video to provide extra contextual information.
So in the grand scheme of language learning, the techniques that LLMs employ offer exciting prospects for enhancing SLA and ELT.
By embracing the LLMs’ emphasis on exposure, pattern recognition, and contextual understanding, we can design more immersive and effective learning experiences.
Simultaneously, learners can use LLM-inspired strategies to immerse themselves in diverse language resources, decipher patterns, and engage in real-world contexts more, ultimately bridging the gap between artificial intelligence and the art of human language acquisition.
As the technology and teaching landscape evolves, this symbiotic relationship promises an exciting future for language learners and educators alike.
There’s a lot of academic literature out there to check out on this topic, so here are some links to articles and material to help you out!
Oba, M. et al. (2023) ‘Second language acquisition of neural language models’, Findings of the Association for Computational Linguistics
Email: info@ciaranlynch.com