Time to read: 15 min

History of LLMs: 11 Bright Moments

Friflex is one of the leaders in developing AI products for sports digitalization: the idSport ecosystem. Our VAR system based on idChess helps the International Chess Federation resolve controversial situations during tournaments.

In addition to idChess, the ecosystem includes idBilliards and idBall. They allow you to recognize and record interesting highlights in billiards and football.

The Friflex machine learning team is closely following the development of artificial intelligence. We see that LLMs have great potential both for the development of our products and for many other tasks. This article brings together 11 bright moments from the history of LLMs. They will help you appreciate how impressive a path these systems have traveled, and simply entertain you.

As Pelevin wrote, "essentially, the function of an LLM is autocompletion brought to an unthinkable level of perfection. The LLM does not think. It is trained on a huge corpus of previously created texts [...] and on this basis predicts how a new sequence of words will grow and develop, and how it most likely will not develop... It is similar to forming a young member of society based on daily verbal instructions, slaps on the head, and observing who is given food and who is not."

This definition of a large language model (LLM) is loose but quite accurate. An LLM is a neural network trained on a large amount of text. It can analyze, understand, and generate text in natural language, use extensive databases, and understand context.

Below we will tell you how the idea developed that a machine could be taught to understand and generate text as if it were written by a person.

1957: Frank Rosenblatt creates a perceptron

The first attempt to teach a machine to learn on its own.

In the middle of the last century, the American scientist Rosenblatt was fascinated by the study of the human brain. He dreamed of creating an artificial model that could imitate its ability to learn and recognize patterns.

The first page of Rosenblatt's article "The Design of an Intelligent Automaton" from the Cornell Aeronautical Laboratory's Research Trends journal, summer 1958

"Stories of machines with human qualities have long captivated science fiction. But we are about to witness the birth of such a machine - a machine capable of perceiving, recognizing and identifying its surroundings without human training or control," Rosenblatt wrote in 1958.

The scientist presented his ideas in the work "The Perceptron Principle". He used the word "perceptron" for a device that modeled the process of human perception - a simple model of an artificial neural network. In 1960, Rosenblatt demonstrated how it worked on the first-ever neurocomputer, the Mark-1.

This device had a kind of eye: a matrix of photosensitive elements. The Mark-1 could recognize some English letters and geometric shapes that Rosenblatt showed it on cards or on paper. Moreover, the computer could change the weight coefficients of its connections to improve recognition after feedback about the result.

Of course, the capabilities of the Mark-1 were very modest. For example, if the letters were partially covered or differed in size from the samples the computer had been trained on, the machine did not recognize them. Still, it was a significant achievement for its time, and it laid the foundation for further research in neural networks and artificial intelligence.
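To give a sense of the principle, here is a minimal sketch of the perceptron learning rule in Python: the weights are nudged after every wrong answer, much as the Mark-1 adjusted its connection weights after feedback. The toy task (the logical AND function) and all the numbers are our own illustrative choices, not a reconstruction of Rosenblatt's setup.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # inputs ("photocell" readings)
y = np.array([0, 0, 0, 1])                      # target labels (logical AND)

w = np.zeros(2)   # connection weights
b = 0.0           # bias
lr = 0.1          # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        pred = int(w @ xi + b > 0)   # threshold activation
        error = target - pred        # feedback about the result
        w += lr * error * xi         # adjust weights toward the correct output
        b += lr * error

print([int(w @ xi + b > 0) for xi in X])  # -> [0, 0, 0, 1]
```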

1965: Joseph Weizenbaum creates ELIZA

The first natural language processing system.

While Rosenblatt was developing the perceptron, another American scientist, Joseph Weizenbaum, was working on the ELIZA program. ELIZA was a simple chatbot. It analyzed the sentences that the user entered and found keywords.

Then ELIZA formed answers using templates. The program worked on the basis of pre-programmed scripts that defined the rules for processing phrases and reactions to them. The most famous script was DOCTOR. It parodied the work of a psychotherapist in the style of Carl Rogers.

Often the program reformulated the interlocutor's phrases in the form of questions. For example, if the user entered: "I am sad", ELIZA could answer: "Why do you think you are sad?".
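The mechanism is easy to sketch. The snippet below is a heavily simplified, hypothetical ELIZA-style rule - a keyword pattern plus a response template - not Weizenbaum's original DOCTOR script.

```python
import re

# Each rule: a keyword pattern and a template that reuses the user's own words.
rules = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "Why do you think you are {}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "How long have you felt {}?"),
]

def eliza_reply(utterance: str) -> str:
    for pattern, template in rules:
        match = pattern.search(utterance)
        if match:
            # Reformulate the interlocutor's phrase as a question.
            return template.format(match.group(1).rstrip(".!?"))
    return "Please go on."  # content-free fallback when no keyword matches

print(eliza_reply("I am sad"))  # -> Why do you think you are sad?
```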

Weizenbaum himself wanted to use ELIZA to show that communication between people and machines is rather superficial. But many people genuinely felt a connection with the program, simple as it was, and perceived it as an intelligent interlocutor.

Questions of ethics and technology so captivated the researcher that he wrote a whole book, "Computer Power and Human Reason: From Judgment to Calculation," arguing that the computer should not replace human judgment and intelligence.

ELIZA was not even a real artificial intelligence. It did not understand the meaning of questions and could not learn from its experience. But the program showed that the computer could participate in meaningful dialogues in natural language.

1969: Marvin Minsky and Seymour Papert publish the book "Perceptrons"

The "winter of artificial intelligence" begins.

MIT professor Marvin Minsky was skeptical of Rosenblatt's ideas. The two scientists periodically clashed in heated public debates at conferences over the viability of the perceptron. Rosenblatt believed he could teach computers to understand language, while Minsky argued that the perceptron's functions were far too simple for that.

These were not just disputes. Minsky, together with Papert, investigated the mathematical properties of the perceptron and showed that it was incapable of solving a whole range of tasks related to invariant representation - for example, reading letters or numbers positioned differently on the page.

Their book "Perceptrons" was published in 1969. After it came out, interest in neural network research fell so sharply that the seventies came to be called the "winter of artificial intelligence." Not only did scientific interest shift away, but so did funding from American government agencies - to the delight of the followers of the symbolic approach.

1986: David Rumelhart and Geoffrey Hinton propose the error backpropagation method

Interest in neural networks is revived.

Criticism of the perceptron did not only bring on the "winter of artificial intelligence": it also pushed researchers to look for more powerful models. Rosenblatt's single-layer perceptron gave way to a multilayer one.

In the article "Learning representations by back-propagating errors," Rumelhart and Hinton (together with Ronald Williams) showed that the multilayer perceptron copes with tasks beyond the power of its single-layer predecessor - XOR, for example.

XOR is a logical operation that returns true (or 1) if the input values differ, and false (or 0) if both input values are the same.

The truth table for XOR looks like this:

Input: 0, 0 → Output: 0
Input: 0, 1 → Output: 1
Input: 1, 0 → Output: 1
Input: 1, 1 → Output: 0

A single-layer perceptron could only solve linearly separable tasks, and the XOR classes cannot be separated by a straight line. The multilayer perceptron solved XOR and other similar tasks, in part thanks to the backpropagation method proposed by Rumelhart and Hinton.

The method consists of iteratively adjusting the weights of the network's neural connections in the direction that reduces the error, and continuing until the error becomes small enough.

The mathematical apparatus behind error backpropagation is quite simple. But it allowed neural networks to learn from data significantly more complex and diverse than before.
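As an illustration, here is a minimal two-layer network trained with backpropagation on the XOR table above. The architecture (four hidden sigmoid units), the learning rate, and the number of steps are arbitrary choices for the sketch, not the setup from the 1986 paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer
sigmoid = lambda z: 1 / (1 + np.exp(-z))
lr = 1.0

for step in range(5000):
    h = sigmoid(X @ W1 + b1)                    # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)         # propagate the error backwards
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)   # adjust the weights
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

h = sigmoid(X @ W1 + b1)
print(sigmoid(h @ W2 + b2).round().ravel())     # typically converges to [0. 1. 1. 0.]
```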

1997: IBM Deep Blue defeats world chess champion Garry Kasparov

A machine was able to surpass a human in a complex intellectual task.

Chess long remained an area where human intelligence reigned - until, in 1997, the supercomputer Deep Blue beat Garry Kasparov.

This was the second match between the machine and the reigning world champion; Deep Blue had lost the first one, in 1996.

The historic match consisted of six games and took place in New York. Kasparov won the first game. He resigned in the second, and the third, fourth, and fifth ended in draws. The sixth game lasted only 19 moves: Kasparov played the opening poorly, got a bad position, and quickly lost.

Deep Blue's victory showed that machines can surpass humans in intellectually complex tasks, and also demonstrated the possibilities of machine learning and big data analysis. This event became a source of inspiration for further research in artificial intelligence.

By the way, Kasparov doubted that the match was fair: the program played unevenly and chose unusual moves. But no fraud on IBM's part was ever proven.

2007: IBM introduces the Watson system

It processed natural language and answered questions in a quiz format.

Watson was not yet an LLM in the modern sense, but its architecture already included various methods of natural language analysis and processing. For example, the computer divided text into individual words and phrases and then converted them into tokens.

Watson was also able to build a syntactic tree of a sentence, determine the meaning and context of words and phrases, and establish what part of speech each word belonged to.
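Watson's DeepQA pipeline is proprietary, but the same kinds of steps - tokenization, part-of-speech tagging, and a syntactic parse - can be illustrated with any modern NLP library, for example spaCy (the sentence here is just a made-up Jeopardy-style clue):

```python
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("This Russian grandmaster lost a six-game match to a computer in 1997.")

for token in doc:
    # word, its part of speech, and the syntactic head it depends on
    print(f"{token.text:<12} {token.pos_:<6} head={token.head.text}")
```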

In 2011, the computer appeared on the Jeopardy! quiz show and beat the two best players in the show's history, Ken Jennings and Brad Rutter. After that, Watson found applications in other industries, from finance to medicine.

2017: Google engineers describe the Transformer architecture

The Transformer radically changed the approach to natural language processing (NLP).

Ashish Vaswani and his colleagues published the landmark paper "Attention Is All You Need," which described a neural network architecture called the Transformer.

Previously, recurrent neural networks were usually used to transform sequences. The Transformer was based only on attention mechanisms, without recurrence or convolutions.
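The heart of the architecture is scaled dot-product attention: every position in a sequence decides how much to "look at" every other position. Below is a didactic NumPy sketch of that single operation; the real model adds learned projections, multiple heads, masking, and positional encodings.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # how relevant each key is to each query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # weighted mix of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))    # 3 query positions, model dimension 8
K = rng.normal(size=(5, 8))    # 5 key positions
V = rng.normal(size=(5, 8))    # one value vector per key
print(scaled_dot_product_attention(Q, K, V).shape)   # -> (3, 8)
```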

There is a lot to be said about the significance of the Transformer in the history of LLMs. But it is enough to say that today it is the dominant neural network architecture for NLP.

2018: OpenAI introduces GPT-1

The Transformer architecture has proven its viability.

GPT-1 is the first generative pre-trained Transformer model of its kind. Alec Radford and his colleagues from OpenAI described how it worked in the article "Improving Language Understanding by Generative Pre-Training".

In it, the researchers stated that:

It is easier to train models on unlabeled data - without labels, tags, and other additional information. Such texts are much more accessible and there are more of them than annotated ones intended for specific tasks.

If you pre-train a model on a diverse corpus of unlabeled text, it will be able to significantly better understand natural language, answer questions, assess semantic similarity, and perform other similar tasks.

The Transformer architecture helps the model better cope with long-term dependencies in the text.

A universal model outperforms models whose architecture is adapted to each task and which are trained with a discriminative method (an approach that focuses on the differences between data classes).

Interestingly, the approach did not exclude discriminative training completely but suggested using it after pre-training.
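The generative pre-training objective itself is simple: predict every next token from the previous ones, so no labels are needed. Here is a toy PyTorch sketch of that loss on random token ids (positional embeddings and everything else a real model needs are omitted); it illustrates the objective only and is not OpenAI's training code.

```python
import torch
import torch.nn as nn

vocab_size, d_model, seq_len = 100, 32, 16
embed = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
backbone = nn.TransformerEncoder(layer, num_layers=2)
lm_head = nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (1, seq_len))          # unlabeled "text"
causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len - 1)

hidden = backbone(embed(tokens[:, :-1]), mask=causal_mask)   # each position sees only the past
logits = lm_head(hidden)
loss = nn.functional.cross_entropy(                          # next-token prediction loss
    logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1)
)
loss.backward()
print(float(loss))   # roughly log(vocab_size) before any training
```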

The GPT-1 model was a significant breakthrough in the field of natural language processing. But it still had quite a few limitations. For example, on long stretches of text, GPT-1 could lose its original context. In addition, from time to time it inserted incorrect data and facts into the text.

2019: VL-BERT appears

The model has learned to process information from both text and images.

VL-BERT is one of the first multimodal LLMs. What makes it interesting is that it is pre-trained on large-scale visual and linguistic datasets. This allows it to combine information from regions of interest in an image with the related text descriptions to justify its choice of answer. That is, it not only recognizes what is in the picture but also connects it to the text.

VL-BERT can also:

  1. Describe image details. For example, you can ask "What color is the car?" or "What is the person in the photo doing?".

  2. Localize an object in the image. For example, you can ask it to find "a dog playing" and the model will find it.

VL-BERT has made a significant contribution to multimodal recognition and processing. By the way, the model is open source: you can find it on GitHub.
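Conceptually, the input scheme can be sketched like this: embeddings of text tokens and of detected image regions are concatenated into one sequence, with segment embeddings marking the modality, and fed into a single Transformer. The shapes below are arbitrary, and this is not the actual VL-BERT code - that lives in the GitHub repository mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 768
text_tokens = rng.normal(size=(12, d_model))    # 12 word-piece embeddings
image_regions = rng.normal(size=(5, d_model))   # 5 region-of-interest features from a detector

# Segment embeddings tell the model which modality each position belongs to.
segment = np.concatenate([np.zeros((12, d_model)), np.ones((5, d_model))])
joint_sequence = np.concatenate([text_tokens, image_regions]) + segment

print(joint_sequence.shape)   # (17, 768): one joint sequence for a single Transformer
```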

2020: OpenAI introduces GPT-3

The model learned to generate text like a human.

GPT-3 showed that increasing the size of the model can significantly improve the quality of the text it generates. The model was distinguished by its versatility. It could both perform machine translation and write code. Therefore, the scope of its application was much wider than that of its predecessors.

In addition, GPT-3 has a fairly simple API, which makes it accessible to a wide range of users.
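For reference, a GPT-3-era request looked roughly like this (legacy openai-python Completions interface; the client library and model names have changed since, so treat the identifiers as examples):

```python
import openai  # legacy (pre-1.0) openai-python client

openai.api_key = "YOUR_API_KEY"
response = openai.Completion.create(
    engine="davinci",                               # a GPT-3 base model
    prompt="Translate to French: Hello, world!",
    max_tokens=20,
)
print(response["choices"][0]["text"])
```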

The GPT-3 model kept evolving, eventually resulting in ChatGPT, a version trained not only on texts from the Internet but also on dialogue data. If, before the advent of ChatGPT, language models were still largely a subject of theoretical research, they have since come into wide practical use.

2021: Google introduces LaMDA

The model has learned to simulate human conversation.

So convincingly that Google AI engineer Blake Lemoine called LaMDA sentient. Lemoine claimed that the model had deep conversations with him about life, death, and the meaning of existence, and even expressed its own feelings and emotions.

However, other Google AI employees disagreed with Lemoine's claims. They stated that LaMDA is just a complex machine learning model that is able to generate human-like text in response to a wide range of prompts and questions. They also argued that Lemoine's claims were based on his personal misinterpretations of the model's responses.

Despite the controversy, LaMDA is a powerful tool that has the potential to revolutionize the way we interact with computers. It can be used to create chatbots that are more engaging and realistic than ever before, and it can also be used to develop new educational tools and games.

As LLM technology continues to evolve, it is likely that we will see even more impressive and innovative applications for these models. It is an exciting time to be working in the field of artificial intelligence.

The years since have seen an explosion of research and development in the field of LLMs. The scale and capabilities of models have expanded, as have their applications. There have been so many interesting releases that it's hard to know which ones will go down in history.

For example, on May 13, 2024, OpenAI introduced GPT-4 Omni. The word Omni, which translates to "all," "universal," or "comprehensive," points to the model's multimodality: GPT-4o perceives voice, text, and visual cues.

The model can respond to a voice prompt in as little as 232 milliseconds, and on average it responds in about 320 milliseconds - comparable to human reaction time in a conversation. LLMs have evolved from text-only models into multimodal ones: they work with text, images, and sound, recognize and interpret visual data, understand and synthesize audio, and integrate text with images and audio.

Friflex not only develops its own AI products but also helps businesses implement ML models into their processes. If you want to learn more about this direction, please contact us.
