"I think people believed that both language and meaningful text were their domain. With difficult things like folding proteins, or playing complicated games like chess, it is not surprising an AI can do that," says Michal Valko, Slovak machine learning scientist at DeepMind in Paris. The company is part of Google and specialises in the research of artificial intelligence.
This article is supported by the ESET Foundation, whose annual ESET Science Award recognises exceptional scientists.
"However, with chatbots, it turns out that it is not as difficult as we thought," he says about why people were so taken aback with ChatGPT's capabilities.
Michal Valko is also an envoy of the ESET Science Award. This interview was carried out on the occasion of the announcement of this year's laureates. He speaks about the chatbot's capabilities, whether there is anything resembling consciousness in it, whether we should fear a robot uprising, as well as how self-driving cars are trained.
When ChatGPT appeared almost a year ago, people were blown away by what it could do. At the same time, people began to think an AI uprising in the vein of the Terminator movie series was near. Is it?
Certainly not, although I cannot predict the future. We, the scientists who work with artificial intelligence, were not at all surprised by what the chatbot could do. We had had chatbots like that for a very long time. They are based on a technology called transformers that has been around since 2017. It was invented by our team at Google.
We didn't make them available because we thought they weren't good enough, as they could make things up. It's called hallucinating. However, OpenAI put ChatGPT out there and we found that people don't really mind the issue that much. What surprised us was their appetite for the technology. Of course, other companies also wanted to show that they have something like that, so there are more of them now.

Then what is the chatbot capable of?
ChatGPT has no consciousness whatsoever. Under the hood, there is only matrix multiplication (it involves two grids of numbers multiplied together and forms the basis of many computing tasks - Ed. note), so the whole transformer is basically simple, but thanks to that, it can be trained on a lot of data. It's a process at the end of which is a probability distribution over some 32,000 words; the model picks one word with some probability, then another, and so on to form a sentence.
Each sentence has a certain probability, and learning involves choosing the sentences we want to see, the ones with a higher probability. There is no consciousness, no intention, no opinion, nothing like that. It cannot be compared to the human brain or Skynet (the artificial intelligence from the Terminator franchise that rebelled against humanity - Ed. note). It just learns the patterns of how words follow each other and tries to replicate them. This is what chatbots know how to do now.
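A minimal sketch of what "picking one word after another from a probability distribution" means in practice. This is not DeepMind's or OpenAI's code; the toy vocabulary and made-up logits stand in for a real transformer, which produces a distribution over tens of thousands of tokens through many matrix multiplications.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "a", "mat", "."]  # toy 7-word vocabulary

def next_word_distribution(context):
    """Stand-in for a transformer: a real model would turn the context into
    logits via stacked matrix multiplications and end with a softmax over
    roughly 32,000 tokens; here the logits are simply random numbers."""
    logits = rng.normal(size=len(vocab))
    return np.exp(logits) / np.exp(logits).sum()  # softmax -> probabilities

def generate(prompt, max_words=8):
    words = prompt.split()
    for _ in range(max_words):
        probs = next_word_distribution(words)
        words.append(str(rng.choice(vocab, p=probs)))  # sample, don't "reason"
    return " ".join(words)

print(generate("the cat"))  # plausible-looking word salad, no understanding
```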
Let's say I tell you that you are just placating me and in fact are paid by "Big Tech AI business". What would you say to this?
That everything I say can be verified. These things are public, the source code is out there, it can be downloaded and trained. However, a lot of computing power is needed, and a lot of data. An ordinary person will not train a chatbot on their laptop, but they can at the very least use one. Chatbot development costs tens of millions of euros, half of which goes to computing power and the other half to salaries.
Why was it that people were so amazed by the technology to begin with? You mentioned that you had these chatbots internally for years, and AI has been used for a long time, for example, in photo editing apps on phones.
It is an ordinary thing. Eliza is a chatbot that is over 50 years old. However, in the past these chatbots were not based on neural networks but on other models. Now, they learn how likely it is for one word to follow another. More than a year ago, our company launched AlphaFold, a protein-folding AI program.
In my opinion, this is a much greater contribution to science, since a lot of innovation was needed to get there. It is something used by millions of biologists and has advanced research on malaria, bacteria and more. However, only a specialist can appreciate this, as opposed to anyone being able to appreciate a chatbot that produces text. I think people believed that both language and meaningful text were their domain.
With difficult things like folding proteins, or playing complicated games like chess or Go, it is not surprising an AI can do that, even though few understand the progress that was necessary to achieve it in the first place. However, with chatbots, it turns out that it is not as difficult as we thought.
When using ChatGPT, I noticed that it presented me with facts it could not prove or provide a source for. When I tried to verify them, I did not find anything related to the "facts" it provided. Isn't that perhaps more dangerous than the idea of a robot rebellion?
In my opinion, yes. I actually work on this, on eliminating hallucinations. A chatbot doesn't think, it just picks one word after another out of many probabilities, so there is no verification, no reflection, no reasoning, no memory. Therefore, it is actually not surprising that something untrue comes up eventually, because that concept does not exist here. As I've said, we are trying to find out how to complete a sentence, using the Internet as a source.
And not only are there untrue things on the Internet, there are also racist, sexist, toxic things. A chatbot learns from those too. We are charmed by a chatbot's ability to complete sentences that make sense, and we think it probably means something and is therefore reliable. Well, it doesn't and it's not. That's dangerous, and we know there are some basic things a chatbot cannot handle.

What are those?
One such thing is called the reversal curse. We ask who Tom Cruise's mother is, and the chatbot says Mary Lee Pfeiffer. We then ask what Mary Lee Pfeiffer's son's name is, and the chatbot replies that she has no children. A living person wouldn't do that; it doesn't take any genius. I can give many examples where the illusion breaks down on something this trivial. Therefore, research into how to eliminate or reduce hallucinations is important.
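A toy illustration of why this can happen, under the assumption that the model only learns word-order patterns: an association picked up in one direction during training is not automatically available in the opposite direction. This is a deliberately crude stand-in, not how a language model is actually implemented.

```python
# A "model" that, like a pure next-word predictor, only memorises the
# direction in which facts appeared in its training text.
training_sentences = [
    "Tom Cruise's mother is Mary Lee Pfeiffer",
]

forward = {}  # prefix seen in training -> learned completion
for sentence in training_sentences:
    prefix, completion = sentence.split(" is ")
    forward[prefix] = completion

print(forward.get("Tom Cruise's mother", "no answer learned"))
# -> Mary Lee Pfeiffer

# The reversed question never appeared as a prefix in training, so nothing
# was learned for it; a real chatbot may then make something up instead.
print(forward.get("Mary Lee Pfeiffer's son", "no answer learned"))
# -> no answer learned
```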
How can that be achieved?
We try to find data that supports the chatbot's inference. If we find out that a chatbot hallucinates when trained on certain data, we can modify the training method. Then we can use human feedback to say that we do not prefer such answers, thus teaching the chatbot what to choose and at the same time reducing the chance of hallucination. On the other hand, people are also capable of hallucinating: when we ask something and the person doesn't know the answer, they can make it up.
Can an AI be trained to say that it doesn't know an answer?
It can be done. It has actually been a standard approach in machine learning since before chatbots came along; it is called learning with abstention. One possibility is to use questions that do not have an answer as input data and "I don't know" as the output, for example. Another is to make it part of the training that we let the chatbot give two answers, because normally it always gives one.
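A minimal sketch of the first possibility, the classical learning-with-abstention idea in its simplest thresholded form: the model answers only when it is confident enough and otherwise abstains. The dataset, model and threshold are made up for illustration; chatbot-scale versions are far more involved.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy two-class dataset: two overlapping clusters of points.
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

clf = LogisticRegression().fit(X, y)

def predict_with_abstention(point, threshold=0.8):
    """Return a class only when the model is confident enough;
    otherwise abstain, the analogue of answering 'I don't know'."""
    probs = clf.predict_proba([point])[0]
    if probs.max() < threshold:
        return "I don't know"
    return int(np.argmax(probs))

print(predict_with_abstention([2.0, 2.0]))   # deep inside one cluster
print(predict_with_abstention([0.0, 0.0]))   # ambiguous region -> abstains
```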
Does it have to?
We need to determine what we want as an output. We can teach the chatbot that between two answers, a human will prefer the answer "I don't know", and thus give the chatbot a better chance to say it. The thing is, we expect a lot of things from a chatbot. It is very difficult to even say what a good chatbot actually means. Do we want it to not make things up? Not be racist? Not be toxic? To verify sources? It can be a lot, which makes it difficult to optimise. Sometimes such requirements go against each other: we can get better in one metric and get worse in another. My research is also about how to improve one metric while not worsening another. Sometimes we also have to let the user choose what kind of chatbot they want.
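A rough sketch of the "two answers, a human picks one" training signal: a Bradley-Terry-style pairwise preference loss over a toy scoring model. The feature vectors and the linear scorer are placeholders for illustration, not the method actually used at DeepMind.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=4)  # toy "reward model": a linear score over answer features

def score(features):
    return features @ w

def preference_loss(chosen, rejected):
    """Negative log-probability (Bradley-Terry model) that the answer the
    human chose beats the one they rejected; training minimises this."""
    diff = score(chosen) - score(rejected)
    return -np.log(1.0 / (1.0 + np.exp(-diff)))

# One preference pair: features of the preferred answer (say, an honest
# "I don't know") versus the rejected, made-up one.
chosen, rejected = rng.normal(size=4), rng.normal(size=4)

for _ in range(100):  # a few steps of gradient descent on the scorer
    diff = score(chosen) - score(rejected)
    grad = -(1.0 - 1.0 / (1.0 + np.exp(-diff))) * (chosen - rejected)
    w -= 0.1 * grad

print("preference loss after training:", round(preference_loss(chosen, rejected), 4))
```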
Although ChatGPT had some restrictions built in, for example not providing instructions on how to build a bomb, people eventually found a way around them anyway. How do you deal with such misuse of the technology?
We take this very seriously. The AlphaFold protein folding system went through several levels of scrutiny. We have an internal board where we ask what the positives and negatives are. We also asked scientists what they thought about it. In the end, we decided that the positives outweighed the negatives. But that doesn't mean someone else wouldn't do the opposite.
We co-founded Partnership on AI, a coalition of all the big players in the field, including governments, non-profits, ethicists and philosophers, which tries to discuss these things. We also have a code of what we would never do. I can mention that we have an absolute red line on anything that could even remotely contribute to the development of weapons. Google also has its own code and we follow that too. However, no institution has the right answers and you have to go case by case.
Another question is that while it may sometimes be unethical to release something, it can also be unethical not to release something. AlphaFold could help the development of an important drug, and it would therefore be unethical to prevent progress in medicine. The difference is also that medicine is a completely different field from generating random images. There are already regulations in some areas. AI is not the first area that needs them, and we can take inspiration from those areas that have had regulations for a long time.

Are you suggesting that there will not be a day when we have to quote Jeff Goldblum, who said in Jurassic Park that "your scientists were so preoccupied with whether or not they could that they didn't stop to think if they should"?
Yes. Regulation plays a big role. As scientists, we ask where our own and our company's boundaries are. At the same time, the whole of society must be involved. We need as many opinions as possible so that this is not concentrated in just one country. This is also why my colleagues and I organised the Eastern European Machine Learning Summer School in Košice and attracted top scientists to Central Europe. We have a brain drain, and countries like Slovakia, Ukraine and Romania do not contribute much in this area, so we try to help and encourage people to do this. We need input and ideas from different fields, and to be exposed to critical opinions.
Have you, as a company, ever developed a system that you changed your mind about and did not make available?
Even when we start working on something, it must go through a review. We have to choose our partners; we cannot work with just anyone. They have to be vetted institutions and we have ways to see whether the risk is proportionate. I don't know of any case where something was about to come out and then it was turned around. For example, chatbots existed before, but companies didn't release them because of hallucinations.
A month before ChatGPT launched, Meta released its own chatbot called Galactica, which wrote a scientific article that was rubbish. Meta took a risk, but they had to take it down almost immediately because the public was angry. OpenAI, as a lesser-known player in the area, was therefore able to take the risk. It would have been much more difficult for Google too, as people's expectations are very high. When they search for something, they expect results that are real. The same goes for their maps.
We have touched on your work a few times. What you are doing is called reinforcement learning. What does it mean?
It is an old subfield of artificial intelligence research. The name is borrowed from neuroscience; basically, its first form was Pavlov's dog experiments, that is, learning through reward and punishment. It is a method used when we cannot describe precisely how a system, which we also call an agent, should behave.
It's easy to say what's in a picture, but it's hard, for example, to say how to drive a car properly and teach the system what to do, so that when its sensors see something, it slows down and turns. And there are so many possibilities. At the same time, we want it to follow the traffic rules, make people feel comfortable and so on. The principle of reinforcement learning is to let the system act and then tell it what was good and what was bad, because that's easier.
It is used when there is a sequence of steps. When we want to get somewhere by car, we only find out after arriving at the destination whether the sequence of steps was good. Because I can say that you drive well, but you arrived somewhere entirely different from where you were supposed to.
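A minimal reinforcement-learning sketch of "let the system act, then tell it afterwards whether the whole sequence of steps was good": a toy agent on a line that only receives a reward at the destination, trained with a REINFORCE-style update. The environment, parameters and learning rate are all made up; nothing here is the actual setup used for cars or chatbots.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: start at position 0 and reach at least +5 within 10 steps.
# The agent learns only at the end whether the whole trip was good.
theta = 0.0  # single policy parameter: preference for stepping right

def step():
    p_right = 1.0 / (1.0 + np.exp(-theta))        # sigmoid policy
    action = 1 if rng.random() < p_right else -1   # +1 right, -1 left
    return action, p_right

for episode in range(500):
    position, trajectory = 0, []
    for _ in range(10):
        action, p_right = step()
        position += action
        trajectory.append((action, p_right))
    reward = 1.0 if position >= 5 else 0.0  # feedback only at the destination
    # REINFORCE: nudge the policy toward the actions taken, scaled by reward.
    for action, p_right in trajectory:
        grad_log_p = (1 - p_right) if action == 1 else -p_right
        theta += 0.1 * reward * grad_log_p

print("learned probability of stepping right:",
      round(1.0 / (1.0 + np.exp(-theta)), 2))
```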
What does it look like with chatbots?
When it comes to chatbots, it is used in the last phase of training. Currently, there are three phases. In pre-training, they learn to complete sentences based on what they read on the Internet; that's how they learn which words come after which. At this stage, we cannot talk to them yet. In the second phase, we train them on dialogues so that they answer questions. Finally, we fine-tune them with reinforcement learning so that they don't make things up, don't behave awkwardly, don't pretend to be a doctor or a financial advisor.
That is, so they don't do things that seem problematic to us but are at the same time difficult to define. So we let the chatbot offer two answers and we choose the better one. We are fine-tuning it to maximise this otherwise very hard-to-define feedback. This is the biggest use of reinforcement learning in 2023.
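A schematic of the three phases just described, written out as placeholder functions. Every name and signature here is hypothetical, illustrating only the order of the pipeline, not a real training API.

```python
def pretrain(model, internet_text):
    """Phase 1: next-word prediction on web-scale text; the model learns
    which words tend to follow which, but cannot hold a dialogue yet."""
    return model

def supervised_finetune(model, dialogue_examples):
    """Phase 2: training on curated dialogues so the model answers
    questions instead of merely continuing text."""
    return model

def rlhf_finetune(model, preference_pairs):
    """Phase 3: the model proposes two answers per prompt, a human picks
    the better one, and reinforcement learning maximises that
    hard-to-define feedback signal."""
    return model

def train_chatbot(model, internet_text, dialogue_examples, preference_pairs):
    model = pretrain(model, internet_text)
    model = supervised_finetune(model, dialogue_examples)
    model = rlhf_finetune(model, preference_pairs)
    return model
```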
What direction is the research headed now?
Transformers are the predominant trend. At the same time, not everything in AI is a transformer, for example, generative AI that makes videos or pictures. However, transformers have one drawback: training is energy intensive. Research is being done to reduce this footprint, because training is expensive and with transformers it must be done many times. This is one of the most important things in the area: how to make chatbots cheaper and also improve their capabilities. People are also trying to make them multimodal, so that they handle a combination of text, audio and video. There are many open questions and we are only at the beginning.