
‘Bots Are Simply Imitators, not Artists’: How to Distinguish Artificial Intellect from a Real Author

Today, text bots like ChatGPT perform many tasks that used to be human work. They can rewrite ‘War and Peace’ in a Shakespearean style, write a thesis on Ancient Mesopotamia, or create a Valentine’s Day card for us. But is there any way to identify an AI-generated text and distinguish it from work done by a human being? Can we catch out a robot? Vasilii Gromov, Deputy Head of the HSE School of Data Analysis and Artificial Intelligence and Professor of the HSE Faculty of Computer Science, addressed these questions in his lecture ‘Catch out a Bot, or the Large-Scale Structure of Natural Intelligence’ for the Znanie intellectual society.

‘Why are modern texts created and who writes them?’ asked Vasilii Gromov. His generation and the generation of his listeners grew up on works written by people for people. The authors of such texts put a certain meaning into their works and had a certain goal, whether the book was ‘Sleeping Beauty,’ ‘War and Peace,’ or a textbook of mathematical analysis, the professor notes. Nowadays, however, children are surrounded from a very early age by texts written by an unknown author with an unclear purpose for an undefined audience. Vasilii Gromov and his colleagues wondered whether such a child would grow up the same way previous generations did.

The ongoing change is neither good nor bad: the world is transforming. Humankind is now experiencing the ‘co-evolution of artificial intelligence and humans.’ As it develops rapidly, AI is adapting to humans, and humans are beginning to adapt to artificial intelligence as well. To secure our future, or at least for ‘basic information hygiene,’ we need to learn to distinguish texts generated by bots (artificial intelligence systems that generate texts in natural languages such as Russian or Chinese) from those written by people.

Given a set of texts already generated by a specific bot, it would not be difficult to determine whether a new text was written by that bot or by a human: we simply need to train a neural network on a large number of similarly generated texts, and the job is done. However, once that happened, no one would continue using that particular bot, and it would simply be replaced by another artificial intelligence. Scientists therefore need to develop a mechanism capable of distinguishing any bot from any human. To do this, we need to look at the structure of language itself, which brings us to research that explains natural languages from a mathematical point of view. Let’s take a look at the necessary steps.

The field of natural language processing deals, in particular, with representing words and sequences of words (n-grams, where n is the number of words) as vectors (ordered sequences of numbers), which together form a vector space.
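As a rough illustration of the representation described above, the following Python sketch extracts n-grams from a tokenised sentence and turns them into a count vector. The function names `ngrams` and `count_vector` are hypothetical, and plain counts stand in here for the learned embeddings that research systems typically use; the lecture does not specify which representation was applied.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (tuples of n consecutive tokens) in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_vector(tokens, n, vocabulary):
    """Represent a text as a vector of n-gram counts over a fixed vocabulary."""
    counts = Counter(ngrams(tokens, n))
    return [counts[gram] for gram in vocabulary]


# Toy sentence, chosen only for illustration.
tokens = "the cat sat on the mat".split()
bigrams = ngrams(tokens, 2)        # n = 2: pairs of consecutive words
vocabulary = sorted(set(bigrams))  # one vector coordinate per distinct bigram
vector = count_vector(tokens, 2, vocabulary)
print(bigrams)
print(vector)
```

Each distinct bigram becomes one coordinate, so texts that share a vocabulary can be compared as points in the same space.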

Working with the representation of individual words reveals that the vocabulary of bots is no different from the vocabulary of an ordinary person. However, as soon as it comes to sequences of two or three words, it turns out that the sequences generated by bots are significantly more predictable and much poorer in linguistic terms than those that even the most poorly educated person can create (for example, a bot is more likely to repeat patterns). The difference between the n-gram sequences of bots and those of people is statistically significant even for large bots (such as ChatGPT), and this is what helps catch them.
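One simple way to put a number on how predictable word pairs are is the Shannon entropy of the bigram distribution: the more a text repeats the same pairs, the lower the entropy. The sketch below is an assumed stand-in measure, not necessarily the statistic used in the research, and the two toy strings are invented for illustration rather than taken from real bot or human output.

```python
import math
from collections import Counter


def bigram_entropy(tokens):
    """Shannon entropy (in bits) of the distribution of consecutive word pairs."""
    bigrams = list(zip(tokens, tokens[1:]))
    counts = Counter(bigrams)
    total = len(bigrams)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical samples: one repeats a pattern, the other does not.
repetitive = "the cat sat the cat sat the cat sat".split()
varied = "a quick brown fox jumps over the lazy dog".split()

print(bigram_entropy(repetitive))
print(bigram_entropy(varied))
```

A lower value for the repetitive sample reflects the kind of pattern repetition the lecture attributes to bots; a real comparison would need large corpora and a significance test.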

Further study of natural language from a mathematical point of view leads scholars to conclusions about where such word vectors are located in space. There are regions of vector space (especially for sequences of words) that only bots visit, and others that only people visit. Most regions (90–95%) are used by both, but there are separate bot-only areas, which is another way to catch bots out.
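The idea of regions visited only by bots or only by people can be sketched by cutting the space into grid cells and comparing which cells each group occupies. This is a simplified assumption for illustration: the helper names `cell` and `region_usage`, the grid size, and the two-dimensional toy vectors are all hypothetical, and the lecture does not say how regions were actually defined.

```python
import math


def cell(vector, size=1.0):
    """Map a vector to the grid cell (a tuple of integer indices) containing it."""
    return tuple(math.floor(x / size) for x in vector)


def region_usage(human_vectors, bot_vectors, size=1.0):
    """Split occupied grid cells into shared, human-only and bot-only sets."""
    human_cells = {cell(v, size) for v in human_vectors}
    bot_cells = {cell(v, size) for v in bot_vectors}
    return {
        "shared": human_cells & bot_cells,
        "human_only": human_cells - bot_cells,
        "bot_only": bot_cells - human_cells,
    }


# Invented 2D vectors; real n-gram vectors have far more coordinates.
human_vectors = [(0.2, 0.3), (1.5, 0.1), (2.7, 2.2)]
bot_vectors = [(0.8, 0.9), (1.1, 0.4), (5.5, 5.5)]
usage = region_usage(human_vectors, bot_vectors)
print(usage)
```

A text whose vectors fall mostly into bot-only cells would, under this toy scheme, be a candidate for machine generation.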

Clustering is a mathematical operation in which sets of similar elements are combined into groups, or clusters. If we cluster the word sequences produced by bots, the clusters turn out to be more rigid and compact, without any discrepancies. When the word sequences of people of different genders and ages, with different education and backgrounds, are clustered, the resulting clusters are blurrier and less distinct. Humans think significantly less clearly than bots, and this is another way to catch the bots.
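Cluster compactness can be measured in many ways; a minimal sketch, assuming the clusters have already been found, is the mean distance of a cluster's points to its centroid. The function names and the toy points below are hypothetical, and the research may well have used a different compactness measure.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dims))


def mean_distance_to_centroid(points):
    """Average Euclidean distance from each point to the cluster centroid."""
    center = centroid(points)
    return sum(math.dist(p, center) for p in points) / len(points)


# Invented clusters: a tight one and a spread-out one.
tight_cluster = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1)]
loose_cluster = [(0.0, 0.0), (2.0, 2.0), (0.0, 2.0), (2.0, 0.0)]

print(mean_distance_to_centroid(tight_cluster))
print(mean_distance_to_centroid(loose_cluster))
```

Smaller values indicate tighter clusters, which is the pattern the lecture associates with bot-generated sequences.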

If we represent each word or each n-gram as a vector, then their entire collection can be represented as a geometric object or a certain surface in a multidimensional space. Then, for example, if we take all possible word sequences in Russian, we may find that they do not fill the entire semantic space, but only part of it. Scientists can study and measure this collection as a surface, and even compare it with other surfaces (for example, with the surface of the English language). Every surface in space has a dimension, i.e., the number of independent parameters necessary to describe it (for points on a sphere, for example, these are two values: longitude and latitude).
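The sphere example can be made concrete: although a point on a sphere has three coordinates in space, two parameters are enough to produce it. The following sketch uses the standard longitude and latitude parametrisation; the function name `sphere_point` and the sample angles are illustrative choices.

```python
import math


def sphere_point(longitude, latitude, radius=1.0):
    """Convert two angles (in radians) into a 3D point on a sphere."""
    x = radius * math.cos(latitude) * math.cos(longitude)
    y = radius * math.cos(latitude) * math.sin(longitude)
    z = radius * math.sin(latitude)
    return (x, y, z)


# Two numbers in, three coordinates out: the surface itself is two-dimensional.
point = sphere_point(math.radians(37.6), math.radians(55.75))
print(point)
```

In the same sense, the surface formed by n-gram vectors may need far fewer parameters than the number of coordinates of the space it sits in.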

Studying the dimension of natural language, Vasilii Gromov expected to find an infinite value, but in the end analysts concluded that language has a dimension of about 9–10. This figure varies slightly from language to language, but one thing is certain: human language lies in a space of higher dimension than a bot's language.
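The lecture does not say how the dimension was estimated. One common technique for estimating the dimension of a point cloud is the correlation dimension, which looks at how the fraction of close point pairs grows with the distance threshold. The sketch below applies it to a toy data set (points on a straight segment embedded in 3D space), so the helper names, radii, and data are all assumptions for illustration.

```python
import math
from itertools import combinations


def correlation_sum(points, radius):
    """Fraction of point pairs that lie closer together than `radius`."""
    pairs = list(combinations(points, 2))
    close = sum(1 for a, b in pairs if math.dist(a, b) < radius)
    return close / len(pairs)


def correlation_dimension(points, r_small, r_large):
    """Slope of log C(r) against log r between two radii."""
    c_small = correlation_sum(points, r_small)
    c_large = correlation_sum(points, r_large)
    return math.log(c_large / c_small) / math.log(r_large / r_small)


# 200 evenly spaced points on a segment of length 1 embedded in 3D space.
direction = (1 / 3, 2 / 3, 2 / 3)  # unit vector: (1 + 4 + 4) / 9 = 1
line = [tuple(t / 199 * d for d in direction) for t in range(200)]
estimate = correlation_dimension(line, 0.05, 0.1)
print(round(estimate, 2))
```

The estimate comes out close to 1 even though every point has three coordinates, which mirrors how a language surface can have a dimension of about 9–10 inside a much larger vector space.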

Finally, a 2023 study showed that this surface has ‘holes’ in it, like Swiss cheese. The holes are areas of semantic space that our language has not yet reached. Although analysts cannot yet say clearly what is hidden behind them, they can detect them. Different languages have different holes, also referred to as ‘blind spots.’ When catching bots, it is important to remember that people are drawn to the boundaries of such holes, because they use language to create new meanings and ideas. Bots, being trained programs, stay away from these holes, which makes the task of catching them easier for now. Surprisingly, it is humour that most often appears at the boundaries of such holes.

‘Bots are simply imitators, not artists. Technology does not stand still, so we must try to solve this “bot-catching” problem and understand what a language is from a mathematical point of view,’ summarised Vasilii Gromov.
