British developers have proposed a new way to teach artificial intelligence to hold a conversation: they asked people to talk to themselves. The resulting dialogues, according to a preprint published on arXiv, are much more effective for training than, for example, a corpus of film subtitles. From the collected data the researchers assembled a corpus of 3.6 million words, covering dialogues on 23 different topics.
Voice assistants get smarter every year, but they still have one important drawback: they cannot sustain a conversation well. This limitation affects not only the services built on them; without naturally flowing dialogue, an artificial intelligence will never pass the Turing test. Of course, one can always fall back on a simplified approach, such as noncommittal phrases like "I don't know," but that can hardly be called a real dialogue.
The main reason for this shortcoming is the training data. To maintain a conversation effectively, a computer needs to learn from millions of real human conversations, but assembling a sufficiently large corpus is not easy. In January, developers from Facebook gathered a corpus of 160,000 dialogue snippets by asking volunteers to chat with each other on behalf of fictional personas. The chatbot trained on the collected data produced fairly natural results.
Another way to create a dialogue corpus was proposed by researchers from the University of Edinburgh led by Joachim Fainberg. They hired workers on Amazon's Mechanical Turk crowdsourcing platform and asked them to talk to themselves on a given topic, such as movies, music, or literature. Each dialogue could contain at most ten turns, and each turn was limited to one or two sentences.
Example dialogue (topic: Disney movies)
1: What’s your favorite movie?
2: I think “Beauty and the Beast”.
1: Is that new?
2: No, I’m talking about the cartoon. It’s just so magical
1: And what’s your favorite movie overall?
2: I think “The Sound of Music”.
1: Seriously? Apart from cartoons, musicals don’t really impress me.
2: I love musicals. I really liked The Phantom of the Opera.
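The collection constraints described above (at most ten turns per dialogue, one or two sentences per turn) can be sketched as a simple validator. This is illustrative only: the thresholds come from the article, while the function names and the crude sentence-splitting heuristic are assumptions.

```python
import re

MAX_TURNS = 10       # limit stated in the article
MAX_SENTENCES = 2    # each turn: one or two sentences

def count_sentences(turn: str) -> int:
    """Rough sentence count: split on ., ! or ? followed by whitespace or end."""
    parts = [p for p in re.split(r"[.!?]+(?:\s+|$)", turn.strip()) if p]
    return max(1, len(parts))

def is_valid_dialogue(turns: list[str]) -> bool:
    """Check a self-dialogue against the collection constraints."""
    if not turns or len(turns) > MAX_TURNS:
        return False
    return all(count_sentences(t) <= MAX_SENTENCES for t in turns)

dialogue = [
    "What's your favorite movie?",
    'I think "Beauty and the Beast".',
]
print(is_valid_dialogue(dialogue))  # True
```

A real collection pipeline would presumably reject or flag submissions that fail such checks before paying the worker.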
It turned out that building a corpus from dialogues with oneself is efficient in terms of the resources involved. First, assembling a corpus from real two-person dialogues requires more people. Second, participants often have to wait for their interlocutor's reply, which takes a lot of time. Using self-dialogues reduced the average time to produce one passage from 14.9 minutes to 6.5 minutes.
A total of 2,717 people took part in creating the corpus, each producing nine dialogues on average. The corpus contains 141,945 utterances and more than three million words; its 23 topics span culture and sports and include baseball, football, Star Wars, and superhero movies.
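A rough sanity check of these figures, using only the numbers quoted in the article (the 3.6-million-word total comes from its opening paragraph; the derived averages are approximations, not statistics from the paper itself):

```python
people = 2717
dialogues_per_person = 9        # average, per the article
utterances = 141_945
words = 3_600_000               # "3.6 million words" figure from the article

dialogues = people * dialogues_per_person
print(dialogues)                         # roughly 24,453 dialogues in total
print(round(utterances / dialogues, 1))  # average turns per dialogue
print(round(words / utterances, 1))      # average words per utterance
```

The averages come out at about six turns per dialogue and about 25 words per utterance, which is consistent with the stated limit of ten turns of one or two sentences each.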
After assembling the corpus, the scientists tested it in action by training a chatbot on it and comparing the result with the same chatbot trained on the OpenSubtitles corpus. Dialogues produced after training on the new corpus, the authors note, turned out to be more natural:
Example dialogue (SD – self-dialogues corpus, OS – OpenSubtitles)
What’s your favorite Harry Potter movie?
OS: Not bad, Goyle!
SD: I like everything!
You can download the corpus from the researchers' repository on GitHub.
Human dialogues can be used for more than teaching artificial intelligence to speak. Recently, American researchers from MIT taught a neural network to diagnose depression from a patient's speech.