In recent years, robotics has advanced in leaps and bounds. Nowadays, the term advanced robotics refers to technologies that interact with the real-world environment to solve real-world problems. Some applications of advanced robots are in healthcare and medicine.
In many cases, these applications require a virtual assistant to interact with the robot. A virtual assistant is a software agent that can perform tasks or services for an individual based on commands or questions. In other words, it simulates a conversation to deliver voice- or text-based information to a user, so users can interact with an artificial agent just by speaking.
The project described below consists of a conversational agent that is activated by a customised keyword. Once activated, the user can have a conversation with the system, ask it a specific question, and so on. It is important to note that this project will be part of a robot that will operate inside a hospital, with the idea of assisting patients.
To achieve this, we rely on the Google Home Mini, a smart speaker from Google. Why use this smart speaker? The reason is that, even if we trained our own conversational engine for a long time, it would never come close to the immensely powerful Google engine.
It should be pointed out that the system has been programmed to work not only with Google speakers but also with Alexa. However, this article only covers the Google speaker (Google Home Mini), as it is the one that has been used.
Thus, what can be achieved with this project?
First, the system can be woken up by saying a customised keyword. Internally, the system wakes up the Google speaker and the conversation then takes place with it. Thanks to conversations programmed in Dialogflow, use cases can be created in which the speaker responds as desired to certain questions. For example, we can say: “Hello Victory, where is patient Juan?” and the answer could be: “Patient Juan is in room 25”.
This work covers another very important objective in robotic systems. In most such systems, it is the user who starts the conversation; in this case, the robot will be able to initiate the conversation on its own, for example when it wants to say hello or when it detects a certain event.
What is ROS?
The Robot Operating System (ROS) is an open-source framework that helps developers build and reuse code across robotics applications.
The operation of ROS is based on a set of nodes that communicate with each other by publishing and subscribing to topics, thanks to the ROS Master.
These nodes can be programmed in C++ or Python.
There are two ways for nodes to communicate. The first is through messages, where a publisher sends a message (a standard ROS message or one created by the developer) and a subscriber receives it.
The second way is through services, which allow a node to send a request and receive a response.
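The two communication patterns can be sketched without a ROS installation. The following is a conceptual illustration in plain Python, not real rospy code: the class below plays the role of the ROS Master, and the topic, service and message names are made up for the example.

```python
# Conceptual sketch of ROS-style communication (no ROS required).
# Topic names, service names and message contents are illustrative.

class MiniMaster:
    """Plays the role of the ROS Master: keeps track of subscribers
    and service servers so nodes can find each other."""
    def __init__(self):
        self.subscribers = {}   # topic name -> list of callbacks
        self.services = {}      # service name -> handler function

    # --- message pattern: fire-and-forget, one publisher to many subscribers ---
    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers.get(topic, []):
            callback(message)

    # --- service pattern: one request, one response ---
    def advertise_service(self, name, handler):
        self.services[name] = handler

    def call_service(self, name, request):
        return self.services[name](request)


master = MiniMaster()

# One "node" subscribes to a topic...
received = []
master.subscribe("/wakeword", received.append)

# ...and another node publishes a message on it.
master.publish("/wakeword", "wakeword_detected")

# A service, by contrast, returns a response for each request.
master.advertise_service("/get_room", lambda patient: {"Juan": "room 25"}[patient])
answer = master.call_service("/get_room", "Juan")
```

In real ROS the same roles are filled by `rospy`/`roscpp` publishers, subscribers and service clients, with the Master running as a separate process.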
What components are used to carry out this project?
The hardware components used are:
Raspberry Pi 4
The Raspberry Pi is a low-cost, credit-card-sized computer that plugs into a computer monitor or TV and uses a standard keyboard and mouse. It is capable of doing everything you’d expect a desktop computer to do, from browsing the internet and playing high-definition video to making spreadsheets, word processing and playing games. The fourth version includes a high-performance 64-bit quad-core processor.
Moreover, the Raspberry Pi has the ability to interact with the outside world via the GPIO pin (General Purpose Input / Output).
In this work, the Linux operating system runs on this card, specifically, the Debian Buster 10 release.
Matrix Voice is a development board for building sound-driven behaviour and interfaces. It was built with the mission of giving every maker and developer a complete, affordable, and user-friendly tool for creating simple to complex Internet of Things (IoT) voice apps.
This board integrates an 8-microphone array, 18 RGB LEDs, an ESP32 and an FPGA. In addition, it also has GPIO pins. Matrix Voice is connected to the Raspberry Pi’s GPIO pins, as shown in the picture on the left.
In terms of software, these are some of the functions of the project:
Detect custom Wake Word
To detect the custom wake word, we use Rhasspy: an open-source, fully offline set of voice assistant services for many human languages.
Out of all the wake-word tools available in Rhasspy, Pocketsphinx has been chosen because of its ease of use.
Rhasspy comes with a snazzy web interface that lets us configure, program, and test our voice assistant remotely from a web browser. All of the web UI’s functionality is exposed in a comprehensive HTTP API.
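As a taste of that HTTP API, here is a minimal sketch of building a request to it from Python using only the standard library. The port (12101 on a default install) and the `/api/text-to-speech` endpoint match Rhasspy’s documented defaults, but check your own Rhasspy profile before relying on them.

```python
# Sketch: build (but do not send) a request to Rhasspy's HTTP API.
# Assumes a default Rhasspy install listening on port 12101.

from urllib.request import Request

RHASSPY_URL = "http://localhost:12101"

def text_to_speech_request(text):
    """Build a POST request asking Rhasspy to speak `text` aloud."""
    return Request(
        RHASSPY_URL + "/api/text-to-speech",
        data=text.encode("utf-8"),
        method="POST",
    )

req = text_to_speech_request("Patient Juan is in room 25")
# urllib.request.urlopen(req) would send it; omitted so the sketch stays offline.
```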
The following image shows some of these functions:
In brief, we have a ROS node that publishes a message each time the wake word is detected.
While the system is waiting for the wake word, a sound is emitted that “blocks” the Google Home Mini. This is important, as it prevents the Google speaker from waking up if someone says “Ok Google” instead of our keyword.
And how do we play that noise? Through two small speakers connected to the Raspberry Pi. As soon as the keyword is detected, the sound stops.
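One way to drive those speakers from the Raspberry Pi is via ALSA’s `aplay` tool. The sketch below is a hypothetical illustration of the start/stop behaviour, not the project’s actual code; the file name and the use of `aplay` are assumptions.

```python
# Hypothetical sketch: start the masking noise when waiting for the
# wake word, and kill it as soon as the keyword is detected.

import subprocess

NOISE_FILE = "noise.wav"  # hypothetical masking-noise file

def noise_command(path):
    """Command line to play the noise file through ALSA."""
    return ["aplay", "--quiet", path]

def start_noise(path=NOISE_FILE):
    """Launch aplay in the background and return the process handle."""
    return subprocess.Popen(noise_command(path))

def stop_noise(proc):
    """Called as soon as the wake word is detected: silence the speakers."""
    proc.terminate()
    proc.wait()
```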
Wake up Google Home Mini
When the keyword is detected, the same speakers that were emitting noise now pronounce the Google Home Mini’s own keyword. From this moment on, the conversation takes place with the Google engine.
Start the conversation
As mentioned above, the system can also start the conversation. To do this, the speakers are made to play the Google keyword. Then, as we will see below, Node-RED picks up this empty request, processes it by adding the text to be spoken, and sends it back.
Thus, the user only sees how the robot has been able to start a conversation on its own.
A state machine has been created in C++ using a tool provided by Qt, a cross-platform application development framework.
Transitions between states are realised by Qt signals, i.e., when a certain event occurs, a corresponding signal is emitted that causes a change of state.
The most prominent states are:
- Noise-emitting state.
- State that creates a red Matrix light show.
- State waiting for the wake word.
- State that illuminates the matrix in green to symbolise that conversation is possible.
- State that wakes up the Google Home Mini.
- State that waits for the end of the conversation to return to the beginning.
The last state has some complexity, as the timing has to be adjusted depending on the length of the sentence the user asks and on whether the conversation is still ongoing.
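The state cycle above can be sketched as a simple transition table. This is a Qt-free illustration in Python, not the project’s C++ code: the state and event names mirror the list of states, the table plays the role of Qt’s signal/slot connections, and the timing heuristic at the end is purely hypothetical.

```python
# Qt-free sketch of the state machine. Each (state, event) pair maps
# to the next state, mimicking a Qt signal triggering a transition.
# All names and the timing heuristic are illustrative assumptions.

TRANSITIONS = {
    ("EMIT_NOISE",            "started"):      "RED_LIGHTS",
    ("RED_LIGHTS",            "lights_on"):    "WAIT_WAKEWORD",
    ("WAIT_WAKEWORD",         "wakeword"):     "GREEN_LIGHTS",
    ("GREEN_LIGHTS",          "lights_on"):    "WAKE_GOOGLE",
    ("WAKE_GOOGLE",           "google_awake"): "WAIT_CONVERSATION_END",
    ("WAIT_CONVERSATION_END", "timeout"):      "EMIT_NOISE",  # back to the start
}

def step(state, event):
    """Analogue of a Qt signal firing: an event moves us to the next state.
    Unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

def conversation_wait(sentence):
    """Hypothetical heuristic: wait longer before timing out when the
    question is longer (base delay plus a per-word allowance, in seconds)."""
    return 2.0 + 0.3 * len(sentence.split())

state = "WAIT_WAKEWORD"
state = step(state, "wakeword")   # the wake word moves us to GREEN_LIGHTS
```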
Node-RED is an open-source, flow-based tool for connecting hardware devices and online services. Programming is done graphically by connecting predefined blocks, called nodes. The set of these nodes, usually divided into input nodes, processing nodes and output nodes, forms what we know as a flow.
The following image shows a small part of the flowchart (the whole is much bigger) where the request is received from the application programmed in Dialogflow:
As can be seen in the first node, an endpoint is required. Therefore, the webhook must be activated in Dialogflow with the appropriate URL.
Dialogflow is a natural language understanding platform that makes it easy to design and integrate a conversational user interface into your mobile app, web application, device, bot, interactive voice response system, and so on. Using Dialogflow, we can provide new and engaging ways for users to interact with our product.
To create an app, we first create the intents. For example, the following image adds the sentence: “Which patient is in room x?”
Then, in Node-RED this intent is received, and the desired answer is added.
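The request that reaches Node-RED, and the reply it must return, follow Dialogflow’s webhook format: the matched intent arrives under `queryResult.intent.displayName`, its slots under `queryResult.parameters`, and the reply goes back under `fulfillmentText`. The sketch below shows that exchange in Python (our actual flow does this graphically in Node-RED); the intent name `room_query`, the `room` parameter and the toy data are assumptions for illustration.

```python
# Sketch of the webhook exchange between Dialogflow and our flow.
# The intent/parameter names and the ROOMS data are illustrative.

import json

ROOMS = {"25": "Juan"}  # toy data; the real system would query the robot

def handle_webhook(body):
    """Receive a Dialogflow webhook request (JSON string) and build
    the fulfillment response the agent will speak."""
    request = json.loads(body)
    query = request["queryResult"]
    if query["intent"]["displayName"] == "room_query":
        room = query["parameters"]["room"]
        patient = ROOMS.get(room, "nobody")
        text = f"Patient {patient} is in room {room}"
    else:
        text = "Sorry, I did not understand"
    # Dialogflow expects the spoken reply under "fulfillmentText".
    return json.dumps({"fulfillmentText": text})

reply = handle_webhook(json.dumps({
    "queryResult": {
        "intent": {"displayName": "room_query"},
        "parameters": {"room": "25"},
    }
}))
```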
What is the current status of the project?
At present, all the technical work is finished. There have been the usual setbacks for a project of this type, but they have been resolved.
The following image shows the actual hardware assembly of the system:
The next step is the design of a 3D part that encompasses the complete system. In this way, the incorporation into the real robot will be facilitated.
This project was born thanks to the collaboration between Scalian Spain and the University of Málaga. It is carried out by Juan Antonio Ramírez and tutored by Francisco Javier Camacho Bermúdez and Alejandro Hidalgo Paniagua within Scalian.
It is also guided by the C++ Centre of Excellence at Scalian Spain, within the line of talent acquisition and innovation with C++ technology (ROS, in this case).