
CONVERSATIONAL AGENT FOR SOCIAL ROBOTICS USING THE ROS FRAMEWORK

In recent years, robotics has advanced by leaps and bounds. Today, the term advanced robotics refers to technologies that interact with the real-world environment to solve real-world problems. Healthcare and medicine are among the fields where advanced robots are applied.

In many cases, these applications require a virtual assistant to interact with the robot. A virtual assistant is a software agent that performs tasks or services for an individual in response to commands or questions. In other words, it simulates a conversation to deliver voice or text-based information, so users can interact with an artificial agent simply by speaking.

The project described below is a conversational agent that is activated by a customised keyword. Once it is activated, the user can hold a conversation with the system, ask it a specific question, and so on. The project will be part of a robot that will operate inside a hospital to assist patients.

To do this, we rely on the Google Home Mini, a smart speaker from Google. Why this smart speaker? Because even if we trained our own conversational engine for a long time, it would not come close to the capabilities of the Google engine.

The system has been programmed to work not only with Google speakers but also with Alexa. However, this article covers only the Google Home Mini, as it is the speaker that has been used.

So, what can be achieved with this project?

First, the system can be woken up by saying a customised keyword. Internally, the system wakes up the Google speaker, and the conversation then takes place with the speaker. Thanks to conversations programmed in Dialogflow, use cases can be created in which the speaker responds as desired to specific questions. For example, we can say: «Hello Victory, where is patient Juan?» and the answer could be: «Patient Juan is in room 25».

This work also addresses another important objective in robotic systems. In most of these systems it is the user who starts the conversation, whereas here the robot can initiate the conversation on its own, for example when it wants to say hello or when it detects a certain event.

What is ROS?

The Robot Operating System (ROS) is an open-source framework that helps developers build and reuse code between robotics applications.

ROS is built around a set of nodes that communicate with each other by publishing and subscribing to topics; the ROS Master lets the nodes find each other.
These nodes can be programmed in C++ or Python.

There are two ways for nodes to communicate. The first is through messages: the publisher sends a message (of a type provided by ROS or defined by the developer) and the subscriber receives it.

The second is through services, which allow a node to send a request and receive a response.
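
The two communication styles can be illustrated with a small, ROS-free model. The sketch below is plain Python and does not use the ROS API: the `MiniMaster` class, the topic name `/wakeword` and the service name `/patient_room` are hypothetical, chosen only to show how publishing to a topic differs from calling a service.

```python
from typing import Any, Callable, Dict, List


class MiniMaster:
    """Toy stand-in for the ROS Master: it only maps names to callbacks."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Any], None]]] = {}
        self._services: Dict[str, Callable[[Any], Any]] = {}

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        """Register a callback that runs for every message on the topic."""
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver the message to all subscribers; return how many got it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)

    def advertise_service(self, name: str, handler: Callable[[Any], Any]) -> None:
        """Register the single handler that answers requests to this service."""
        self._services[name] = handler

    def call_service(self, name: str, request: Any) -> Any:
        """Send a request and hand the handler's response back to the caller."""
        return self._services[name](request)


master = MiniMaster()

# Topic: one-way, the publisher does not wait for any answer.
received: List[str] = []
master.subscribe("/wakeword", received.append)
master.publish("/wakeword", "detected")

# Service: two-way, the caller gets a response to its request.
master.advertise_service("/patient_room", lambda patient: {"Juan": 25}.get(patient))
room = master.call_service("/patient_room", "Juan")
```

In a real ROS system the nodes would be separate processes and the messages would have declared types; the model only captures the one-way versus request/response distinction.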

Which components are used in this project?

The hardware components used are:

Raspberry Pi 4

The Raspberry Pi is a low-cost, credit-card-sized computer that plugs into a computer monitor or TV and uses a standard keyboard and mouse. It is capable of doing everything you would expect a desktop computer to do, from browsing the internet and playing high-definition video to making spreadsheets, word processing and playing games. The Raspberry Pi 4 includes a high-performance 64-bit quad-core processor.
Moreover, the Raspberry Pi can interact with the outside world via its GPIO (General Purpose Input/Output) pins.
In this work, the board runs the Linux operating system, specifically Debian 10 (Buster).

Matrix Voice

Matrix Voice is a development board for building sound-driven behaviours and interfaces. It was built to give every maker and developer a complete, affordable and user-friendly tool for creating voice applications for the Internet of Things (IoT), from simple to complex.
The board integrates an 8-microphone array, 18 RGB LEDs, an ESP32 and an FPGA. It also has GPIO pins. The Matrix Voice is connected to the Raspberry Pi's GPIO pins, as shown in the picture on the left.

In terms of software, these are some of the functions of the project:

Detect custom Wake Word

To detect the custom wake word, we use Rhasspy, an open-source, fully offline set of voice assistant services for many human languages.
Of all the tools Rhasspy offers for detecting a keyword, Pocketsphinx was chosen because of its ease of use.
Rhasspy comes with a web interface that lets us configure, program and test our voice assistant remotely from a web browser. All of the web UI's functionality is also exposed through a comprehensive HTTP API.

The following image shows some of these functions:

In brief, we have a ROS node that publishes a message each time the wakeword is detected.

Play noise

While the system is waiting for the wakeword, it emits a sound that "blocks" the Google Home Mini. This is important because it prevents the Google speaker from waking up if someone says "ok google" instead of our keyword.
How do we play that noise? Through two small speakers connected to the Raspberry Pi. As soon as the keyword is received, the sound stops.

Wake up Google Home Mini

When the keyword is received, the same speakers that were emitting the noise now play the Google Home Mini's own keyword. From this moment on, the conversation takes place with the Google engine.

Start the conversation

As mentioned above, the system can also start the conversation itself. To do this, the speakers play the Google keyword without any question following it. Then, as we will see below, Node-RED picks up this empty request, processes it by adding the text to be spoken and sends it back.
As a result, the user simply sees the robot start a conversation on its own.
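
The idea of answering an "empty" request with a pre-loaded sentence can be sketched as follows. This is plain Python rather than a Node-RED flow, and the function name `answer_request`, the queue of pending sentences and the fallback texts are all assumptions made for illustration; the article does not describe how the real flow stores the text to be spoken.

```python
from collections import deque
from typing import Deque


def answer_request(query_text: str, pending: Deque[str]) -> str:
    """Return the text the speaker should say for one incoming request.

    An empty query is taken to mean that the robot itself triggered the
    speaker, so the oldest pending sentence (if any) is spoken.
    """
    if query_text.strip() == "":
        if pending:
            return pending.popleft()
        return "Sorry, I have nothing to say right now."  # hypothetical fallback
    # Placeholder for the normal, user-initiated dialogue path.
    return f"You said: {query_text}"


# The robot queues a greeting, then triggers the speaker with no question.
pending_sentences: Deque[str] = deque(["Hello, I am Victory."])
first = answer_request("", pending_sentences)
second = answer_request("", pending_sentences)
```

Here `first` carries the queued greeting, while `second` falls back to the default text because the queue is already empty.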

State machine

A state machine has been implemented in C++ using a tool provided by Qt, a cross-platform application development framework.
Transitions between states are driven by Qt signals: when a certain event occurs, the corresponding signal is emitted and causes a change of state.

The most prominent states are:

  • Noise-emitting state.
  • State that creates a red Matrix light show.
  • State waiting for the wakeword.
  • State that illuminates the Matrix in green to symbolise that conversation is possible.
  • State that wakes up the Google Home Mini.
  • State that waits for the end of the conversation to return to the beginning.

The last state is the most complex: the time allowed for the conversation has to be adjusted to the length of the sentence being asked and to whether the conversation has ended or is still ongoing.
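
The cycle of states listed above can be sketched as a transition table. The real system uses Qt's state machine in C++; the version below is a plain Python approximation, and the state names, the event names, the order of the states and the timing constants in `conversation_timeout` are assumptions, not values taken from the project.

```python
from enum import Enum, auto


class State(Enum):
    EMIT_NOISE = auto()
    RED_LIGHTS = auto()
    WAIT_WAKEWORD = auto()
    GREEN_LIGHTS = auto()
    WAKE_GOOGLE = auto()
    WAIT_CONVERSATION_END = auto()


# (current state, event) -> next state; events play the role of Qt signals.
TRANSITIONS = {
    (State.EMIT_NOISE, "noise_started"): State.RED_LIGHTS,
    (State.RED_LIGHTS, "lights_set"): State.WAIT_WAKEWORD,
    (State.WAIT_WAKEWORD, "wakeword_detected"): State.GREEN_LIGHTS,
    (State.GREEN_LIGHTS, "lights_set"): State.WAKE_GOOGLE,
    (State.WAKE_GOOGLE, "keyword_played"): State.WAIT_CONVERSATION_END,
    (State.WAIT_CONVERSATION_END, "conversation_finished"): State.EMIT_NOISE,
}


def next_state(state: State, event: str) -> State:
    """Follow the table; an event with no matching entry leaves the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def conversation_timeout(expected_words: int, base_s: float = 5.0,
                         per_word_s: float = 0.5) -> float:
    """Hypothetical rule: longer sentences get more time before the cycle restarts."""
    return base_s + per_word_s * expected_words


# One full cycle, from emitting noise back to emitting noise.
state = State.EMIT_NOISE
for event in ["noise_started", "lights_set", "wakeword_detected",
              "lights_set", "keyword_played", "conversation_finished"]:
    state = next_state(state, event)
```

Keeping the transitions in a table makes it easy to check that every state has a way out and that the machine returns to the noise-emitting state after each conversation.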

Node-RED

Node-RED is an open-source, flow-based tool for connecting hardware devices and online services. Programming is done graphically by connecting predefined blocks, called nodes. A set of these nodes, usually divided into input nodes, processing nodes and output nodes, forms what is known as a flow.

The following image shows a small part of the flow (the whole flow is much larger), where the request is received from the application programmed in Dialogflow:

As can be seen in the first node, an endpoint is required. Therefore, the webhook must be enabled in Dialogflow with the appropriate URL.

Dialogflow

Dialogflow is a natural language understanding platform that makes it easy to design and integrate a conversational user interface into a mobile app, web application, device, bot, interactive voice response system, and so on. Using Dialogflow, we can provide new and engaging ways for users to interact with our product.
To create an app, we first create the intents. For example, the following image adds the sentence: «Which patient is in room x?»

Then, in Node-RED this intent is received, and the desired answer is added.
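
What the flow does with an incoming intent can be approximated in a few lines. The sketch below is Python rather than a Node-RED flow; it assumes the Dialogflow ES webhook format (intent name under `queryResult.intent.displayName`, parameters under `queryResult.parameters`, reply in `fulfillmentText`), and the intent name `patient_in_room`, the parameter `room` and the lookup table are hypothetical.

```python
import json

# Hypothetical data source; the real system would query the hospital's records.
ROOM_TO_PATIENT = {"25": "Juan"}


def build_fulfillment(request_body: str) -> str:
    """Turn a webhook request (JSON text) into a webhook response (JSON text)."""
    request = json.loads(request_body)
    query = request.get("queryResult", {})
    intent = query.get("intent", {}).get("displayName", "")
    params = query.get("parameters", {})

    if intent == "patient_in_room":
        room = params.get("room", "")
        # Numeric parameters may arrive as floats such as 25.0.
        if isinstance(room, float) and room.is_integer():
            room = int(room)
        room = str(room)
        patient = ROOM_TO_PATIENT.get(room)
        if patient:
            text = f"Patient {patient} is in room {room}."
        else:
            text = f"I have no patient registered in room {room}."
    else:
        text = "Sorry, I did not understand the question."

    return json.dumps({"fulfillmentText": text})


example_request = json.dumps({
    "queryResult": {
        "intent": {"displayName": "patient_in_room"},
        "parameters": {"room": 25},
    }
})
reply = json.loads(build_fulfillment(example_request))
```

In the project itself this logic lives in Node-RED nodes between the HTTP input node and the HTTP response node; the function only shows the shape of the data that goes in and comes out.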

What is the current status of the project?

The technical work is now complete. The project ran into the usual setbacks for a project of this type, but these have been resolved.

The following image shows the actual hardware assembly of the system:

The next step is to design a 3D part that houses the complete system, which will make it easier to integrate into the real robot.

This project was born from the collaboration between Scalian Spain and the University of Málaga. It is carried out by Juan Antonio Ramírez and supervised by Francisco Javier Camacho Bermúdez and Alejandro Hidalgo Paniagua within Scalian.
It is also guided by the C++ centre of excellence at Scalian Spain, as part of its line of talent acquisition and innovation with C++ technology (ROS, in this case).

By Francisco Javier Camacho Bermúdez.
