Speech Technologies for daily life: Voice Assistants, ChatBots and Spoken Dialogue Systems.

14/11/2023 to 17/11/2023

Eduardo Lleida Solano, Universidad de Zaragoza, and Carlos David Martínez Hinarejos, Universidad Politécnica de Valencia.

130 euros
110 euros
Free registration for members of the Red Temática en Tecnologías del Habla and for master's and doctoral students.

In today's fast-paced world, speech technologies have become integral to our daily lives, revolutionizing the way we interact with technology and enhancing our overall experiences. This school aims to provide participants with a comprehensive understanding of voice assistants, chatbots, and spoken dialogue systems, covering their underlying technologies, applications, and future trends.

Through expert-led lectures, hands-on workshops, and interactive discussions, attendees will delve into the world of conversational agents, natural language processing, automatic speech recognition, and machine learning algorithms powering these technologies. The program will also include a keynote talk by a renowned expert in the field.

Overall, the school aims to provide participants with a comprehensive understanding of conversational systems, their development processes, and practical experience using RASA as an open-source platform.

By the end of the school, participants will have gained valuable insights and practical skills to develop and deploy their own speech-enabled applications, enabling them to leverage the power of speech technologies for various domains such as smart homes, healthcare, customer service, and more.

More information at http://rtth.vivolab.es


Tuesday, 14 November

  • 09:00 h. Opening Ceremony. 
  • 09:30 h. Introduction to Conversational Systems. The objective of this lecture is to provide a basic understanding of the principal components of a Conversational System. First, we will present each of the modules and technologies involved, such as the speech transcriber (ASR), language understanding (NLU), language generation (NLG), and text-to-speech (TTS), among others. Then, we will focus on the Dialogue Manager, which is the core decision-making module of the system, and discuss various approaches to its design. [5]
  • 11:30 h. Keynote: Data-driven speech and language technology: from small to large models. Today, data-driven methods like neural networks and deep learning are widely used for speech and language processing. We will revisit the evolution of data-driven methods over the last 40 years and present a unifying view of the underlying principles, based on the Bayes decision rule. Specifically, the talk will focus on speech recognition and language modelling. [2]
  • 15:00 h. Conversational Systems Development: An Engineering Perspective. The rapid evolution of conversational systems is driving advancements in the methods, techniques, and tools used for their development. In this class, we will explore conversational systems development from an engineering perspective, encompassing the entire lifecycle of the process. We will cover the prototyping techniques employed in industrial scenarios, emphasizing practical system design considerations for achieving a seamless and fluid conversational user experience across various communication channels. Additionally, we will examine the prevalent architectures used for development and deployment, while focusing on widely-used solutions like Google DialogFlow, Amazon Alexa, or RASA. By the end of the class, you will have gained comprehensive insights into the practical intricacies of developing conversational systems and be well-equipped to start with the practical lab. [6]
  • 18:00 h. Visit Ciudadela de Jaca. 
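
As a rough illustration of the module chain named in the 09:30 lecture (ASR, NLU, Dialogue Manager, NLG, TTS), the following minimal Python sketch wires toy stand-ins together. Everything in it is an assumption made for illustration: the function names, the `DialogueState` class, the `set_temperature` intent and the response templates are hypothetical, the "audio" is plain text, and real systems replace each stub with a trained model.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """Hypothetical dialogue state: the active intent and the slots filled so far."""
    intent: str = "unknown"
    slots: dict = field(default_factory=dict)


def asr(audio: str) -> str:
    # Stand-in for a speech recognizer: the "audio" is already text here.
    return audio.lower().strip()


def nlu(text: str) -> dict:
    # Toy keyword-based understanding; real systems use trained classifiers.
    intent = "set_temperature" if "temperature" in text else "unknown"
    digits = [tok for tok in text.split() if tok.isdigit()]
    entities = {"degrees": int(digits[0])} if digits else {}
    return {"intent": intent, "entities": entities}


def dialogue_manager(state: DialogueState, frame: dict) -> str:
    # Rule-based policy: update the state, then choose the next system action.
    if frame["intent"] != "unknown":
        state.intent = frame["intent"]
    state.slots.update(frame["entities"])
    if state.intent != "set_temperature":
        return "ask_rephrase"
    if "degrees" not in state.slots:
        return "ask_degrees"
    return "confirm"


def nlg(action: str, state: DialogueState) -> str:
    # Template-based generation, one template per system action.
    templates = {
        "ask_rephrase": "Sorry, could you rephrase that?",
        "ask_degrees": "How many degrees?",
        "confirm": "Setting the temperature to {degrees} degrees.",
    }
    return templates[action].format(**state.slots)


def tts(text: str) -> str:
    # Stand-in for a speech synthesizer: wraps the text instead of producing audio.
    return f"<speech>{text}</speech>"


def respond(audio: str, state: DialogueState) -> str:
    # One user turn through the whole chain.
    frame = nlu(asr(audio))
    action = dialogue_manager(state, frame)
    return tts(nlg(action, state))


state = DialogueState()
print(respond("Set the temperature please", state))
print(respond("22", state))
```

The second turn only works because the Dialogue Manager keeps the intent from the first turn in its state, which is one simple way to see why that module is described as the decision-making core.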

Wednesday, 15 November

  • 09:00 h. Speech in Spoken Dialogue Systems. In classical approaches to Spoken Dialogue Systems (SDSs), audio signals are first decoded into word sequences by Automatic Speech Recognition (ASR) systems. Natural Language Processing (NLP) techniques are then applied to understand the user and respond accordingly. However, this approach ignores crucial information embedded in the speech signal, such as the speaker's emotional state and prosody, or the environmental noise level, which could be essential for enhancing dialogue strategies. In this presentation, we will emphasise the significant role of the information encoded in the audio signal. Within this framework, we will also introduce some hybrid architectures that include both classical modules and novel end-to-end models. [5]
  • 10:15 h. Understanding ChatGPT: Technology, Trends and Challenges for Conversational Systems (Part 1). Conversational systems such as ChatGPT have become a trending technology due to their ability to provide seamless, personalized, and engaging experiences to users. Thanks to the widespread use of messaging apps and voice assistants, and to mass access to outstanding DNN-based models, these systems are transforming the way we interact with technology and with each other, and how industries provide accessible and customizable services to end users. In this class, we will first look at the technology at a high level, including its strengths and limitations, and then move on to current trends and open challenges, including automatic evaluation, ethical aspects, and pointers to current models for open-domain and task-oriented dialogue systems. Finally, personal experiences and current research will be presented alongside these topics to describe the efforts under way in Spain to advance and shape the future of conversational systems. [4]
  • 11:30 h. Understanding ChatGPT: Technology, Trends and Challenges for Conversational Systems (Part 2). [4]
  • 15:00 h. Automatic Dialogue Evaluation for Conversational Systems. For a long time, conversational systems (chatbots) have attracted significant interest from both academia and industry. The widespread usage of ChatGPT last year brought even more attention, from both the public and researchers, especially to generative models and open-domain dialogue systems. However, current metrics are not fully aligned with the training process of such systems, whose performance is mainly measured through extensive human evaluation, which is both time- and cost-intensive. Hence, it is important for researchers and practitioners to know and understand the automatic evaluation metrics that have been proposed. In this class, we will first describe the existing taxonomy of dialogue evaluation and the standard benchmarks. Then, we will present the common NLG metrics used in dialogue evaluation and the problems associated with them. Next, we will look at the newly proposed reference-free and model-based metrics. After that, we will describe future research directions. Finally, we will carry out some hands-on tasks to understand and practise the different parts of the pipeline for evaluating dialogue systems. [4]
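
One problem commonly associated with reference-based NLG metrics in dialogue, as mentioned in the 15:00 class description, can be seen with a small Python sketch. It uses unigram F1, chosen here only as a simple example of a word-overlap score; the class itself may cover different metrics. The function name `unigram_f1` and the example sentences are made up for illustration.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Word-overlap F1 between a system response and one reference response."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared word at most as often as it appears in both.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "i love hiking in the mountains"
same_words = "i love hiking in the mountains"
other_words = "trekking up high peaks is my favourite hobby"

print(unigram_f1(same_words, reference))   # identical wording
print(unigram_f1(other_words, reference))  # plausible reply, no shared words
```

The second response is a reasonable reply to the same prompt but shares no words with the reference, so a pure overlap score cannot tell it apart from an irrelevant one. This is the kind of limitation that motivates reference-free and model-based metrics.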

Thursday, 16 November

  • 09:00 h. Hands-on workshop: Conversational Systems Development with RASA, Part 1. RASA is the leading open source platform for conversational systems development. Throughout the two lab sessions, we will embark on a hands-on exploration of the fundamental components that constitute the RASA ecosystem. We will examine RASA's core functionalities, such as intent recognition, entity extraction, and dialogue management using interactive stories, all with an emphasis on best practices. Through practical exercises and challenges, you will gain experience in designing and training conversational models within the RASA framework, resulting in a fully functional system. [3] [5] [6]
  • 11:30 h. Hands-on workshop: Conversational Systems Development with RASA, Part 2. [3] [5] [6]
  • 15:00 h. Conversational Systems Development with RASA, Part 3. [3] [5] [6]

Friday, 17 November

  • 09:30 h. Visit to the Canfranc Underground Laboratory (Laboratorio Subterráneo de Canfranc). 
  • 15:00 h. Three Minute Thesis Presentations. The Master and PhD students participating in the Fall School are invited to demonstrate their ability to communicate complex ideas, breakthroughs, and research findings effectively in a limited timeframe, just like an elevator pitch in the professional world. Each participant will have three minutes to captivate the audience, conveying the essence of their research projects while maintaining clarity and engagement. 

1 Eduardo Lleida Solano, Universidad de Zaragoza
2 Hermann Ney, Senior Professor
3 Javier Mikel Olaso Fernández, Contracted Postdoctoral Researcher (Investigador Contratado Doctor)
4 Luis Fernando D'Haro Enriquez, Contracted Professor with a Doctorate (Profesor Contratado Doctor)
5 María Inés Torres Barañano, Full Professor (Catedrática de Universidad)
6 Zoraida Callejas Carrión, Associate Professor (Profesora Titular de Universidad)


The course is open to all master's and doctoral students, researchers, and professionals interested in learning about conversational systems or updating their knowledge of them.

Attendees must bring their own laptop for the practical sessions.

Recognition as credits for university cultural activities (Actividades universitarias culturales) has been requested from the Universidad de Zaragoza.

0.5 ECTS

CERTIFICATE OF ATTENDANCE: Students are entitled to a Certificate of Attendance accrediting their participation in the course, provided they have attended at least 85% of the in-person teaching hours.

CREDIT DIPLOMA: To obtain the ECTS Credit Diploma, in accordance with the regulations in force at the Universidad de Zaragoza, students must pass the assessment procedure required by the course coordinators. To this end, attendance, active participation in the discussions, and the work carried out in the workshops will be assessed.

