Amazon adds Catalan to MASSIVE dataset

January 16, 2024

Earlier this year, we released MASSIVE, a million-record natural-language-understanding (NLU) dataset composed of human-translated utterances spanning 51 languages, 18 domains, 60 intents, and 55 slot types. We are pleased to announce the release of MASSIVE 1.1, which includes new data for the Catalan language.

Instructions for downloading MASSIVE 1.1 can be found at our GitHub repository, alexa/massive. The dataset is also available from Hugging Face. For more information on the dataset, please see our paper.
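
For readers who want to inspect the data once it is downloaded, here is a minimal sketch of reading one record. It assumes the layout described in the repository's README: one JSON Lines file per locale, with fields such as `locale`, `intent`, `utt`, and `annot_utt`, where slots are marked inline as `[slot_type : value]`. The helper names `extract_slots` and `parse_record` are hypothetical, and the sample record is made up for illustration rather than taken from the dataset.

```python
import json
import re

# Matches one inline slot annotation such as "[time : nine am]".
SLOT_PATTERN = re.compile(r"\[\s*([^:\[\]]+?)\s*:\s*([^\[\]]+?)\s*\]")


def extract_slots(annot_utt):
    """Return (slot_type, slot_value) pairs found in an annotated utterance."""
    return [(m.group(1), m.group(2)) for m in SLOT_PATTERN.finditer(annot_utt)]


def parse_record(line):
    """Parse one JSON Lines record into locale, intent, utterance, and slots."""
    record = json.loads(line)
    return {
        "locale": record["locale"],
        "intent": record["intent"],
        "utt": record["utt"],
        "slots": extract_slots(record["annot_utt"]),
    }


# Made-up sample record for illustration; NOT taken from the dataset.
# Field names are assumed to follow the repository README.
SAMPLE_LINE = json.dumps({
    "id": "0",
    "locale": "ca-ES",
    "partition": "train",
    "scenario": "alarm",
    "intent": "alarm_set",
    "utt": "desperta'm a les nou del matí divendres",
    "annot_utt": "desperta'm a les [time : nou del matí] [date : divendres]",
})

if __name__ == "__main__":
    print(parse_record(SAMPLE_LINE))
```

To process a whole locale file, the same `parse_record` call could be applied to each line of the downloaded Catalan file; the exact file name and path depend on the release archive, so check the repository instructions before relying on one.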

One immediate customer of the additional Catalan data is the Barcelona Supercomputing Center.

“Project AINA, dedicated to creating an advanced AI infrastructure for the Catalan language, is very excited about the inclusion of our language in the MASSIVE 1.1 dataset,” said Carlos Rodríguez Penagos, a researcher with the center’s text-mining unit. “This is a big step forward for digital assistants and chatbots that are able to converse fluently with people in their own language, a vital requirement of modern digital ecosystems. Amazon’s addition of Catalan to the MASSIVE dataset is very good news for languages that up to now were not well represented in the online platforms that we use daily. We will add this task to the CLUB [the Catalan Language Understanding Benchmark], the AI performance reference for this language. Thanks, Amazon, for this important initiative.”
