
DolphinGemma, the new Google AI model for interpreting the language of dolphins

A pod of Atlantic spotted dolphins (Stenella frontalis). Credit: Google.

Understanding dolphins is no longer just a dream for ethologists and marine-life enthusiasts: today, thanks to artificial intelligence, new paths are opening toward deeper communication between humans and cetaceans. On the occasion of World Dolphin Day, Google announced a remarkable result: DolphinGemma, a model developed in collaboration with the Georgia Institute of Technology and the Wild Dolphin Project (WDP), can analyze dolphin vocalizations and generate realistic new sounds of the same kind. The project represents an important step forward in understanding the "language" of Atlantic spotted dolphins (Stenella frontalis), a species studied in depth for over thirty years.

By combining a huge amount of data collected underwater with advanced audio algorithms, DolphinGemma can identify recurring sound patterns, opening the way to interactive communication between the two species. And that is not all: through a system called CHAT, forms of bidirectional dialogue are also being tested, using artificial sounds to facilitate interaction with the animals.

The birth of the DolphinGemma model

For decades, the language of dolphins was a fascinating mystery. Whistles, clicks, buzzes and burst-pulsed sounds filled researchers' audio archives, but interpreting their meaning has always been a complex task. The Wild Dolphin Project, active since 1985 in the waters of the Bahamas, has conducted the longest field study ever carried out on a single community of dolphins. Its non-invasive approach, summarized in the motto "In their world, on their terms", has made it possible to assemble a unique archive: decades of underwater audio and video carefully paired with the behaviors and individual identities of the observed dolphins. This body of data made it possible to begin connecting specific sounds to recurring behavioral situations: for example, the "signature whistles", similar to proper names, which dolphins use to call one another, or the intermittent "squawks" associated with conflicts between individuals.

It is on this very rich corpus that Google's technology was grafted. DolphinGemma is an advanced audio model based on the architecture of the Gemma series of lightweight, open language models. Unlike text-only models, DolphinGemma is audio-in and audio-out: it listens to a sequence of dolphin sounds, analyzes its internal structure and generates a coherent continuation, just as predictive language models do with words. The underlying technology relies on SoundStream, an audio tokenizer capable of efficiently representing complex vocal signals, and on a model with about 400 million parameters, sized to run even on smartphones, in particular the Google Pixel phones used directly in the field.
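The "audio-in, audio-out" idea described above can be pictured with a deliberately tiny sketch: vocalizations are turned into discrete tokens (as an audio tokenizer like SoundStream does with real recordings), and a predictive model learns which token tends to follow which, then extends a sequence. All names and data below are illustrative, not DolphinGemma's actual API; a simple bigram counter stands in for the neural model.

```python
# Toy illustration of next-token prediction over tokenized audio.
# Integers stand in for audio codec tokens; a bigram count table
# stands in for the neural sequence model.
from collections import Counter, defaultdict

def train_bigram_model(token_sequences):
    """Count token-to-token transitions across tokenized vocalizations."""
    transitions = defaultdict(Counter)
    for seq in token_sequences:
        for prev_tok, next_tok in zip(seq, seq[1:]):
            transitions[prev_tok][next_tok] += 1
    return transitions

def continue_sequence(transitions, prefix, length):
    """Greedily extend a token prefix, a stand-in for model generation."""
    seq = list(prefix)
    for _ in range(length):
        followers = transitions.get(seq[-1])
        if not followers:
            break  # no known continuation for this token
        seq.append(followers.most_common(1)[0][0])  # most likely next token
    return seq

# Fake "tokenized whistles": three short training sequences.
corpus = [[1, 2, 3, 4], [1, 2, 3, 5], [2, 3, 4, 1]]
model = train_bigram_model(corpus)
print(continue_sequence(model, [1, 2], 3))  # -> [1, 2, 3, 4, 1]
```

The real model, of course, predicts over thousands of learned audio tokens with a transformer rather than bigram counts, but the input-output contract is the same: a sound sequence in, a plausible continuation out.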

Left: a spotted dolphin mother observes her calf while it forages, then uses her signature whistle to call the calf back once the meal is finished. Right: a spectrogram visualizing the whistle. Credit: Google.

In addition to analyzing natural sounds, the project includes another research front: bidirectional communication. Here the CHAT (Cetacean Hearing Augmentation Telemetry) system comes into play, an underwater interface developed together with Georgia Tech. CHAT does not aim to directly translate the language of dolphins, but to create a shared vocabulary starting from artificial whistles. These sounds, associated with objects familiar to the dolphins, such as seaweed or toys, are presented in their natural context. If a dolphin imitates the corresponding whistle, it can receive the object in question, thus reinforcing the association.
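The core of that shared-vocabulary loop can be sketched in a few lines. This is a hypothetical simplification, not the real CHAT software: an upstream detector is assumed to produce a similarity score for each known artificial whistle, and the loop maps a confident enough imitation to its associated object.

```python
# Hedged sketch of one CHAT-style interaction step. The whistle names,
# object pairings, and threshold are illustrative assumptions.
VOCABULARY = {
    "whistle_A": "seaweed",  # artificial whistle -> familiar object
    "whistle_B": "toy",
}

def classify_whistle(scores, threshold=0.8):
    """Pick the best-matching known whistle, or None if no match is
    confident enough. `scores` maps whistle names to similarity values
    produced by an upstream matcher."""
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

def chat_step(scores):
    """One loop iteration: a detected imitation yields an object to offer,
    reinforcing the whistle-object association; otherwise nothing happens."""
    label = classify_whistle(scores)
    return VOCABULARY.get(label)

print(chat_step({"whistle_A": 0.93, "whistle_B": 0.40}))  # -> seaweed
print(chat_step({"whistle_A": 0.55, "whistle_B": 0.60}))  # -> None
```

The design point is that the system never claims to decode natural dolphin speech: it only closes a reward loop around whistles that humans invented, which is what makes the association learnable on both sides.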

The use of Google Pixel phones in these experiments offers an important advantage: it reduces the need for custom hardware, lowering the cost, power consumption and size of the equipment. The latest models of the "Big G" smartphone, such as the Pixel 9, can run both deep-learning models like DolphinGemma and template-matching algorithms, that is, comparisons of incoming audio against already known sounds, improving the system's ability to respond quickly to dolphin vocalizations. Researchers can thus receive immediate feedback through bone-conduction headphones while remaining immersed in the marine environment.
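Template matching, the simpler of the two techniques mentioned above, can be illustrated with a minimal sketch: slide each known whistle template over the incoming signal and keep the highest normalized correlation. This is a pure-stdlib toy on raw samples, under the assumption that a real on-device matcher would instead operate on spectrogram features.

```python
# Toy template matcher: normalized cross-correlation of an incoming
# signal against known whistle templates. Names and data are illustrative.
import math

def norm_corr(a, b):
    """Normalized correlation between two equal-length signals (in [-1, 1])."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def best_template(signal, templates):
    """Slide every template over the signal; return the best match."""
    best_name, best_score = None, -1.0
    for name, tmpl in templates.items():
        for start in range(len(signal) - len(tmpl) + 1):
            score = norm_corr(signal[start:start + len(tmpl)], tmpl)
            if score > best_score:
                best_name, best_score = name, score
    return best_name, best_score

templates = {"rising": [0.0, 1.0, 2.0], "falling": [2.0, 1.0, 0.0]}
signal = [0.1, 0.0, 1.1, 2.0, 1.9, 0.2]
name, score = best_template(signal, templates)
print(name)  # -> rising
```

Because this kind of comparison is cheap, it can run continuously on the phone and trigger feedback with low latency, while the heavier generative model is reserved for analysis and synthesis.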

Opening the project to the scientific community

A crucial aspect of the project is its openness to the scientific community. Google has announced that it will share DolphinGemma as an open-source model in the summer of 2025, making it available to researchers working on other cetacean species as well, such as bottlenose dolphins or spinner dolphins. Although the model will need to be adapted to the specific vocalizations of each species, DolphinGemma's modular structure facilitates customization and scalability.