MBARI Shares Trove of Acoustic Data on the Amazon Web Services Cloud
An underwater microphone located in the depths of Monterey Bay has recorded tens of thousands of hours of sound.
That extensive acoustic archive has been invaluable to MBARI researchers and collaborators. For the past six years, this unique dataset has provided insight into the ocean soundscape, from the behaviors of whales to the impacts of human activities on marine life. Now, Pacific Ocean Sound Recordings makes that trove of acoustic data accessible to researchers around the world via the Registry of Open Data on the Amazon Web Services (AWS) cloud.
“Researchers and students around the world, who otherwise may not have access to high-quality ocean acoustic data, can tap into a tremendous data resource and explore the ocean’s world of sound in ways I never imagined,” said John Ryan, a biological oceanographer at MBARI.
“The cloud and advances in machine learning can undoubtedly pave the way to accelerate discoveries in these unique, high-quality data,” added Danelle Cline, a software engineer at MBARI. “I hope that releasing these data spurs new ideas and partnerships between science and machine-learning specialists to help unlock the mysteries of the soundscape.”
Better information sharing has the power to accelerate discoveries and improve the world around us. The research community can now access Pacific Ocean Sound Recordings on AWS without paying to store their own copies of the dataset. Researchers pay only for the computing services they use, plus modest costs for any data storage of their own. AWS, through its Open Data Sponsorship Program, covers the costs of storing and transferring the data so that researchers around the world can access and analyze it in the cloud.
The cloud-based application of machine learning in the Pacific Ocean Sound Recordings project can automatically detect songs from humpback whales (Megaptera novaeangliae). Here, MBARI applied a convolutional neural network previously developed by researchers within NOAA and Google’s AI for Social Good program. Image: © 2021 MBARI
MBARI’s Pacific Ocean Sound Recordings dataset comprises a six-year archive of audio recordings that began in July 2015. The archive grows continuously as recording continues.
The original recordings were acquired at a very high sample rate of 256,000 samples per second (256 kHz). These recordings—almost 50,000 hours of sound—are available on the AWS cloud. However, many research applications can be well served by lower resolution recordings with a much smaller data volume. For this purpose, the dataset includes daily files of recordings decimated to lower sample rates (16 kHz and 2 kHz). All audio data are in WAV format. Beyond making all of the data products available and keeping them updated, this project provides examples for working with the data in the cloud, applying signal processing and machine-learning tools. These tools are essential to enable effective data sharing and use.
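The 16 kHz and 2 kHz products are lower-rate versions of the 256 kHz originals. The Python sketch below shows what decimation to those rates can look like, assuming NumPy and SciPy are available. The overall factors (16 and 128) follow from the sample rates above; the two-stage split, the FIR anti-aliasing filter, and the synthetic test tone are illustrative assumptions, not MBARI's documented processing. A decimated file can only represent sound below half its sample rate, so the 16 kHz files cover frequencies up to 8 kHz and the 2 kHz files up to 1 kHz.

```python
import numpy as np
from scipy.signal import decimate

FS_RAW = 256_000  # original sample rate (Hz), from the article


def decimate_staged(x, factors):
    """Downsample x by each integer factor in turn, low-pass filtering before each step."""
    y = np.asarray(x, dtype=np.float64)
    for q in factors:
        # Zero-phase FIR anti-aliasing filter, then downsample by q
        y = decimate(y, q, ftype="fir", zero_phase=True)
    return y


# One second of synthetic audio: a 500 Hz tone standing in for a low-frequency call
t = np.arange(FS_RAW) / FS_RAW
x = np.sin(2 * np.pi * 500 * t)

x_16k = decimate_staged(x, [4, 4])   # 256 kHz / 16 = 16 kHz
x_2k = decimate_staged(x_16k, [8])   # 16 kHz / 8 = 2 kHz

print(len(x_16k), len(x_2k))  # 16000 2000
```

Splitting a large factor into stages is a common choice; SciPy's documentation recommends it when using IIR filtering with factors above 13.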
The total volume of the raw data is 140 terabytes, and growing. “If sound were an image, similar to the size of the image you capture on your cell phone, this would be 41 million images,” explained Cline. “Such a massive amount of data would be too much for a person to analyze, so this is where machine-learning services from cloud providers like AWS can really help.”
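As a rough check on these figures, the arithmetic below estimates the raw volume from the sample rate and recording time. It assumes each sample is stored as 24 bits (3 bytes) on a single channel; that bit depth is our assumption, not a figure from this article. Under that assumption, about 50,000 hours at 256 kHz comes to roughly 138 TB, consistent with the quoted 140 TB. The quote's comparison implies an average image size of about 3.4 MB.

```python
FS_RAW = 256_000        # samples per second, from the article
HOURS = 50_000          # "almost 50,000 hours", from the article
BYTES_PER_SAMPLE = 3    # ASSUMPTION: single-channel, 24-bit samples; WAV headers ignored

bytes_per_hour = FS_RAW * BYTES_PER_SAMPLE * 3600
total_tb = bytes_per_hour * HOURS / 1e12
print(f"{bytes_per_hour / 1e9:.2f} GB per hour, about {total_tb:.0f} TB in total")
# -> 2.76 GB per hour, about 138 TB in total

# ASSUMPTION: same bit depth for the decimated products, so volume scales with sample rate
for fs in (16_000, 2_000):
    print(f"{fs // 1000} kHz: about {total_tb * fs / FS_RAW:.2f} TB")
# -> 16 kHz: about 8.64 TB
# -> 2 kHz: about 1.08 TB

# Average "image" size implied by the 140 TB / 41 million comparison
mb_per_image = 140e12 / 41e6 / 1e6
print(f"about {mb_per_image:.1f} MB per image")  # -> about 3.4 MB per image
```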
In 2015, MBARI installed an underwater microphone, or hydrophone, on a deep-sea cabled ocean observatory in Monterey Bay. Funded by the National Science Foundation, the MARS (Monterey Accelerated Research System) observatory offers a platform for observing the ocean in new ways. A 52-kilometer (32-mile) cable provides power and data for instruments plugged into the observatory’s main node.
An underwater microphone 900 meters (3,000 feet) deep in Monterey Bay records the ocean’s soundscape, from the calls of marine mammals to noises made by human activities. Image: © 2016 MBARI
The hydrophone eavesdrops on the underwater soundscape. It records the bellowing moans and groans of whales, the quick clicks and clacks of dolphins, the grumbles of earthquakes, and even the pitter-patter of raindrops on the ocean’s surface. The microphone also picks up sounds from human activities, like the growl of shipping noise or the sharp pops of seal bombs.
“What’s unique about these data is the combination of two valuable attributes: continuous recording and coverage of a great range of frequencies,” explained Ryan.
Unlike isolated, battery-powered recorders, MBARI’s recorder continuously streams data to shore and never runs out of power or data storage. This enables observation over the long periods needed to understand biological complexity and variability, and how they change along with the ecosystem. Measuring sound across a broad frequency range, from 10 to 100,000 Hz, both within and far beyond the range of human hearing, allows the MBARI team to study a great variety of sound sources.
The MARS cabled observatory allows researchers to use sound information in real time and, importantly, to archive the acoustic data easily. The observatory also provides valuable opportunities for outreach, including live audio streamed from the hydrophone to the Soundscape Listening Room.
“Sharing these data will bring new opportunities to learn and collaborate,” emphasized Ryan. He noted how researchers studying blue whales in other parts of the world will be able to learn from MBARI’s data and analysis methods and vice versa. “Collaboration accelerates progress.”
“We worked to make these recordings publicly available because that is how the value of the data can be most fully realized and how new knowledge can grow most vigorously,” said Ryan.
In addition to posting and maintaining acoustic data on the AWS Registry of Open Data, the Pacific Ocean Sound Recordings project provides examples for working with the data in the cloud, applying signal processing and machine-learning tools. Here, a cloud application of signal processing quantified noise from shipping traffic using international standard methods. Image: © 2021 MBARI
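The caption above refers to quantifying shipping noise with standard methods, which this article does not spell out. As a generic illustration only, the Python sketch below computes a band sound pressure level in decibels relative to 1 µPa, the usual underwater reference, by integrating a Welch power spectral density over a frequency band. It assumes the recording has already been converted to pascals with the hydrophone's calibration (not shown). The function name, band edges, and segment length are hypothetical choices, not MBARI's procedure.

```python
import numpy as np
from scipy.signal import welch

P_REF = 1e-6  # reference pressure for underwater sound: 1 micropascal


def band_level_db(pressure_pa, fs, f_lo, f_hi, nperseg=4096):
    """Sound pressure level in dB re 1 uPa within [f_lo, f_hi) Hz.

    pressure_pa must already be calibrated to pascals (hydrophone sensitivity applied).
    """
    f, psd = welch(pressure_pa, fs=fs, nperseg=nperseg)  # one-sided PSD, Pa^2/Hz
    in_band = (f >= f_lo) & (f < f_hi)
    mean_square = np.sum(psd[in_band]) * (f[1] - f[0])  # integrate PSD over the band
    return 10 * np.log10(mean_square / P_REF**2)


# Synthetic check: a 100 Hz tone with 1 Pa amplitude has mean-square pressure 0.5 Pa^2,
# so its level should be close to 10*log10(0.5 / 1e-12), about 117 dB re 1 uPa.
fs = 16_000
t = np.arange(10 * fs) / fs
tone = 1.0 * np.sin(2 * np.pi * 100 * t)
print(round(band_level_db(tone, fs, 50, 200), 1))
```

Integrating the density over a band, rather than reading a single spectral peak, makes the result independent of the FFT segment length.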
Sharing such a large dataset is challenging, so MBARI teamed with the AWS Open Data project. The AWS Open Data Sponsorship Program covers the cost of storage and egress for publicly available, high-value, cloud-optimized datasets. AWS works with data providers to enable access to data by making it available for analysis on AWS; develop new cloud-native techniques, formats, and tools that lower the cost of working with data; and encourage the development of communities that benefit from access to shared datasets. Through the program, AWS has democratized access to petabytes of data, including satellite imagery, climate and weather data, genomic data, and data used for natural language processing.
Open access to scientific data is a fundamental value for MBARI and part of the institute’s mission.
Data and technology from many MBARI projects are publicly available. From oceanographic data collected by ongoing monitoring efforts to detailed maps of the deep seafloor to identification guides for deep-sea animals, MBARI is eager to share its expertise with the global scientific community. MBARI is an active participant in several research collaborations, including the Southern Ocean Carbon and Climate Observations and Modeling project (SOCCOM), the Global Ocean Biogeochemistry Array (GO-BGC Array) project, and the Central and Northern California Ocean Observing System (CeNCOOS) collaborative. Visit MBARI’s data repository for additional projects and data available for public use. MBARI’s technological innovations are likewise available for licensing.
To access MBARI’s Pacific Ocean Sound Recordings data, visit registry.opendata.aws/pacific-sound.
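For readers who want to try the data programmatically, here is a minimal Python sketch that uses boto3, the AWS SDK for Python, to list files anonymously. The bucket name, region, and year/month key layout are assumptions based on our reading of the registry page and should be checked there; the helper names are hypothetical.

```python
# ASSUMPTIONS: bucket name, region, and key layout below are illustrative;
# confirm them on registry.opendata.aws/pacific-sound before use.
BUCKET = "pacific-sound-16khz"
REGION = "us-west-2"


def month_prefix(year: int, month: int) -> str:
    """Hypothetical key prefix for one month of daily files, e.g. '2018/11/'."""
    return f"{year:04d}/{month:02d}/"


def list_daily_files(year: int, month: int, max_keys: int = 10) -> list[str]:
    """List up to max_keys object keys for one month using anonymous (unsigned) requests."""
    # Imported here so month_prefix can be used even where boto3 is not installed
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client("s3", region_name=REGION, config=Config(signature_version=UNSIGNED))
    resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=month_prefix(year, month), MaxKeys=max_keys)
    return [obj["Key"] for obj in resp.get("Contents", [])]
```

If the assumed layout is right, `list_daily_files(2018, 11)` returns keys for daily 16 kHz files from November 2018, and each can be fetched with the client's `download_file` method. Unsigned requests are the usual way to read public Registry of Open Data buckets without AWS credentials.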