Inspiration

Many people walk around with earbuds or headphones nowadays, and many modern earbuds also come with noise cancellation for full immersion. However, this leaves the user unable to hear or react to their surroundings.

What it does

AlertAI lets users stay fully immersed in their music, audiobooks, or podcasts while walking, while still being able to hear any dangers present in their surrounding environment. The program records audio in real time, analyzing a new segment every 150 ms (0.15 s) so the user can react immediately.

How we built it

The program uses SpeechBrain, a PyTorch-based speech toolkit, with a classifier trained on roughly 8,000 labeled urban sound clips (the UrbanSound8K dataset). Every 0.15 seconds, the program sends the latest audio segment to the deep-learning model, which returns a probability score for each class of sound. The ten classes are ["air_conditioner", "car_horn", "children_playing", "dog_bark", "drilling", "engine_idling", "gun_shot", "jackhammer", "siren", "street_music"], of which ["car_horn", "siren", "gun_shot"] are flagged as dangers. The program then takes the most likely class for the segment, and if the probability of belonging to that class is above a threshold, it pauses both audio output and noise cancellation so the user can hear their surroundings. The pause lasts for the length of the hazardous noise plus one additional second.
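A minimal sketch of this detection loop is below. It assumes the public speechbrain/urbansound8k_ecapa checkpoint, the sounddevice library for microphone capture, and a hypothetical pause_playback() hook into the audio player; the sample rate and threshold values are illustrative, not the exact ones we shipped.

```python
# Sketch of the detection loop under the assumptions stated above.
import sounddevice as sd
import torch
from speechbrain.pretrained import EncoderClassifier

SAMPLE_RATE = 16000
CHUNK_SECONDS = 0.15                                # analysis window
DANGER_CLASSES = {"car_horn", "siren", "gun_shot"}  # flagged sounds
THRESHOLD = 0.6                                     # user-adjustable sensitivity

classifier = EncoderClassifier.from_hparams(
    source="speechbrain/urbansound8k_ecapa"  # assumed pretrained checkpoint
)

def pause_playback():
    """Hypothetical hook: pause audio output and noise cancellation."""
    print("danger detected - pausing playback")

while True:
    # Capture one 0.15 s chunk from the default microphone.
    chunk = sd.rec(int(SAMPLE_RATE * CHUNK_SECONDS),
                   samplerate=SAMPLE_RATE, channels=1, dtype="float32")
    sd.wait()
    wav = torch.from_numpy(chunk[:, 0]).unsqueeze(0)  # shape (1, samples)

    # classify_batch returns log-posteriors plus the best label.
    out_prob, score, index, text_lab = classifier.classify_batch(wav)
    prob = out_prob.exp().max().item()

    if text_lab[0] in DANGER_CLASSES and prob > THRESHOLD:
        pause_playback()
```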

Challenges we ran into

By default, the software is rather sensitive, because it is preferable to pause the audio accidentally than to fail to pause it in an emergency. As a result, the software can occasionally pause the audio for "no reason." To give the user more control, we let them adjust the sensitivity by setting the detection threshold themselves: people in big cities can raise it to filter out the large, constant background noise, while those in quieter suburbs can keep the threshold of the danger function lower. A rough illustration of this follows below.
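One way the sensitivity setting could map to the detection threshold (preset names and values here are hypothetical, for illustration only):

```python
# Hypothetical sensitivity presets: a higher threshold suits noisy
# cities, a lower one suits quieter suburbs.
SENSITIVITY_PRESETS = {
    "city": 0.8,     # filter out constant urban background noise
    "default": 0.6,  # err on the side of pausing
    "suburb": 0.4,   # quieter streets, flag sounds more readily
}

def threshold_for(setting: str) -> float:
    """Return the detection threshold for a user-chosen sensitivity."""
    return SENSITIVITY_PRESETS.get(setting, SENSITIVITY_PRESETS["default"])
```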

Accomplishments that we're proud of

We are proud that we were able to apply our AI model to a real-world problem that will be useful to many people as noise-canceling technology continues to spread.

What we learned

We learned about applying Fourier transforms and MFCCs to audio files for speech recognition. Applying the FFT converts the audio from the time domain into a plot of amplitude against frequency. From there, a threshold can be applied to the transformed signal to eliminate noise and extract only the important sound bits. The mel scale is based on the way that we humans distinguish between frequencies; it is used to divide the frequency band into multiple smaller bands, from which the cepstral coefficients are extracted using the discrete cosine transform.
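A small illustration of that pipeline using torchaudio, which wraps the FFT, mel filterbank, log, and DCT steps in one transform ("street.wav" is a placeholder file name, and the parameter values are illustrative rather than what SpeechBrain uses internally):

```python
# MFCC pipeline sketch: FFT -> mel filterbank -> log -> DCT.
import torchaudio

waveform, sr = torchaudio.load("street.wav")  # placeholder input file

mfcc = torchaudio.transforms.MFCC(
    sample_rate=sr,
    n_mfcc=13,  # number of cepstral coefficients to keep
    melkwargs={"n_fft": 400, "hop_length": 160, "n_mels": 40},
)(waveform)

print(mfcc.shape)  # (channels, n_mfcc, time_frames)
```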

What's next for AlertAI

Future directions for this project are to package the software as an app or integrate it into existing headphones. This would make the technology easy to access and give users a simple way to adjust the sensitivity or turn it off entirely.

Built With

Python, PyTorch, SpeechBrain
