Inspiration
Welcome to VibeSync! Dive into a world where your videos and music are in perfect harmony! Get ready to see your moments transformed as we create a new video where the music pulses to the rhythm of your content. Experience your memories in a whole new way with VibeSync! Let's create magic together!
What it does
With VibeSync, you simply submit your video and the app syncs it with the most fitting music based on its context. This spares you from searching through large galleries of music files and testing each one to see whether it matches the "vibe" of the video. It is also helpful for film producers who want music recommendations for their videos.
How we built it
We created this magical application using Google Gemini as our "internal music producer."
Technical details of our hack:
- The video file is passed to the Gemini API, which returns a JSON-like structure of key-value pairs: each key is a timestamp in the video and each value is the music title Gemini recommends for that timestamp.
- The recommended music titles are collected and downloaded using an open-source music download library. We then build a new set of key-value pairs that maps each video timestamp to the path of the downloaded music file.
- Each downloaded music file is cropped according to its corresponding timestamps, and the cropped clips are concatenated into a single audio file.
- The merged audio file is then applied to the video file uploaded by the user, and the final video is displayed to the user on the front end.
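The cropping step above needs a start and a duration for every recommended track. The sketch below shows one way to derive them from the timestamp-to-title pairs. It is an illustration, not our exact code: the "MM:SS" timestamp format and the names `to_seconds` and `plan_segments` are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def to_seconds(stamp: str) -> int:
    """Convert an 'MM:SS' or 'HH:MM:SS' string to seconds (assumed format)."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def plan_segments(
    recommendations: Dict[str, str], video_length: int
) -> List[Tuple[str, int, int]]:
    """Return (title, start, duration) for each segment, ordered by start time.

    Each track plays from its own timestamp until the next timestamp,
    and the last track plays until the end of the video.
    """
    starts = sorted((to_seconds(ts), title) for ts, title in recommendations.items())
    plan = []
    for i, (start, title) in enumerate(starts):
        end = starts[i + 1][0] if i + 1 < len(starts) else video_length
        plan.append((title, start, end - start))
    return plan
```

Each tuple in the plan tells the audio step how long a clip to cut from the corresponding downloaded file before the clips are concatenated.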
Challenges we ran into
- The biggest challenge was getting Gemini to format its response as JSON, with video timestamps as keys and the names of the recommended music as values. We tried a lot of prompt engineering and fiddled with the system configuration and temperature settings. In the end, we could not get Gemini to output valid JSON, but it did produce a JSON-like string. So we performed a lot of string manipulation to extract the key-value pairs from that pseudo-JSON response.
- We were not able to fine-tune Gemini for this task. Fine-tuning accepts only text input-output pairs, and our input is a video.
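For the first challenge, the kind of string manipulation involved can be sketched as follows. This is a minimal example under stated assumptions, not our exact code: it assumes timestamps look like "MM:SS" or "HH:MM:SS", that titles are wrapped in single or double quotes, and the name `extract_pairs` is hypothetical. It tries strict JSON first and falls back to a regular expression.

```python
import json
import re
from typing import Dict

# Matches pairs such as '00:30': "Some Title" with either quote style.
# Group 1 is the timestamp, group 2 the opening quote, group 3 the title.
PAIR = re.compile(r"""["']?(\d{1,2}:\d{2}(?::\d{2})?)["']?\s*:\s*(["'])(.*?)\2""")


def extract_pairs(reply: str) -> Dict[str, str]:
    """Pull timestamp-to-title pairs out of a JSON-like model reply."""
    # Try strict JSON on the outermost {...} span first.
    start, end = reply.find("{"), reply.rfind("}")
    if start != -1 and end > start:
        try:
            parsed = json.loads(reply[start:end + 1])
            if isinstance(parsed, dict):
                return {str(k): str(v) for k, v in parsed.items()}
        except json.JSONDecodeError:
            pass  # Not valid JSON; fall through to the regex.
    return {m.group(1): m.group(3).strip() for m in PAIR.finditer(reply)}
```

A regex fallback like this tolerates single quotes, trailing commas, and extra chatter around the braces, but it will miss pairs whose format differs from the assumed one.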
Accomplishments that we're proud of
We are very pleased with how we overcame difficult challenges related to processing Gemini outputs. We are also proud of how we processed audio files effectively to perform complex crop and merge operations, along with subtle transition effects like crossfade to ensure a smooth, seamless audio-video experience.
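To illustrate what a crossfade does, here is a minimal sketch on raw sample lists. It is not our implementation: it assumes mono audio represented as Python floats and a linear fade, and the name `crossfade` is chosen for this example only.

```python
from typing import List


def crossfade(a: List[float], b: List[float], overlap: int) -> List[float]:
    """Join two clips, blending the last `overlap` samples of `a`
    with the first `overlap` samples of `b` using a linear fade."""
    overlap = max(0, min(overlap, len(a), len(b)))
    if overlap == 0:
        return a + b  # Nothing to blend: plain concatenation.
    head = a[:len(a) - overlap]
    tail = b[overlap:]
    mixed = []
    for i in range(overlap):
        gain_in = (i + 1) / (overlap + 1)  # Rises toward 1 across the overlap.
        sample_out = a[len(a) - overlap + i]
        mixed.append(sample_out * (1 - gain_in) + b[i] * gain_in)
    return head + mixed + tail
```

The result is shorter than the two clips combined by `overlap` samples, which matters when the merged audio has to line up with the video timestamps.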
What we learned
After spending several hours on this project, we've gained a deep appreciation for the capabilities of advanced multimodal Large Language Models (LLMs) like Gemini. One distinct memory we all share is the genuine astonishment we felt when we watched Gemini perform complex tasks so effectively. We spent considerable time exploring different possibilities, which was both exciting and a bit overwhelming. In the end, working with Gemini and interacting with other participants was incredibly rewarding. We learned not only about technology but also about effective teamwork and the creative process. This project was a significant learning experience for everyone involved.
What's next for VibeSync
We plan to offer more customization by having Gemini generate several music options for each video segment.
Users will be able to listen to the different options and choose the one they prefer.
We also plan to add more refined sound mixing right in the main interface, so that the music volume and any original sound in the video, such as conversations, can be adjusted.
Finally, we plan to give users more ways to fine-tune the music suggestions Gemini gives.