Inspiration

At the heart of the LLM value proposition lies the ability to distill vast amounts of complex information into concise snapshots that capture the essence of the original media. We've brought this concept to consumer tech, a familiar landscape for us netizens. YouTube playlists and the watch-later feature have become intuitive ways to bookmark longer-form media for later reference. These playlists give users a curated snapshot of their interests in useful, referential content, one that is ripe for indexing and future exploration.

What it does

Our solution aims to bridge the gap between consuming content and retaining valuable insights, empowering users to make the most of their curated playlists and optimize their learning experience.

By indexing users' playlists and analyzing video content, our tool provides structured summaries with timestamps, making it easy to recall and cross-reference key information. Users can ask questions grounded in their watch-later content, allowing for seamless retrieval of useful reference material.
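The exact schema isn't spelled out above, but a minimal sketch of the kind of timestamped digest record this implies could look like the following. Field and function names here are illustrative assumptions, not the project's actual data model.

```typescript
// Illustrative sketch only: names and shapes are assumptions,
// not the exact schema used by the project.
interface SummarySegment {
  startSeconds: number; // where in the video this point is made
  endSeconds: number;
  summary: string;      // short distillation of the segment
}

interface VideoDigest {
  videoId: string;      // YouTube video ID
  title: string;
  playlistId: string;   // playlist / watch-later list it came from
  overview: string;     // whole-video summary
  segments: SummarySegment[];
}

// Answers to user questions can point back to specific timestamps.
function formatTimestampLink(d: VideoDigest, s: SummarySegment): string {
  return `https://www.youtube.com/watch?v=${d.videoId}&t=${Math.floor(s.startSeconds)}s`;
}
```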

How we built it

We developed two primary compute workflows. The first indexes and runs inference on a user's set of playlists, pre-computing insights, notable moments, and a comprehensive internal reference document that captures the visual and audio details in text form; this is where we take advantage of Gemini's multimodal input capabilities. The second workflow is more ad hoc and occurs when a user issues a prompt against their playlists. Because the video and audio have already been processed into text, the Gemini model can run inference on the new prompt over text inputs alone, providing both quick and detailed answers to the question. In the future, we plan to batch this computation as new information comes in for a user, such as new playlists, bookmarks, and videos added to watch-later.
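A simplified sketch of the two stages, assuming the public Gemini generateContent REST endpoint; the model name, the prompt wording, and the use of a plain transcript string in stage 1 (rather than the full multimodal video/audio input) are assumptions made for brevity:

```typescript
// Sketch of the two-stage workflow. The endpoint shape is the public Gemini
// generateContent REST API; the model name and prompts are assumptions.
const GEMINI_URL =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent";

async function callGemini(prompt: string, apiKey: string): Promise<string> {
  const res = await fetch(`${GEMINI_URL}?key=${apiKey}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
  });
  const data = await res.json();
  return data.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
}

// Stage 1 (batch): distill a video's content into a text reference document
// that later queries can run against.
async function buildReferenceDoc(videoTitle: string, transcript: string, apiKey: string) {
  const prompt =
    `Summarize the following video into notable moments with timestamps, ` +
    `capturing visual and audio details in text form.\n\nTitle: ${videoTitle}\n\n${transcript}`;
  return callGemini(prompt, apiKey);
}

// Stage 2 (ad hoc): answer a user's question using only the pre-computed text.
async function answerPrompt(question: string, referenceDocs: string[], apiKey: string) {
  const prompt =
    `Using only the reference notes below, answer the question and cite timestamps.\n\n` +
    `Question: ${question}\n\nNotes:\n${referenceDocs.join("\n---\n")}`;
  return callGemini(prompt, apiKey);
}
```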

We built a JS web client and a backend server that interact with the YouTube Data and Gemini LLM APIs. In the future, we hope to port this to a browser extension or companion. Platform features like this are intuitive add-ons in a new age of accessible large language model compute.
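On the YouTube side, pulling a playlist's contents goes through the YouTube Data API v3 playlistItems endpoint; a minimal sketch (pagination and error handling omitted, API key handling simplified):

```typescript
// Minimal sketch: list the videos in a playlist via the YouTube Data API v3.
// Pagination (nextPageToken) and error handling are omitted for brevity.
interface PlaylistVideo {
  videoId: string;
  title: string;
}

async function listPlaylistVideos(playlistId: string, apiKey: string): Promise<PlaylistVideo[]> {
  const url =
    `https://www.googleapis.com/youtube/v3/playlistItems` +
    `?part=snippet&maxResults=50&playlistId=${playlistId}&key=${apiKey}`;
  const res = await fetch(url);
  const data = await res.json();
  return (data.items ?? []).map((item: any) => ({
    videoId: item.snippet.resourceId.videoId,
    title: item.snippet.title,
  }));
}
```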

Challenges we ran into

Designing the two-stage compute workloads was a significant challenge: balancing meaningful insights against usability and latency. Scaling the solution to handle a large volume of user data while maintaining performance and reliability was another. Video and audio fundamentally represent a huge amount of data, and effectively identifying which data to process while retaining insights proved to be a major design challenge.

Accomplishments that we're proud of

We're really happy about establishing a robust backend infrastructure capable of handling complex data processing tasks efficiently, and about developing a user-friendly interface that simplifies accessing and navigating summary snapshots.

What we learned

  1. The importance of prioritizing user experience in the design and development process.
  2. The technical intricacies involved in working with multiple APIs and integrating them into a cohesive solution.
  3. Strategies for optimizing computational workflows to balance performance and accuracy.

What's next for Youtube Digest Companion

  1. Continuous optimization of backend processes to improve scalability and efficiency.
  2. Integration with other popular content platforms to expand the reach of the solution.
  3. Further refinement of the user interface to enhance usability and accessibility.
  4. Exploration of additional features such as collaborative playlist management and more active personalization.