Inspiration
We wanted to build something with the Gemini API that really took advantage of its strengths, namely the 1 million token context window and Google's excellent infrastructure. We wanted to use LLMs for what they are best at, which is automating away a lot of simple tasks. Why not bring that to a live video call instead of just completing an email? We've all been nervous during interviews, dates, or Zoom calls. It's very hard to tell what other people might be seeing over Zoom, so let's make this easier.
What it does
This is a fully integrated Chrome extension that works directly with Google Meet. Just pull up the extension during your video call and Sidekick gives a real-time report on how the other person is feeling! It also gives a full report afterwards and makes suggestions for your interaction. We want you to be able to nail your interview, date, academic report, whatever.
How we built it
We built this with the Gemini API, a lot of multithreading, and streaming servers. We cut up the stream into JPEGs and audio snippets, which allows Gemini to generate real-time information and prompts about the situation.
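To make the chunking idea concrete, here is a minimal Python sketch of how a stream could be cut into small windows of frames and audio and handed to worker threads. It is not our actual code: `Chunk`, `chunk_stream`, `analyze`, and `run_pipeline` are hypothetical names, the chunk sizes are arbitrary, and `analyze` is a stand-in for the real model call.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One window of the call: a few JPEG frames plus an audio snippet."""
    index: int
    frames: list = field(default_factory=list)  # JPEG bytes (placeholders here)
    audio: bytes = b""                          # raw audio bytes (placeholder)


def chunk_stream(frames, audio, frames_per_chunk, audio_bytes_per_chunk):
    """Cut parallel frame and audio streams into fixed-size chunks."""
    n_chunks = (len(frames) + frames_per_chunk - 1) // frames_per_chunk
    for i in range(n_chunks):
        yield Chunk(
            index=i,
            frames=frames[i * frames_per_chunk:(i + 1) * frames_per_chunk],
            audio=audio[i * audio_bytes_per_chunk:(i + 1) * audio_bytes_per_chunk],
        )


def analyze(chunk):
    """Hypothetical stand-in for the model call; returns a summary string."""
    return f"chunk {chunk.index}: {len(chunk.frames)} frames, {len(chunk.audio)} audio bytes"


def run_pipeline(frames, audio, frames_per_chunk=3, audio_bytes_per_chunk=4, workers=2):
    """A producer puts chunks on a queue; worker threads analyze them concurrently."""
    work = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            chunk = work.get()
            if chunk is None:  # sentinel: no more work for this thread
                break
            report = analyze(chunk)
            with lock:
                results[chunk.index] = report

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for chunk in chunk_stream(frames, audio, frames_per_chunk, audio_bytes_per_chunk):
        work.put(chunk)
    for _ in threads:
        work.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # Return reports in stream order, regardless of which thread finished first.
    return [results[i] for i in sorted(results)]
```

Keying results by chunk index means the reports stay in call order even when a slow request finishes after a later one.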
Challenges we ran into
Internet connectivity, threading, and screen recording were huge limiting factors. We planned the structure carefully, but glitches and bugs are a normal part of every hackathon experience.
Accomplishments that we're proud of
The Chrome extension UI is real-time and fully integrated with Google Meet to analyze context and emotions. It's a lightweight, instant application that will help boost engagement and promote communication. Emotions are detected with a tailored prompt that returns intuitive and precise quantifications. To make further use of this data, we have a chat box where participants can ask a lite Gemini model for advice.
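As a rough illustration of what a quantified emotion prompt could look like, here is a small Python sketch. It is not our actual prompt: the emotion labels, the 0 to 100 scale, and the function names `build_prompt`, `parse_scores`, and `dominant` are all assumptions, and the model call itself is left out.

```python
import json

# Hypothetical label set; the real prompt may use different emotions.
EMOTIONS = ("happy", "nervous", "bored", "confused", "engaged")


def build_prompt(emotions=EMOTIONS):
    """Build a prompt asking for one 0-100 score per emotion, as JSON only."""
    keys = ", ".join(f'"{e}"' for e in emotions)
    return (
        "Look at the attached frames and audio from a video call. "
        "Rate how strongly the other person shows each emotion from 0 to 100. "
        f"Reply with a single JSON object with exactly these keys: {keys}."
    )


def parse_scores(reply, emotions=EMOTIONS):
    """Parse a model reply into clamped integer scores; missing keys become 0."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in reply")
    data = json.loads(reply[start:end + 1])
    scores = {}
    for e in emotions:
        value = float(data.get(e, 0))
        scores[e] = max(0, min(100, round(value)))  # clamp to the 0-100 range
    return scores


def dominant(scores):
    """Return the emotion with the highest score."""
    return max(scores, key=scores.get)
```

Clamping and defaulting missing keys keeps the UI stable even when the model wraps the JSON in extra text or skips a label.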
What we learned
Everything about timing, planning, and life.
What's next for Sidekick
I'm using it myself :))