Inspiration
Today, it is very difficult to practice your pitch, so we made a conversational AI bot that helps you practice your pitch and receive immediate feedback, including feedback on tone and emotion.
Additionally, a business operator such as an apartment complex commonly has lots of media (photos, PDFs, videos) stored in separate places. It's often very hard to find the relevant information at the right time.
We make it possible for individuals to make full use of their content library and find relevant media for their business through a multimodal search experience.
What it does
Pitcher is a web application that allows a sales agent to prompt the Gemini API to retrieve relevant information about a specific property, such as videos, photos, and floor plans.
The application uses a Python backend to preprocess the video content, which includes extracting audio and generating transcripts with Deepgram AI. It also uses Gemini to extract key frames from the videos and describe them in natural language.
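As a rough illustration of the audio-extraction step, the sketch below builds an ffmpeg command that pulls the audio track out of a video so it can be sent for transcription. The use of ffmpeg, the 16 kHz mono WAV settings, and the function names `build_audio_extract_cmd` and `extract_audio` are assumptions made for this example; the write-up does not say which tool the backend actually uses.

```python
import subprocess
from pathlib import Path


def build_audio_extract_cmd(video_path: str, audio_path: str) -> list[str]:
    """Return an ffmpeg command that writes the video's audio as 16 kHz mono WAV."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it already exists
        "-i", video_path,        # input video
        "-vn",                   # drop the video stream, keep audio only
        "-acodec", "pcm_s16le",  # uncompressed 16-bit PCM
        "-ar", "16000",          # 16 kHz sample rate (assumed setting)
        "-ac", "1",              # mono
        audio_path,
    ]


def extract_audio(video_path: str) -> Path:
    """Run ffmpeg (assumed to be installed and on PATH) and return the WAV path."""
    audio_path = Path(video_path).with_suffix(".wav")
    subprocess.run(build_audio_extract_cmd(video_path, str(audio_path)), check=True)
    return audio_path
```

The resulting WAV file would then be passed to the transcription service; that call is left out here because its exact form depends on the SDK version in use.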
In addition to video content, Pitcher also provides access to catalog-like data for floor plan images and pricing information, which is retrieved using Python-based data parsing. The frontend of the application is built with TypeScript and presents a conversational-style interface: the sales agent prompts the system with questions, and the interface responds by displaying relevant videos, images, floor plans, and pricing details.
Pitcher also includes an audio call feature, where the agent can call the client directly. When the call ends, the transcript is sent to a database (Supabase) and used to find relevant videos and/or images for the client.
How we built it
Our team used various libraries and tools to achieve the desired functionality, such as the Gemini API for video and image processing, Deepgram AI for speech-to-text transcription, and Supabase for the database integration.
Challenges we ran into
One challenge that we ran into was connecting the audio system to our frontend. In particular, after the call ends and the transcript is sent to Supabase, we had to find a way to call Gemini with the transcript while using our corpus as context.
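One way to picture the "transcript plus corpus as context" step is a function that assembles a single prompt from both. This is only a sketch: the corpus shape (a list of dicts with `id` and `description` keys), the prompt wording, and the name `build_followup_prompt` are all hypothetical, and the actual Gemini request is omitted.

```python
def build_followup_prompt(
    transcript: str,
    corpus: list[dict[str, str]],
    max_items: int = 20,
) -> str:
    """Combine media descriptions and a call transcript into one prompt string."""
    lines = [
        "You are helping a sales agent follow up after a call.",
        "Available media:",
    ]
    # Cap the number of corpus entries so the prompt stays a manageable size.
    for item in corpus[:max_items]:
        lines.append(f"- [{item['id']}] {item['description']}")
    lines += [
        "",
        "Call transcript:",
        transcript.strip(),
        "",
        "List the ids of the media most relevant to the client's interests.",
    ]
    return "\n".join(lines)
```

The returned string would be sent to the model once the transcript has been stored.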
Another challenge that we ran into was that Gemini's tool calling could not return more than one function call. Since we wanted to display more than a single video or piece of text, we ended up having Gemini return text and JSON, which we then used to call our functions manually.
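The workaround described above can be sketched as a small dispatcher: the model is asked to reply with JSON that lists several actions, and the application calls the matching functions itself. The JSON shape (`text` plus an `actions` list), the handler names `show_video` and `show_floor_plan`, and `dispatch_actions` are assumptions for illustration, not the project's real schema.

```python
import json
from typing import Any, Callable


# Hypothetical UI actions; the real function names are not given in the write-up.
def show_video(video_id: str) -> dict[str, Any]:
    return {"type": "video", "id": video_id}


def show_floor_plan(unit: str) -> dict[str, Any]:
    return {"type": "floor_plan", "unit": unit}


HANDLERS: dict[str, Callable[..., dict[str, Any]]] = {
    "show_video": show_video,
    "show_floor_plan": show_floor_plan,
}


def dispatch_actions(model_reply: str) -> tuple[str, list[dict[str, Any]]]:
    """Parse the model's JSON reply and run every requested action in order."""
    payload = json.loads(model_reply)
    results = []
    for action in payload.get("actions", []):
        handler = HANDLERS.get(action.get("name"))
        if handler is None:
            continue  # skip action names we do not recognize
        results.append(handler(**action.get("args", {})))
    return payload.get("text", ""), results
```

Because the dispatch happens in application code, a single reply can trigger any number of actions, which sidesteps the one-function limit.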
Accomplishments that we're proud of
Our main accomplishments are integrating various data sources (videos, photos, floor plans, pricing) into a single platform and creating an interactive, conversational-style interface for the real estate agent and client. We are also proud of the pipeline that preprocesses the video content, extracts audio and transcripts, and links them to the property details.
Additionally, the audio call feature, where the transcript is automatically saved and displayed within the application, reflects our effort to create a comprehensive and seamless experience for real estate agents and their clients.
What we learned
Through the development of Pitcher, we gained experience in the following areas:
- Integrating multiple data sources and APIs (Gemini, Deepgram AI, Supabase) to create a cohesive application.
- Implementing video and audio processing pipelines, including techniques like speech-to-text transcription.
- Designing a conversational-style user interface that seamlessly displays relevant property information.
- Navigating the challenges of real-time data synchronization and updating the interface based on user interactions.
- Leveraging Python and TypeScript technologies to build a full-stack application.
What's next for Pitcher
The next steps for Pitcher include the following:
- Implementing real-time call transcription to display videos, images, and other relevant content during the client-agent conversation. This would make the application more interactive while the call is still in progress.
- Integrating the Google Maps API to display nearby amenities, such as restaurants, gyms, or other points of interest that the client might care about. This would provide a more comprehensive property showcase.
- Exploring ways to improve the scalability and performance of the application as the number of properties and data sources grows. This could involve optimizing the data retrieval and processing workflows, as well as implementing caching or other performance-enhancing techniques.
Built With
- fastapi
- gemini
- nextjs
- python
- render
- supabase
- twilio
- typescript
- vapi