Inspiration

Travel is a big part of our lives. One of the most important components of travel is sightseeing, through which we learn about new cultures and histories, expanding our horizons in the process. However, the current tour guide industry often falls short of supporting a strong sightseeing experience: cost, scalability, and language barriers can keep sightseeing from reaching its potential. This inspired us to create Gemini Lens, an application that lets travelers take a photo, learn more about a location, and receive a tour guide experience.

What it does

Gemini Lens serves as a virtual tour guide assistant. When users are traveling in an unfamiliar location, they can simply open the Gemini Lens app, take and upload a picture, and receive immediate information about the location or landmark. This may include historical information, geographical background, or other details unique to that particular place. Additionally, the Gemini Lens app provides information about nearby accommodations, safety in the area, and other nearby landmarks and locations to visit.

How we built it

We built Gemini Lens using React Native, Flask, and the Gemini API. In React Native, we developed the image capture and analysis functionality, and we used React Native's geolocation support to record a user's exact coordinates. This information was then sent to the Flask backend, where it was processed and forwarded in a request to the Gemini API. Our Flask backend attaches specific prompts that let us deliver tailored information and suggestions about a location to the user, enabling a customized sightseeing and educational experience.
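To illustrate the backend flow, here is a minimal sketch of how coordinates can be folded into the prompt that accompanies the user's photo. The function name and prompt wording are hypothetical, not our exact production code; in the real app this string is built inside a Flask route and sent to the Gemini API together with the uploaded image.

```python
# Hypothetical sketch of the prompt-construction step in the Flask backend.
# The actual Gemini API call and route wiring are omitted for brevity.

def build_tour_guide_prompt(latitude: float, longitude: float) -> str:
    """Assemble the instruction sent to Gemini alongside the user's photo."""
    return (
        "You are a friendly local tour guide. The attached photo was taken "
        f"near latitude {latitude:.5f}, longitude {longitude:.5f}. "
        "Identify the landmark if possible, then summarize its history, "
        "geographical background, nearby accommodations, safety notes, "
        "and other sights worth visiting in the area."
    )

# Example: a photo taken near the Eiffel Tower
prompt = build_tour_guide_prompt(48.85884, 2.29435)
```

Embedding the coordinates directly in the prompt is what lets the model ground its answer in the user's actual surroundings rather than the image alone.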

Challenges we ran into

One of the main challenges we ran into was developing a sophisticated location tracking system. In particular, tracking the user's coordinates and determining a specific location from them was harder than we expected. Minimizing latency was another challenge, especially since large image files were being uploaded and analyzed. We were proud of our ability to work through these challenges, developing a seamless application in the process.
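One standard way to turn raw coordinates into a named place, sketched below under assumed names (this is an illustrative approach, not our exact implementation), is to compute the great-circle distance to a small table of known landmarks and pick the closest:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))  # Earth radius ≈ 6371 km

def nearest_landmark(lat, lon, landmarks):
    """Return (name, distance_km) of the closest known landmark."""
    name, (llat, llon) = min(
        landmarks.items(), key=lambda kv: haversine_km(lat, lon, *kv[1])
    )
    return name, haversine_km(lat, lon, llat, llon)

# Hypothetical landmark table; a real app would query a places database.
LANDMARKS = {
    "Eiffel Tower": (48.85884, 2.29435),
    "Louvre Museum": (48.86061, 2.33764),
}
```

For example, `nearest_landmark(48.859, 2.294, LANDMARKS)` resolves to the Eiffel Tower. A lookup table like this only scales so far, which is part of why coordinate-to-location resolution proved tricky in practice.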

Accomplishments that we're proud of

Gemini Lens offers a smooth and intuitive user experience for all users. Since our target user group consists of many demographics, we made sure to keep the UI as straightforward as possible, while keeping it aesthetically pleasing and effective. In addition, we are proud of our utilization of the Gemini API, offering several features to take advantage of its multi-modal nature.

What we learned

MHacks was an incredible experience for both learning new technologies and building collaboration skills. Having the opportunity to learn directly from Google engineers and working with cutting-edge technology was an incredibly enriching experience. As the first hackathon for most of us, the experience of dedicating this much time to a single project also taught us a lot about collaborative workflow tools, creating quick mockups/prototypes, and more.

What's next for Gemini Lens

The next steps for Gemini Lens are twofold. One of our main goals is to develop the technical side of the application by introducing new features such as video recording, audio playback, and more accurate location detection using more sophisticated coordinate analysis. At the same time, we want to develop Gemini Lens as a product by expanding into other areas of image identification (clothing, food, social interaction, etc.). We hope to leverage the technology we have developed to deliver customized experiences for diverse users across an array of industries, taking our application and Gemini's AI forward in the process.
