Inspiration
Four years ago, Dylan's grandma was diagnosed with HER2-positive breast cancer. Thankfully, she has made a complete recovery, but the side effects of her ordeal still linger. Every month or so, she has blood work done as part of her post-recovery monitoring.
As someone who does not use the internet often, she found that booking appointments and checking lab results was impossible to do by herself. As a result, she relied on Dylan to do these things and navigate the internet for her.
That got us thinking: how can we simplify the internet for people like her?
We were inspired by Google Next 2024, where Vercel unveiled a thrilling integration with Gemini applied to the concept of generative UI. While their demo was, at its core, static and prebuilt, we wanted to build something dynamic: something that pulls from real websites to generate interfaces.
What it does
GemUI understands the user's request and generates only the UI relevant to it. This includes text, buttons, and forms.
The web and its interfaces can be confusing. In a sense, GemUI basically embodies this meme:
GemUI allows anyone to generate a universal UI on any website to help with navigation.
Here's an example: going to Quest Diagnostics and figuring out your options.
becomes
Quest Diagnostics has many of its buttons and features scattered across the webpage.
How we built it
While crafting the concept behind GemUI, we let several tenets guide our process.
Innovation and Complexity
Gemini 1.5 is stunning, especially in two key areas: near-perfect needle-in-a-haystack recall and a massively expanded context window. GemUI can not only capture the entirety of a webpage's HTML but also receive images of what the user sees. Gemini 1.5 is well suited to this, and we have demonstrated that it is useful for more than answering questions about huge documents.
The chatbot-browser integration was achievable only through real-time WebSocket communication between the client and the server. Chatbot UI events are piped to a live instance of Selenium. Gemini orchestrates the generation of meaningful and relevant UI by translating browser code into a generative UI that our frontend interface can render consistently. Interactions on our generative UI translate directly to actions on the real web UI.
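To make the event flow concrete, here is a minimal sketch of how one UI event might be translated into a browser action. It is an illustration under stated assumptions, not GemUI's actual code: the `Component` shape, the JSON message fields (`type`, `component_id`, `value`), the `handle_ui_event` function, and the `a#schedule` selector are all hypothetical. A `RecordingBrowser` stands in for the live Selenium instance so the example runs without a real browser or WebSocket server.

```python
import json
from dataclasses import dataclass, field
from typing import Protocol


class Browser(Protocol):
    """Minimal browser interface (hypothetical); a thin wrapper around a
    Selenium WebDriver could implement these two methods."""

    def click(self, selector: str) -> None: ...
    def type_text(self, selector: str, text: str) -> None: ...


@dataclass
class Component:
    """One generated UI element tied to an element on the real page."""
    id: str
    kind: str           # "text", "button", or "form"
    label: str
    selector: str = ""  # CSS selector on the real page (empty for plain text)


@dataclass
class RecordingBrowser:
    """Stand-in browser that records actions instead of driving a page."""
    actions: list = field(default_factory=list)

    def click(self, selector: str) -> None:
        self.actions.append(("click", selector))

    def type_text(self, selector: str, text: str) -> None:
        self.actions.append(("type", selector, text))


def handle_ui_event(message: str, components: dict[str, Component],
                    browser: Browser) -> str:
    """Translate one JSON event from the chat UI into a browser action
    and return a JSON reply for the client."""
    event = json.loads(message)
    component = components.get(event.get("component_id"))
    if component is None:
        return json.dumps({"ok": False, "error": "unknown component"})
    if component.kind == "button" and event.get("type") == "click":
        browser.click(component.selector)
    elif component.kind == "form" and event.get("type") == "submit":
        browser.type_text(component.selector, event.get("value", ""))
    else:
        return json.dumps({"ok": False, "error": "unsupported event"})
    return json.dumps({"ok": True})


# Example: a click on a generated button is replayed on the real page.
components = {
    "book": Component("book", "button", "Schedule appointment", "a#schedule"),
}
browser = RecordingBrowser()
reply = handle_ui_event('{"type": "click", "component_id": "book"}',
                        components, browser)
```

In a real deployment, each message would arrive over the WebSocket connection and the browser object would drive Selenium. Keeping the translation step behind a small interface like this makes it possible to exercise it without either.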
To bring GemUI to life, we used the Google Gemini 1.5 Python SDK, FastAPI, WebSockets, and Selenium on the backend, and Next.js, TypeScript, ShadCN, Tailwind, Whisper, and the Vercel AI SDK on the frontend.
Usability and Community Impact
We aimed to design an experience that simplifies website navigation while still leaving control to our users. With GemUI, anyone can benefit, regardless of their background as an internet and technology user. GemUI is meant to be a tool that anyone can pick up. Speeding up website navigation is nice, but we specifically wanted to highlight those who struggle to navigate the web. People like the elderly, who are confused by the fancy interfaces, designs, and complicated steps attached to modern websites, get the UI stripped down and delivered directly to them. Lastly, children who are just learning internet literacy can still meaningfully navigate the internet with GemUI.
We also wanted to prioritize accessibility for those who have no trouble navigating the web but do have trouble seeing it. GemUI enables users to personalize their web experience to fit their accessibility needs, such as by increasing GemUI's font size and display size.
What's next for GemUI
We want to remove typing altogether by incorporating direct audio interaction with GemUI. This removes another barrier to using GemUI: written literacy. Those who primarily communicate verbally (in particular, children who can't easily read and write) will then be able to use GemUI. Right now, we can transcribe live audio using Whisper, but the audio still goes through an intermediate step of being transcribed manually before the request is sent to Gemini 1.5.
Built With
- fastapi
- gemini
- nextjs
- python
- react
- typescript
- websockets
- whisper