Slider - A Gemini-Powered Presentation Tool

(Table 76)

Inspiration

Everyone hates making PowerPoint presentations. Imagine you could conjure up a full-fledged slide deck for your use case from just a few words or images. Introducing Slider, a Gemini-based presentation tool.

What it does

A user who wants a Google Slides presentation navigates to the application homepage and enters a title for the presentation. They can then supply text, image, or audio prompts and specify whether each prompt should inform the theme of the presentation or its topic. We use style transfer to glean information from these inputs and inform the rendering of the generated presentation.

For example, from just a picture of Barack Obama and the prompt "Create a presentation about the president in this photo", Slider created an informational presentation about the president covering his early life, accomplishments, and some fun facts. It was even styled with pictures of the president!

Other use cases include creating presentations about movies, speeches, Wikipedia articles, or even your own projects.

How we built it

Our frontend collects information about the user’s desired style, topics, and other metadata. This information goes into a pipeline that eventually feeds data to Gemini and the Google Slides API. Gemini uses function calling to trigger the appropriate functions for communicating with the Google Slides API. Slowly but surely, we construct a full-fledged presentation for you!
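To make the function-calling step concrete, here is a minimal sketch of how a backend could turn the function calls a model returns into Google Slides API request objects. The handler names (create_slide, insert_text), the shape of the function_calls list, and the dispatch helper are all hypothetical and are not taken from Slider's code; only the createSlide and insertText request shapes come from the Slides API itself.

```python
from typing import Any, Callable, Dict, List

Request = Dict[str, Any]


def create_slide(slide_id: str, layout: str = "TITLE_AND_BODY") -> List[Request]:
    """Hypothetical handler: build a Slides API createSlide request."""
    return [{
        "createSlide": {
            "objectId": slide_id,
            "slideLayoutReference": {"predefinedLayout": layout},
        }
    }]


def insert_text(shape_id: str, text: str) -> List[Request]:
    """Hypothetical handler: build a Slides API insertText request."""
    return [{
        "insertText": {"objectId": shape_id, "insertionIndex": 0, "text": text}
    }]


# Assumed registry: function names the model may call, mapped to handlers.
HANDLERS: Dict[str, Callable[..., List[Request]]] = {
    "create_slide": create_slide,
    "insert_text": insert_text,
}


def dispatch(function_calls: List[Dict[str, Any]]) -> List[Request]:
    """Translate model function calls into a flat list of Slides requests."""
    requests: List[Request] = []
    for call in function_calls:
        handler = HANDLERS.get(call["name"])
        if handler is None:
            raise ValueError(f"unknown function: {call['name']}")
        requests.extend(handler(**call["args"]))
    return requests
```

In a setup like this, the returned list would typically be sent as the "requests" field of a single presentations.batchUpdate call, so that one round of model output becomes one API request.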

The frontend was made with React and our backend is a Flask server.

Challenges we ran into

Our testing was often slowed by API rate limits and timeouts. The data for an entire slide deck ran to several thousand tokens, which was too much for Gemini to return over a single connection. We frequently ran into HTTP 504 (Gateway Timeout) and 429 (Too Many Requests) errors.
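One common way to cope with 429 and 504 responses is to retry with exponential backoff and jitter. The sketch below illustrates that general technique only; it does not describe how Slider actually handled these errors. ApiError and call_with_backoff are hypothetical names, and the code assumes the wrapped call raises an exception carrying the HTTP status code.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Status codes worth retrying: rate limiting and gateway timeouts.
RETRYABLE = {429, 504}


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int) -> None:
        super().__init__(f"API call failed with status {status}")
        self.status = status


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying retryable failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ApiError as err:
            last_attempt = attempt == max_attempts - 1
            if err.status not in RETRYABLE or last_attempt:
                raise
            # Double the wait each attempt and add jitter so that parallel
            # clients do not all retry at the same moment.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")
```

Requesting one slide at a time instead of the whole deck would also keep each response small, which is a separate way to reduce timeouts; each per-slide call could then be wrapped in a helper like the one above.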

Accomplishments that we're proud of

We can successfully create presentations about any topic by including any combination of images, audio, or text to direct the content of the presentation. We’re also able to ask Slider to make edits to the generated presentation (for instance, adding a watermark, changing the color of certain slides, or adding more detailed text).

What we learned

Gemini was fantastic at ingesting documentation and returning correctly formatted inputs for APIs. We found that it was really good at following specific instructions (for instance, outputting only JSON). Gemini was also able to glean information about the user’s desired styling and accurately pass that context to a second model that informed our presentation’s stylistic features.
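Even when a model follows a JSON-only instruction well, it is usually worth validating the reply before acting on it. Below is a minimal sketch of that check. The parse_slide_json helper and the top-level "slides" key are assumptions made for illustration, not Slider's actual schema.

```python
import json
from typing import Any, Dict

# Built at runtime so this example never contains a literal code fence.
FENCE = "`" * 3


def parse_slide_json(reply: str) -> Dict[str, Any]:
    """Parse a model reply that is expected to contain only JSON.

    Tolerates a Markdown code fence around the JSON, then checks for the
    (hypothetical) top-level "slides" list.
    """
    text = reply.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)

    data = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict) or not isinstance(data.get("slides"), list):
        raise ValueError("expected a JSON object with a 'slides' list")
    return data
```

Because json.JSONDecodeError is a subclass of ValueError, a caller can catch ValueError alone to handle both malformed JSON and an unexpected structure, for example by asking the model again.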

What's next for Slider

We think Slider could generate higher-quality content if we had access to a large corpus of Google Slides presentations in their API format, paired with good-quality prompts that describe them. Slider doesn’t currently understand human-desirable image formatting or text layouts, so with lots of training data it could learn to use font sizes that don’t run over other text blocks, put images in aesthetically pleasing places, and select fonts and colors that align with the theme.