Inspiration
Surfing the web, a mundane task for billions of people worldwide, is often a nightmare for people who are visually impaired. Articles with walls of text rarely come with anything more than a monotonous screen reader that is hard to listen to for more than a minute. Plenty of screen readers exist, but we wanted to create one that is a joy to use every day.
What it does
We created an engaging, entertaining, and practical text-to-speech reader that can summarize, prioritize information, and even "skim" the text to answer questions. Our website takes a PDF as input and extracts all of its text. It then does two things in parallel: it uses GPT-4 to turn the PDF contents into an interesting, podcast-like script in JSON format, and it converts each chunk of text into a vector embedding. When the user asks a question, it runs a similarity search over those embeddings to find the most relevant chunk and passes that chunk to GPT-4 as context for the answer. Finally, it uses the ElevenLabs API to convert the script and the answers to audio, maximizing accessibility for visually impaired users!
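The question-answering step described above can be sketched roughly as follows. This is a minimal illustration, not our actual code: `chunk_text`, `toy_embed`, and `most_relevant_chunk` are hypothetical names, and `toy_embed` is a simple word-count stand-in for the OpenAI embedding call so that the example runs without an API key. The retrieval logic (embed the question, compare against every chunk, keep the closest one) is the part that carries over.

```python
import math
from collections import Counter


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split extracted text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text: str) -> Counter:
    """Assumption: stand-in for a real embedding model (lowercase word counts)."""
    return Counter(word.strip(".,!?").lower() for word in text.split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_relevant_chunk(question: str, chunks: list[str]) -> str:
    """Return the chunk whose embedding is most similar to the question's."""
    question_vector = toy_embed(question)
    return max(chunks, key=lambda chunk: cosine_similarity(question_vector, toy_embed(chunk)))
```

In a real pipeline, the chunk returned by `most_relevant_chunk` would be placed in the prompt alongside the user's question before asking the language model for an answer.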
How we built it
We wrote the back end mainly in Python. For the front end, we used HTML, CSS, and JavaScript, with Flask connecting the pages to the Python back end. We used the OpenAI API for embeddings and GPT-4, and the ElevenLabs API for text-to-speech conversion.
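The hand-off between the GPT-4 script and the text-to-speech step might look something like the sketch below. The JSON layout here (a top-level `turns` list whose items have `speaker` and `text` fields) is purely an assumption for illustration, as is the function name `script_to_segments`; the real script format may differ. The idea is simply to flatten the script into ordered (speaker, line) pairs that can each be sent to a voice.

```python
import json


def script_to_segments(script_json: str) -> list[tuple[str, str]]:
    """Flatten a podcast-style JSON script into ordered (speaker, line) pairs.

    Assumption: the script looks like
    {"turns": [{"speaker": "...", "text": "..."}, ...]}.
    """
    data = json.loads(script_json)
    segments = []
    for turn in data["turns"]:
        speaker = turn["speaker"].strip()
        line = turn["text"].strip()
        if line:  # skip empty lines so no silent audio clips are requested
            segments.append((speaker, line))
    return segments


example_script = json.dumps({
    "turns": [
        {"speaker": "Host", "text": "Welcome to today's episode."},
        {"speaker": "Guest", "text": "  Thanks for having me!  "},
        {"speaker": "Host", "text": ""},
    ]
})

for speaker, line in script_to_segments(example_script):
    # In a real app, each line would be passed to the text-to-speech service here.
    print(f"{speaker}: {line}")
```

Keeping the speaker name with each line makes it straightforward to assign a different voice per speaker later on.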
Challenges we ran into
While both of us are experienced with back-end programming in Python, we had virtually no experience with front-end development or the tools around it, such as HTML, CSS, and Flask. This posed a significant challenge as we tried to build an accessible, elegant front end. However, we persevered through several hours of work and dozens of roadblocks, and ended up with a functional front end to present our product!
Accomplishments that we're proud of
We're very proud of how we integrated several different APIs to accomplish our goal, and of how we connected our Python code to the HTML front end with Flask. This project required us to work with many complex, dissimilar tools and bring them together seamlessly.
What we learned
We gained valuable front-end experience for future hackathons and expanded our toolset, as this was also our first time working with ElevenLabs! The new tools and libraries we worked with during this hackathon will help us take on more innovative and difficult projects in the future.
What's next for HearIt
A podcast is not the only format that can make PDFs entertaining. The technologies we implemented for HearIt are extremely versatile, and we hope to try many other formats in the future, such as dramas and comedies. We also hope to add a more diverse set of speakers.
Built With
- css
- elevenlabs
- embeddings
- flask
- gpt-4
- html
- javascript
- openai
- python