VisuAIize: Contextualizing the World for 338 Million People
Introduction
Welcome to VisuAIize, an app designed to enhance the daily mobility of the blind and visually impaired. With over 338 million people worldwide experiencing some form of visual impairment, our team recognized the need for a solution that provides real-time feedback and assistance in navigating their surroundings.
Inspiration
Our inspiration for VisuAIize stemmed from our personal experiences with family members who are blind or visually impaired. Witnessing the challenges they face in navigating their daily lives ignited our passion to develop a solution that could empower them with greater independence and mobility.
What We Learned
Throughout the development process, we gained valuable insights and skills in several key areas:
Multithreading: We learned how to use multithreading to run tasks in parallel within our app, such as video capture and query processing. This reduced latency and made the app noticeably more responsive.
Segmenting Videos: Splitting video into segments for analysis presented its own challenges, but through experimentation and research we learned how to process and analyze video data efficiently in real time. This meant capturing and uploading frames quickly while keeping their resolution and segment length appropriate for analysis.
Working with Multimodal Large Language Models (LLMs): Integrating large language models like Gemini 1.5 required a deep understanding of prompt engineering and optimization techniques. We learned how to tailor prompts to elicit relevant, concise responses and how to structure queries to minimize redundancy and improve response times.
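The capture-and-query split described above can be sketched as a producer-consumer pipeline: one thread captures frames into a queue while a second thread groups them into fixed-size segments and sends each segment to the model. This is a minimal sketch, not the team's code; `run_pipeline`, `capture_frame`, `query_model`, and `segment_size` are hypothetical names, and the two callables stand in for the real camera capture and Gemini request.

```python
import queue
import threading
from typing import Callable, List, Optional


def run_pipeline(
    capture_frame: Callable[[int], bytes],       # hypothetical stand-in for camera capture
    query_model: Callable[[List[bytes]], str],   # hypothetical stand-in for a Gemini request
    num_frames: int,
    segment_size: int,
) -> List[str]:
    """Capture frames on one thread while another sends fixed-size segments to the model."""
    frames: "queue.Queue[Optional[bytes]]" = queue.Queue()
    responses: List[str] = []

    def capture_worker() -> None:
        try:
            for i in range(num_frames):
                frames.put(capture_frame(i))
        finally:
            frames.put(None)  # sentinel: always sent so the query thread can exit

    def query_worker() -> None:
        segment: List[bytes] = []
        while True:
            frame = frames.get()
            if frame is None:
                break
            segment.append(frame)
            if len(segment) == segment_size:
                responses.append(query_model(segment))
                segment = []
        if segment:  # flush a final, partial segment
            responses.append(query_model(segment))

    threads = [
        threading.Thread(target=capture_worker),
        threading.Thread(target=query_worker),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return responses


if __name__ == "__main__":
    fake_capture = lambda i: f"frame{i}".encode()
    fake_query = lambda seg: f"segment of {len(seg)}"
    print(run_pipeline(fake_capture, fake_query, num_frames=5, segment_size=2))
    # prints ['segment of 2', 'segment of 2', 'segment of 1']
```

Because `queue.Queue` is thread-safe, the capture thread never waits on a slow model call; frames simply accumulate until the query thread is ready for them, and the `None` sentinel tells the query thread when capture has finished.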
Building VisuAIize
VisuAIize uses Gemini 1.5, Google's multimodal AI model, to provide live feedback about the user's surroundings through video analysis. Here's how we built our project:
User Interface: We designed a simple and intuitive UI, operated with only two buttons, so that blind and visually impaired users can navigate it easily.
Video Analysis with Gemini 1.5: We used Gemini's video input to analyze live footage of the user's environment captured by the app. Capturing only 2 frames per second and uploading them to Gemini reduced the amount of data sent with each request, which kept latency low and processing fast.
Optimizations: To further enhance performance, we downscaled the captured frames and utilized threading to parallelize video capture and query processing. We also engineered prompts for Gemini 1.5 to provide concise and relevant descriptions of the surroundings, focusing on obstacles and people.
Text-to-Speech Integration: We integrated Google's Text-to-Speech API to read Gemini's responses aloud, so users receive every description as audio.
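The frame-rate and downscaling steps above both shrink how much data each request carries. Below is a minimal sketch of the two calculations, assuming a 30 fps camera and a hypothetical 768-pixel cap on the longer side; the writeup states neither the camera rate nor the target resolution, `sample_frame_indices` and `downscaled_size` are illustrative names, and the actual pixel resizing would be left to an image library.

```python
from typing import List, Tuple


def sample_frame_indices(source_fps: float, target_fps: float, duration_s: float) -> List[int]:
    """Indices of source frames to keep so that about target_fps frames per second are sent."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("fps values must be positive")
    # e.g. a 30 fps camera with a 2 fps target keeps every 15th frame;
    # clamp to 1 so we never "keep" the same frame twice
    step = max(source_fps / target_fps, 1.0)
    total = int(source_fps * duration_s)
    indices: List[int] = []
    k = 0
    while (idx := round(k * step)) < total:
        indices.append(idx)
        k += 1
    return indices


def downscaled_size(width: int, height: int, max_side: int) -> Tuple[int, int]:
    """Shrink (width, height) so the longer side is at most max_side, keeping the aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(sample_frame_indices(30, 2, 2))        # [0, 15, 30, 45]
    print(downscaled_size(1920, 1080, 768))      # (768, 432)
```

With a 30 fps camera, keeping every 15th frame yields the 2 frames per second mentioned above, and a 1920x1080 frame capped at 768 pixels becomes 768x432 before upload.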
Challenges Faced
While developing VisuAIize, we encountered several challenges, including optimizing latency, refining prompt engineering for accurate descriptions and avoiding redundancy, and ensuring compatibility across different APIs and frameworks.
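One way to avoid repeating the same description is to compare each new response with the last one spoken and skip near-duplicates. The sketch below uses Python's `difflib.SequenceMatcher` to compute a character-level similarity ratio; this is an illustrative technique, not the method the team describes, and `RepeatFilter` and its 0.85 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional


class RepeatFilter:
    """Suppresses a new description if it is nearly identical to the last one spoken."""

    def __init__(self, threshold: float = 0.85) -> None:
        self.threshold = threshold  # hypothetical cutoff; would need tuning on real responses
        self.last_spoken: Optional[str] = None

    def should_speak(self, description: str) -> bool:
        # Normalize case and whitespace so trivial differences don't count as new content
        normalized = " ".join(description.lower().split())
        if self.last_spoken is not None:
            similarity = SequenceMatcher(None, self.last_spoken, normalized).ratio()
            if similarity >= self.threshold:
                return False  # too close to what the user just heard
        self.last_spoken = normalized
        return True


if __name__ == "__main__":
    f = RepeatFilter()
    for text in ["A person is walking toward you.",
                 "a person is walking toward you",
                 "Stairs ahead, two steps down."]:
        print(f.should_speak(text), text)
```

Only descriptions that pass the filter would be handed to text-to-speech, so the user hears a change in the scene rather than the same sentence every few seconds.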
Future Developments
Looking ahead, we have ambitious plans to further enhance VisuAIize:
- Android App Development: We aim to develop an Android version of the app to broaden accessibility for all visually impaired smartphone users.
- Family Connectivity Features: We plan to incorporate features that allow users' families to connect with the app, including user location history and personalized voice outputs.
- Optimization and Prompt Refinement: Continuous optimization and prompt refinement will be prioritized to improve response times and description accuracy.
Conclusion
VisuAIize represents our commitment to leveraging technology for social good, providing invaluable support and empowerment to the blind and visually impaired community. With ongoing development and enhancements, we envision VisuAIize as a transformative tool for enhancing mobility and independence for millions worldwide.