Inspiration

Our team member, Advaith Narayanan, told us how important spending time with his grandparents was to him as a child. However, he always felt he had missed part of that experience because his grandfather is deaf and the two could not communicate easily. This is why we were inspired to finally break the barrier between ASL signers and non-signers.

What it does

Our ASL Live translator takes ASL sign input through a webcam or in AR/VR. The user simply signs in ASL, and our program reads their signs and outputs them as text, which can also be translated directly into other languages, letting non-signers understand ASL users conveniently. This is especially useful when, for example, an ASL user in a hospital cannot write and must sign to communicate. Normally, doctors who don't know ASL would be unable to understand them, but with our tool that barrier can finally be broken.

How we built it

First, to build the 2D ASL detection program, we used OpenCV, a computer vision library, to detect hand landmarks. Then we created our own data to train an ML model to predict ASL letters (we made hundreds of JPEG images of ASL signs for training). For the main website, we used GSAP, a frontend animation library, and Node.js for the backend. However, our ML models could only run in Python, so we created a Python virtual environment within the Node server to support Python scripts. We also included text-to-speech and automatic translation.
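To give a feel for the 2D pipeline, here is a minimal sketch of the webcam detection loop. The landmark extractor (MediaPipe Hands) and the model file `asl_letters.pkl` are stand-ins for illustration, not our exact code; the real pipeline was built on OpenCV with our own training images.

```python
# Sketch of the 2D letter-detection loop: grab a frame, extract hand
# landmarks, and feed them to a trained classifier.
import cv2
import mediapipe as mp
import numpy as np
import pickle

with open("asl_letters.pkl", "rb") as f:   # hypothetical trained classifier
    clf = pickle.load(f)

hands = mp.solutions.hands.Hands(max_num_hands=1)
cap = cv2.VideoCapture(0)                  # default webcam

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        landmarks = result.multi_hand_landmarks[0].landmark
        features = np.array([[p.x, p.y, p.z] for p in landmarks]).flatten()
        letter = clf.predict([features])[0]          # e.g. "A"
        cv2.putText(frame, letter, (30, 60),
                    cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 3)
    cv2.imshow("ASL Live", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```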

For the AR/VR ASL detection, we created our own 3D mesh data for each letter of the English alphabet and trained an ML model on that data. We used Unity to build the AR/VR framework, but Unity doesn't support Python scripts that use AI/ML libraries. So, as part of our project, we created our own API endpoint for running Python code, which let us extract and compile the mesh data, plus a Python server to run our scripts remotely. We established a TCP socket between the Oculus/Apple Vision Pro (running C# and Swift, respectively) and our computers to allow external processing and classification of the mesh data. Basic WebSockets fail in this case: we could not find a way to install Python inside the Oculus Quest 2, so we essentially had to build a Python API to run programs on another device.
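Below is a minimal sketch of what the remote-inference side looks like, assuming a simple length-prefixed JSON protocol (the port, message format, and `classify` placeholder are illustrative, not our exact implementation). The headset's C#/Swift client sends hand-mesh vertices; the Python server replies with a letter.

```python
# Sketch of the Python TCP server that classifies mesh data sent by the headset.
import json
import socket
import struct

HOST, PORT = "0.0.0.0", 9000        # assumed port

def classify(vertices):
    # Placeholder for the mesh-trained model; returns a letter string.
    return "A"

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((HOST, PORT))
    srv.listen(1)
    conn, addr = srv.accept()
    with conn:
        while True:
            header = conn.recv(4)            # 4-byte big-endian length prefix
            if not header:
                break
            (length,) = struct.unpack(">I", header)
            payload = b""
            while len(payload) < length:
                chunk = conn.recv(length - len(payload))
                if not chunk:
                    break
                payload += chunk
            mesh = json.loads(payload)        # {"vertices": [[x, y, z], ...]}
            letter = classify(mesh["vertices"])
            conn.sendall(letter.encode("utf-8"))
```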

Furthermore, voice detection and dictation are handled by loading a voice-recognition model from Hugging Face and porting it into Unity.
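As a rough illustration, this is how such a model can be loaded and tested on the Python side with the Hugging Face transformers pipeline before porting; the checkpoint name is an assumption, and the Unity-side integration is not shown here.

```python
# Sketch: load a speech-recognition model and transcribe a short clip.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny.en")

def transcribe(wav_path: str) -> str:
    """Transcribe a short audio clip recorded by the headset."""
    return asr(wav_path)["text"]

if __name__ == "__main__":
    print(transcribe("clip.wav"))   # placeholder audio file
```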

Challenges we ran into

We ran into many challenges while building the project. One of the main ones was getting the socket between the AR device and the computer to work: it took many hours to get the server and client request/response communication working, and along the way we tried several approaches, ranging from TCP hosting to HTTP servers to UDP clients.
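Much of the debugging came down to loopback tests like the sketch below, which matches the length-prefixed protocol sketched earlier (the host, port, and payload here are illustrative).

```python
# Quick client-side sanity check for the TCP link.
import json
import socket
import struct

payload = json.dumps({"vertices": [[0.0, 0.0, 0.0]] * 21}).encode("utf-8")

with socket.create_connection(("127.0.0.1", 9000)) as sock:
    sock.sendall(struct.pack(">I", len(payload)) + payload)
    print("server replied:", sock.recv(16).decode("utf-8"))
```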

Accomplishments that we're proud of

What we're most proud of is using AI and machine learning to build a model trained to detect ASL signs. Despite the difficulty of the task, we trained it to recognize ASL letters with high accuracy. The Python API we built along the way is another accomplishment that may come in handy for future projects.

What we learned

One thing we learned was not to spend too much time on the smaller features of our code. We also learned to make frequent backups, since we had to revert our code more than once.

What's next

We have begun researching a phrase detector based on movement epenthesis (the transitional motion between signs) and hope to implement something similar at a future hackathon; we are experimenting with different approaches to see which gives the best results. Given more time, we believe sentence-level detection is very much possible (with stalls between signs potentially serving as our way of detecting movement epenthesis), and we can add sentence-level capabilities to our project.
