Inspiration
There is a Korean song called ‘Dinosaurs’. Although it is about a literal dinosaur, the singer also wishes for a soul as free as a dinosaur's. The song inspired us to investigate how the word ‘dinosaur’ is used in different contexts across songs.
What it does
Our algorithm takes a string as input and returns whether the string most likely refers to a literal dinosaur, an old person, or something else.
How we built it
We organized our training data in data structures, trained the algorithm on it, and then used the trained algorithm to predict a label for an input string.
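The train-then-predict flow described above can be sketched in a few lines of plain Python. This is only an illustration, not our actual model: it uses a simple TF-IDF bag of words with a smoothed IDF and picks the label whose training examples are most similar by cosine similarity. The example sentences in `TRAIN` and the function names `fit` and `predict` are hypothetical and made up for this sketch.

```python
import math
from collections import Counter

# Hypothetical, hand-written examples; the real labeled data is not shown here.
TRAIN = [
    ("the dinosaur roamed the jurassic forest", "dinosaur"),
    ("fossils of a huge dinosaur were found", "dinosaur"),
    ("my boss is such a dinosaur he still uses a fax", "old person"),
    ("grandpa is a dinosaur with technology", "old person"),
    ("i want a free soul like a dinosaur", "something else"),
    ("dancing wild and free like a dinosaur", "something else"),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def fit(examples):
    """Return (idf, centroids): IDF weights and one summed TF-IDF vector per label."""
    docs = [(tokenize(text), label) for text, label in examples]
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens, _ in docs:
        doc_freq.update(set(tokens))  # count each word once per document
    # Smoothed inverse document frequency; rarer words get larger weights.
    idf = {w: math.log((1 + n_docs) / (1 + c)) + 1 for w, c in doc_freq.items()}
    centroids = {}
    for tokens, label in docs:
        centroid = centroids.setdefault(label, Counter())
        for word, tf in Counter(tokens).items():
            centroid[word] += tf * idf[word]
    return idf, centroids


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(text, idf, centroids):
    """Return the label whose centroid is most similar to the input string."""
    counts = Counter(tokenize(text))
    # Words never seen in training are ignored.
    vec = {w: tf * idf[w] for w, tf in counts.items() if w in idf}
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))
```

Calling `idf, centroids = fit(TRAIN)` and then `predict("grandpa still uses a fax", idf, centroids)` returns one of the three labels. With only six toy sentences the result is not meaningful beyond showing the flow.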
Challenges we ran into
Working with the transformers package and its prebuilt NLP models was our main challenge. We struggled with the dataset structure from the datasets package and its compatibility with the pandas DataFrame. We did not really understand what a dataset was or how it worked, and we found the guides and docs sparse: the input and output structures were not clear or explicit to us. As first-years using NLP and the transformers package for the first time, we found most of the packages new, so we were unsure exactly which function does what and how each one interacts with other data types.
We collected and labeled the data ourselves, which was fairly time-consuming. We initially planned to use TF-IDF, but we set a higher goal of using NLP and other methods such as sentiment analysis. Chasing a larger goal than the one we started with was difficult, because each of us ran into different new obstacles. We also could not figure out how to connect the HTML page to a Python backend.
Accomplishments that we're proud of
We learned web scraping through an API for the first time, using the requests package in Python. We also deployed a webpage using HTML and CSS, but we were not able to link our method to the webpage so that the user could get a response from the system. We were constantly learning from the obstacles we faced. We had clear plans for what we needed to do next, even when we did not know how to express them in code.
What we learned
We learned that we were very underprepared for this project, and that it would take us much longer than the two-day hackathon. We should have been familiar with at least a couple of the tools involved, such as transformers, PyTorch, and other ML methods, so that we would have had a solid foundation for approaching the project.
What's next for Dinosaur Language Analysis
Next, we would learn the fundamentals of NLP and move forward with the model. We would need to train the model better and deploy a webpage whose backend actually works.
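One possible way to give the webpage a working Python backend is a small WSGI application from the standard library's `wsgiref` module. The sketch below is an assumption about how this could be wired up, not something we built: `classify` is a hypothetical placeholder standing in for the trained model, and the port and the `text` query parameter are arbitrary choices.

```python
import json
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


def classify(text):
    """Hypothetical placeholder; a real version would call the trained model."""
    return "dinosaur" if "fossil" in text.lower() else "something else"


def app(environ, start_response):
    """Answer a request like /?text=... with a JSON body holding the label."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    text = params.get("text", [""])[0]
    body = json.dumps({"text": text, "label": classify(text)}).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(port=8000):
    """Run the app locally until interrupted (assumed port, for development only)."""
    with make_server("", port, app) as server:
        server.serve_forever()
```

Running `serve()` would let the page request `http://localhost:8000/?text=...` and display the returned label. A page served from a different origin would also need CORS headers, which this sketch omits.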