Inspiration

When deciding which dataset to use, we wanted to do something fun. Since we were allowed to bring our own dataset, we chose that option and stumbled upon the Rebrickable API. Once we realized how we could use LEGO in the competition, the gears started turning.

What it does

ReBrick determines which LEGO sets are "supersets": sets whose parts can be used to build other sets, with at least 90% similarity. It also shows the trend of supersets over the years and the number of parts within each superset.

How we built it

We used the Rebrickable API to create our dataset. Rebrickable also provides its own dataset, but it was not sufficient for our use because it lacked the relation between sets and their parts. Unlike many teams that used existing datasets, we had to build our own to get that information. Once we had the data, we used cosine similarity to figure out which sets can make other sets. Finally, we graphed our results using Matplotlib.
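As a rough illustration of the similarity step, here is a minimal sketch. It assumes each set is stored as a mapping from part ID to quantity; the part IDs, quantities, and the names `cosine_similarity` and `SIMILARITY_THRESHOLD` are made up for this example and are not taken from our actual code.

```python
import math

def cosine_similarity(parts_a, parts_b):
    """Cosine similarity between two part inventories (part ID -> quantity)."""
    # Only parts present in both inventories contribute to the dot product.
    dot = sum(qty * parts_b.get(part, 0) for part, qty in parts_a.items())
    norm_a = math.sqrt(sum(q * q for q in parts_a.values()))
    norm_b = math.sqrt(sum(q * q for q in parts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty inventory is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical inventories, for illustration only.
set_a = {"3001": 4, "3002": 2, "3003": 1}
set_b = {"3001": 4, "3002": 2}

SIMILARITY_THRESHOLD = 0.9  # the 90% cutoff described above

score = cosine_similarity(set_a, set_b)
is_match = score >= SIMILARITY_THRESHOLD
print(f"similarity={score:.3f}, match={is_match}")
```

Note that cosine similarity is symmetric, so on its own it measures how alike two inventories are rather than which set contains the other. A direction check, such as comparing total part counts, would be an extra step that this sketch leaves out.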

Challenges we ran into

One of the biggest challenges we faced was gathering and compiling our data. Since we weren't using an existing dataset, we essentially had to create it ourselves, and that took approximately 6 hours because of our limited computing power. Another problem was formatting our data, as values were sometimes not in the correct type. In the end, we had to explore some new Python libraries to resolve this issue.

Accomplishments that we're proud of

Creating the dataset is the accomplishment our team is most proud of. As stated above, it took many hours to build, and it also took a long time to debug and test with smaller pieces.

What we learned

We learned that creating a dataset entirely from scratch is extremely tedious, but it was very rewarding once we finally had the dataset.

What's next for ReBrick

One possibility we proposed is combining multiple sets to build a single set. For example, we could use 2-3 smaller, cheaper sets to build one larger, more expensive set. Another proposal is to create a web application so that users can see which LEGO sets are the best to get.
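The multi-set idea could be prototyped along these lines. This is only a sketch under assumptions: it pools the part counts of several smaller sets and measures what share of a larger set's parts the pool covers. That coverage share is a simple stand-in measure chosen for this example, not the cosine similarity used in the current project, and all inventories and function names here are hypothetical.

```python
from collections import Counter

def combine_inventories(inventories):
    """Pool several part inventories (part ID -> quantity) into one."""
    total = Counter()
    for inventory in inventories:
        total.update(inventory)  # adds quantities part by part
    return total

def coverage(target, available):
    """Share of the target's parts (by count) that the available parts supply."""
    needed = sum(target.values())
    if needed == 0:
        return 0.0
    covered = sum(min(qty, available.get(part, 0)) for part, qty in target.items())
    return covered / needed

# Hypothetical inventories, for illustration only.
small_1 = {"3001": 2, "3002": 1}
small_2 = {"3001": 2, "3003": 3}
large = {"3001": 4, "3002": 2, "3003": 3}

pooled = combine_inventories([small_1, small_2])
share = coverage(large, pooled)
print(f"The two small sets cover {share:.0%} of the large set's parts")
```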

Built With

  • database
  • matplotlib
  • python
  • rebrickable