Git. Read. Go

The default popup of Git. Read. Go
Loading animation for generating README.md
A code review of the repo-read repository
A sample generated README.md
Detailed information from the code review

Inspiration

Our team focuses on developer tools for a wide range of skill levels and applications. With Git. Read. Go, we reimagine how GitHub repositories are documented by providing a seamless, automated way to generate READMEs and code reviews. The idea came out of our search for project ideas for MHacks. While browsing GitHub repositories, we had a hard time understanding what a repository was for and what we needed in order to run its code ourselves. From our research teams to our internship groups, we learned that this issue reaches well beyond our own project search. README generation, documentation of code files, and code quality reviews, for both industry and personal projects, are a significant time cost that many developers would gladly hand off to an automated system.

We settled on a developer tool specifically as a result of our literature review on Gemini 1.5. We found that the largest differentiating factor between Gemini 1.5 and existing large language models is its larger token limit, which matters especially for reading code. Gemini's ability to read up to 30,000 lines of code is what convinced us to build a new developer tool that analyzes raw code files and creates beautiful, standardized, and practical READMEs and code reviews.

What it does

Our solution is a Google Chrome extension. We found this to be the most seamless way to let users generate a README or code review from any GitHub repository page without entering any information. From the extension, the user receives two main outputs.

Let's start with the README: the user gets a fully automated README.md document containing a summary and explanation of usage, all of the features implemented in the code, the prerequisite libraries and packages needed to run the project, a summary of deployment, and a section for licenses and acknowledgments.

Next, the code review: we generate it from the following metrics, evaluated through Gemini: Functionality, Correctness, Code Quality/Style, Performance, Maintainability, and Usability. The user receives a numerical score for each metric, an overall score, 2-4 main strengths and weaknesses, and a full report with an in-depth analysis of the code files, similar to an industry-level code review.
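As a rough illustration of what such a review could look like as data, here is a minimal Python sketch. The class name, field names, the 0-10 scale, and the use of an unweighted mean for the overall score are all assumptions made for the example; the writeup does not say how the overall score is actually computed.

```python
from dataclasses import dataclass, field
from statistics import mean

# The six metrics named in the writeup.
METRICS = (
    "Functionality",
    "Correctness",
    "Code Quality/Style",
    "Performance",
    "Maintainability",
    "Usability",
)


@dataclass
class CodeReview:
    """Hypothetical container for one generated code review."""

    scores: dict[str, float]  # one score per metric; a 0-10 scale is assumed
    strengths: list[str] = field(default_factory=list)
    weaknesses: list[str] = field(default_factory=list)
    report: str = ""

    def __post_init__(self) -> None:
        # Reject reviews that do not score every metric.
        missing = [m for m in METRICS if m not in self.scores]
        if missing:
            raise ValueError(f"missing scores for: {', '.join(missing)}")

    @property
    def overall(self) -> float:
        # Assumption: the overall score is the unweighted mean of the metrics.
        return round(mean(self.scores[m] for m in METRICS), 1)
```

A weighted mean would be an equally plausible choice if some metrics, such as Correctness, should count for more.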

How we built it

Our project consists of two parts. We will start with README generation. On the back end, we use the GitHub API to fetch the repository's owner and name. From there, we developed a recursive git tree traversal that retrieves the contents of all code files, images, videos, audio files, and any other miscellaneous files (including existing READMEs!). We also fetch the repository's commit history and git tree structure so that Gemini can learn how the files are organized and how they relate to each other.
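The recursive traversal could be sketched roughly as follows. This is not the project's actual code: the function name `walk_tree` and the injected `fetch_tree` callable are hypothetical, and the response shape (`{"tree": [{"path", "type", "sha"}, ...]}`) follows GitHub's Git Trees endpoint, `GET /repos/{owner}/{repo}/git/trees/{tree_sha}`. Injecting the fetcher keeps the sketch runnable without network access.

```python
from typing import Callable, Iterator

# A fetcher takes a tree SHA and returns one Git Trees API response,
# i.e. a dict shaped like {"tree": [{"path": ..., "type": ..., "sha": ...}]}.
TreeFetcher = Callable[[str], dict]


def walk_tree(
    fetch_tree: TreeFetcher, sha: str, prefix: str = ""
) -> Iterator[tuple[str, str]]:
    """Yield (full_path, blob_sha) for every file under the tree `sha`."""
    for entry in fetch_tree(sha).get("tree", []):
        path = f"{prefix}{entry['path']}"
        if entry["type"] == "tree":
            # A subdirectory: recurse into it, extending the path prefix.
            yield from walk_tree(fetch_tree, entry["sha"], prefix=f"{path}/")
        elif entry["type"] == "blob":
            # A regular file: report its path and blob SHA.
            yield path, entry["sha"]
        # Other entry types (for example "commit" for submodules) are skipped.
```

The same endpoint also accepts a `recursive` query parameter that returns the whole tree in one response, though GitHub may truncate that response for very large repositories, which is one reason to walk the tree level by level.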

Beyond the repository data, we pre-train our Gemini model on 300+ existing README files and codebases. We then make our Gemini 1.5 calls for either README generation or code review generation using our generated prompt, and our Flask server responds to POST requests from the front end by sending back the generated file.
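The request handling on the server side might look something like the sketch below. It is written as a framework-agnostic function so it can run on its own; the JSON field names (`owner`, `repo`, `kind`), the `CODE_REVIEW.md` filename, and the injected `generate` callable standing in for the Gemini call are all assumptions, not the project's real interface.

```python
from typing import Callable, Optional

VALID_KINDS = {"readme", "review"}

# Stand-in for the model call: (kind, owner, repo) -> generated Markdown text.
Generator = Callable[[str, str, str], str]


def handle_generate(payload: Optional[dict], generate: Generator) -> tuple[dict, int]:
    """Validate a POST body and return (JSON body, HTTP status code)."""
    payload = payload or {}
    owner = payload.get("owner")
    repo = payload.get("repo")
    kind = payload.get("kind")

    if not owner or not repo:
        return {"error": "owner and repo are required"}, 400
    if kind not in VALID_KINDS:
        return {"error": "kind must be 'readme' or 'review'"}, 400

    # Dispatch to the generator and name the file after the requested kind.
    content = generate(kind, owner, repo)
    filename = "README.md" if kind == "readme" else "CODE_REVIEW.md"
    return {"filename": filename, "content": content}, 200
```

In a Flask app, a route registered for POST could pass the parsed JSON body to this function and return the resulting tuple, since Flask accepts a `(dict, status)` pair as a view's return value.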

Lastly, our front-end developers built a React.js user interface in TypeScript for creating READMEs and generating full code reviews. The front end is also where we detect that the current website is GitHub and automatically fetch the information we need, including the username of the repo's creator and the repo name.
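Detecting a GitHub page and pulling out the owner and repo name comes down to parsing the page URL. The extension itself would do this in TypeScript; the Python sketch below only illustrates the logic, and the function name `parse_repo_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse

GITHUB_HOSTS = {"github.com", "www.github.com"}


def parse_repo_url(url: str) -> Optional[tuple[str, str]]:
    """Return (owner, repo) for a GitHub repository URL, or None otherwise."""
    parsed = urlparse(url)
    if parsed.hostname not in GITHUB_HOSTS:
        return None

    # Path looks like /owner/repo[/tree/branch/...]; ignore empty segments.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None

    owner, repo = parts[0], parts[1].removesuffix(".git")
    # Note: this does not filter non-repository pages such as
    # /settings/profile, which a real extension would need to handle.
    return owner, repo
```

For example, `parse_repo_url("https://github.com/octocat/Hello-World/tree/main/src")` yields the pair `("octocat", "Hello-World")`, while a URL on any other host yields `None`.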

Challenges we ran into

One of the main challenges in the project was the GitHub API's token limitations: rate limits, and sharing a single token across multiple users. We foresee a deployed version of the product having users supply their own API token.
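To make the rate-limit issue concrete: at the time of writing, GitHub's REST API allows 60 unauthenticated requests per hour per IP address and 5,000 per hour for a request authenticated with a personal access token, and it reports the remaining quota in the `x-ratelimit-remaining` and `x-ratelimit-reset` response headers. A minimal sketch of per-user token handling, with hypothetical helper names, might be:

```python
from typing import Mapping, Optional


def github_headers(token: Optional[str] = None) -> dict[str, str]:
    """Build request headers, authenticating only if the user gave a token."""
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def seconds_until_reset(response_headers: Mapping[str, str], now: float) -> float:
    """Return how long to wait before retrying; 0.0 if quota remains."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {k.lower(): v for k, v in response_headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0
    # x-ratelimit-reset is a UTC epoch timestamp in seconds.
    reset_at = float(lowered.get("x-ratelimit-reset", now))
    return max(0.0, reset_at - now)
```

Passing `now` in explicitly, rather than calling `time.time()` inside the function, keeps the wait calculation easy to test.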

Another challenge was learning the strengths and weaknesses of a new large language model that none of us had worked with before. Prompt engineering took up a significant share of our time, and it taught us how to develop effective prompts.

Developing or finding a dataset of READMEs was another challenge. Since the project is quite novel, existing datasets did not suffice for our product, so we used the API fetching code we had already written to build our own custom dataset of READMEs.

What we learned

We all became near experts in our respective areas during this project. Krish was our token-generating wizard for GitHub and our tree recursion extraordinaire. Rith and Jake developed the fully functional, lag-free interface with their React.js skills. Aryaman experimented with prompt engineering and Gemini, and used Flask to communicate with our front end. We all learned to combine our respective strengths to build the fully functional Chrome extension we ended with.

What's next for Git. Read. Go

For a fully deployed Chrome extension, we expect Git. Read. Go to let users enter their own GitHub API token, which would give them access to data from any public repository and any private repository they can access.

Since finding and developing datasets is a concern for us now, we hope to generate a new, larger dataset to pre-train Gemini 1.5, and to experiment with the temperature setting to see how it affects file generation.

We would also like our code review tools to work in real time inside IDEs, and to allow more extensive querying of codebases through a chat-like feature.