Preprocessing
We preprocessed two datasets (consumer search data and booking inquiries) to make them easier to analyze. Descriptive fields were converted into structured formats; in particular, time fields were parsed into datetime values to support temporal analysis. We also extracted unique substrings from messy list fields and used specialized tools to obtain coordinates from the neighborhood filters. Entries sharing a host ID were consolidated into "host profiles", each with a host score computed as a weighted average of booking days and unique customers. Finally, the presence of 'null' values let us tell which inquiries ended in a completed booking, which later proved invaluable for predictive modeling. Together, these steps prepared the datasets for the analysis and modeling that followed.
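The host-profile step can be sketched roughly as follows. This is a minimal illustration, not our actual pipeline: the field names (`host_id`, `guest_id`, `checkin`, `checkout`, `booked_at`), the sample rows, and the equal weights are all assumptions, as is the reading that a null `booked_at` marks an inquiry that never became a booking.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical inquiry rows; every field name and value here is an assumption.
inquiries = [
    {"host_id": "h1", "guest_id": "g1", "checkin": "2014-10-01",
     "checkout": "2014-10-04", "booked_at": "2014-09-20 10:15:00"},
    {"host_id": "h1", "guest_id": "g2", "checkin": "2014-10-10",
     "checkout": "2014-10-12", "booked_at": None},  # null: never booked
    {"host_id": "h2", "guest_id": "g3", "checkin": "2014-11-01",
     "checkout": "2014-11-02", "booked_at": "2014-10-25 08:00:00"},
]


def parse_day(text):
    """Turn a 'YYYY-MM-DD' string into a datetime value."""
    return datetime.strptime(text, "%Y-%m-%d")


def build_host_profiles(rows, w_days=0.5, w_guests=0.5):
    """Group rows by host and score each host (weights are illustrative)."""
    days = defaultdict(int)
    guests = defaultdict(set)
    for row in rows:
        host = row["host_id"]
        days[host] += 0  # register the host even if nothing was booked
        if row["booked_at"] is None:
            continue  # assumed meaning: a null timestamp is an unbooked inquiry
        stay = parse_day(row["checkout"]) - parse_day(row["checkin"])
        days[host] += stay.days
        guests[host].add(row["guest_id"])
    return {
        host: {
            "booked_days": days[host],
            "unique_guests": len(guests[host]),
            "score": w_days * days[host] + w_guests * len(guests[host]),
        }
        for host in days
    }


profiles = build_host_profiles(inquiries)
```

Changing `w_days` and `w_guests` shifts the score toward long stays or toward a broad customer base; the values used in the project are not reproduced here.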
Preliminary Analysis
Before starting the analysis, we took on the role of an Airbnb city manager to understand Dublin's short-term rental landscape. We researched the city's dynamics, Airbnb trends, and the broader social context so that we could interpret the datasets properly. With that background, we examined the preprocessed data for potential biases and areas needing improvement. This assessment led us to focus on location, booking size, and response time, and on how each affects host quality. By exploring these variables, we aimed to find relationships that could inform strategic decisions about Airbnb operations in Dublin and support recommendations for both current and future hosts in the city.
Testing Hypotheses
To test our hypotheses, we analyzed the datasets with a mix of descriptive and inferential statistical methods. Heat maps of search queries across neighborhoods and bar graphs comparing search sizes with booking sizes provided evidence supporting our hypotheses. Searches were most often for single-person accommodations, whereas the most common booking size was two guests, which hints at a potential shortage of smaller Airbnb listings. We also found that slower response times were associated with lower booking rates and occupancy levels, underscoring the importance of prompt communication in securing reservations. These findings give hosts and city managers concrete guidance for adjusting their offerings and improving the guest experience.
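One way to check a link like the one between response time and booking rate is a two-proportion z-test. The sketch below is only an illustration of that kind of inferential test: the write-up does not say which test was used, and the counts for "fast" and "slow" responders are made-up numbers, not results from the Dublin data.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference of two proportions."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: inquiries booked out of inquiries received.
fast_booked, fast_total = 60, 100  # hosts who replied quickly (assumed)
slow_booked, slow_total = 40, 100  # hosts who replied slowly (assumed)

z_score, p_value = two_proportion_z_test(
    fast_booked, fast_total, slow_booked, slow_total
)
```

A small p-value would suggest that the gap in booking rates between the two groups is unlikely to be chance alone; it would not, by itself, show that faster replies cause more bookings.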
Recommendations
Based on our analysis, we recommend three strategies to improve the Airbnb experience in Dublin. First, increase the availability of rentals near the city center to meet demand for centrally located accommodations. Second, offer more accommodations for fewer guests to address the apparent shortage of smaller listings. Third, prioritize prompt responses to guest inquiries, which can improve booking and occupancy rates. Together, these changes could improve the Airbnb landscape in Dublin and make for a more satisfying experience for both hosts and guests.
Predictive Model
We developed a predictive model to estimate the likelihood that a query converts into a booking, based on four features: minimum filter price, maximum filter price, number of guests, and number of nights stayed. Using a majority-vote ensemble of XGBoost, neural network, and random forest models, we reached approximately 87% accuracy, which indicates strong predictive performance. The neural network on its own showed signs of overfitting; we applied cross-validation and ultimately adopted the majority vote, which countered the issue. Beyond its predictive performance, the ensemble gave us insight into how different modeling techniques behave in the Airbnb booking context. Given its generalized nature, we hope the model can be applied to other cities.
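The voting step itself is simple and can be sketched independently of the underlying models. In the example below, the three prediction lists stand in for the outputs of the XGBoost, neural network, and random forest models; the values and the true labels are invented for illustration and do not come from our data.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """Combine per-model label lists by taking the most common label per row."""
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*model_predictions)
    ]


def accuracy(predicted, actual):
    """Fraction of rows where the predicted label matches the true label."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


# Hypothetical 0/1 predictions (1 = query converts into a booking).
xgb_pred = [1, 0, 1, 1, 0]
nn_pred = [1, 1, 0, 1, 0]
rf_pred = [0, 0, 1, 1, 1]
truth = [1, 0, 1, 0, 0]  # assumed ground truth for the five queries

ensemble_pred = majority_vote(xgb_pred, nn_pred, rf_pred)
ensemble_acc = accuracy(ensemble_pred, truth)
```

With three models and binary labels there can be no ties, and a single model that overfits is outvoted whenever the other two agree, which is why a vote like this can dampen the neural network's overfitting.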
Challenges
We encountered several challenges during our analysis. First, although we had neighborhood search data, we had no way of finding the locations of the Airbnb listings themselves, and we spent some time trying to connect the two. Second, creating heat maps involved a learning curve, though the visualizations ultimately provided invaluable insights. Third, tuning the neural network to process and interpret the data was complex and required experimentation to reach satisfactory results. During preprocessing, we also had to balance cleaning the data against retaining meaningful information, so that we did not discard too many data points. Finally, the limitations of the available data made it difficult to find compelling claims to support our analyses. Our team worked through these obstacles with creative problem-solving and collaboration.
Accomplishments
Our preprocessing streamlined the datasets for analysis by transforming descriptive data into structured formats, extracting relevant features, and consolidating host profiles. This paid off later in our workflow, when the data was ready for modeling. We provided quantitative and visual evidence to place our analysis within Dublin's short-term rental landscape, backed by extensive research into key trends and social contexts. Hypothesis testing revealed relationships among search preferences, booking sizes, and host responsiveness. Our predictive model, which integrates XGBoost, a neural network, and a random forest via majority voting, achieved an accuracy of approximately 87%, offering a robust framework for predicting booking conversions. Despite data limitations and model optimization challenges, our team produced recommendations to enhance the Airbnb experience in post-2014 Dublin. Although the data may seem outdated, the techniques and implications generalize to many cases around the world, even today.
What We Learned
Exploring these datasets exposed us to challenges we are sure to encounter again. Above all, we learned the value of patiently and diligently preprocessing data before making assumptions or drawing false connections that could lead to a long series of insignificant calculations. The role we were assigned, which we would not have thought to give ourselves, taught us to place data within a broader social and geographical context; here, that context was valuable for understanding user preferences and market trends. Learning to apply and correct our predictive models with ensemble methods and cross-validation gave us analytical experience that will help us diagnose technical issues in the future. Overall, the project deepened our skills in data analysis, visualization, and predictive modeling, which will serve us in future research and in our respective fields.
References
https://platform.stratascratch.com/data-projects/market-analysis-dublin
https://www.irishtimes.com/news/ireland/irish-news/airbnb-hosts-facing-retrospective-tax-bills-for-2014-1.2312416
https://www.reuters.com/world/europe/irish-tourism-begins-long-road-recovery-st-patricks-day-2022-03-17/