Inspiration
The project is designed to enhance public discourse by identifying statements that require verification. With misinformation rampant, this tool helps prioritize fact-checking resources efficiently, making it particularly valuable for journalists and educators.
What We Learned
We learned the fundamentals of natural language processing (NLP) by exploring text preprocessing techniques and machine learning models, with a particular focus on neural networks and how they can be applied to classification tasks.
How We Built It
We utilized Python and several libraries to develop the classifier:
Data Preprocessing: Cleaned the text data using NLTK for tokenization and lemmatization, removing stopwords to focus on significant words.
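The preprocessing step above can be illustrated with a small sketch. To keep it runnable without downloading NLTK corpora, it uses a regular expression and a tiny hand-written stopword set as stand-ins; in the actual pipeline these roles would be played by NLTK calls such as `nltk.word_tokenize`, `nltk.corpus.stopwords.words("english")`, and `nltk.stem.WordNetLemmatizer`. The function name `preprocess` and the stopword set are assumptions for illustration, and lemmatization is left out of the sketch.

```python
import re

# Hypothetical stand-in for NLTK's English stopword list (the real list is much longer).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "to", "and", "that", "this", "it"}


def preprocess(text: str) -> list[str]:
    """Lowercase the text, split it into alphanumeric tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [tok for tok in tokens if tok not in STOPWORDS]


print(preprocess("The unemployment rate is the lowest in 50 years."))
# ['unemployment', 'rate', 'lowest', '50', 'years']
```

The sketch keeps digits as tokens; whether to keep or drop them is a design choice the write-up does not specify.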
Modeling:
- Used TensorFlow and Keras to construct a sequential neural network model.
- Integrated Embedding, Bidirectional LSTM, and Dense layers to learn from the text data.
- Applied SpatialDropout1D and Dropout to reduce overfitting.
Training and Validation:
- Employed Stratified K-Fold Cross-Validation to estimate how well the model generalizes to unseen data while keeping the class ratio the same in every fold.
- Used Early Stopping to halt training once validation performance stopped improving, which limits overfitting.
Performance Evaluation:
- Evaluated the model using the F1 score, which balances the precision and recall of the classifier.
- Analyzed the results using confusion matrices to understand the true positive and true negative rates.
Challenges Faced
- Data Imbalance: 'No' instances far outnumbered 'Yes' instances, which made it hard to train a model that performs well on both classes.
- Overfitting: Designing the neural network to generalize well rather than memorize the training data.
- Optimization: Selecting the right optimizer and tuning hyperparameters such as learning rate and batch size required multiple iterations.
Tools Used
- Languages: Python
- Libraries: Pandas, NumPy, NLTK, TensorFlow (Keras), Scikit-Learn
- Techniques: LSTM Networks, Text Tokenization, Embedding, Cross-Validation
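As an illustration of the Modeling and Early Stopping steps described above, here is a minimal Keras sketch. The layer types come from the write-up; the vocabulary size, embedding width, sequence length, LSTM units, dropout rates, optimizer, loss, and patience are all assumed values chosen for the example, not the project's actual settings.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Illustrative sizes only (assumptions, not taken from the project).
VOCAB_SIZE = 10_000
EMBED_DIM = 64
MAX_LEN = 50


def build_model() -> tf.keras.Model:
    """Embedding -> SpatialDropout1D -> Bidirectional LSTM -> Dropout -> Dense."""
    model = tf.keras.Sequential([
        layers.Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM),
        layers.SpatialDropout1D(0.2),            # drops whole embedding channels
        layers.Bidirectional(layers.LSTM(32)),   # reads the sequence in both directions
        layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"),   # probability of the 'Yes' class
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model


# Stop once validation loss has not improved for 3 epochs (assumed patience).
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True
)

model = build_model()
dummy_batch = np.zeros((2, MAX_LEN), dtype="int32")  # two padded token-id sequences
probs = model(dummy_batch)
print(probs.shape)  # (2, 1)
```

In a training run, `early_stop` would be passed to `model.fit(..., validation_data=..., callbacks=[early_stop])`.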
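The idea behind the Stratified K-Fold step mentioned above can be shown with a hand-rolled sketch: samples of each class are dealt round-robin across the folds so that every fold keeps roughly the same 'Yes'/'No' ratio. This is a simplified stand-in for scikit-learn's `StratifiedKFold` (it does no shuffling), and the helper name `stratified_folds` is hypothetical.

```python
from collections import defaultdict


def stratified_folds(labels, n_splits):
    """Assign each sample index to a fold so that folds share a similar class ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        # Deal this class's samples round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds


labels = ["No"] * 8 + ["Yes"] * 2   # toy imbalanced dataset
folds = stratified_folds(labels, n_splits=2)
for fold in folds:
    yes_count = sum(labels[i] == "Yes" for i in fold)
    print(len(fold), yes_count)     # each fold: 5 samples, 1 of them 'Yes'
```

Each fold then serves once as the validation set while the remaining folds are used for training.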
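To make the Performance Evaluation step concrete, the following sketch computes confusion-matrix counts and the F1 score by hand for a toy set of predictions. The labels are made up for illustration and the helper names are hypothetical; in practice scikit-learn's `f1_score` and `confusion_matrix` provide the same quantities.

```python
def confusion_counts(y_true, y_pred, positive="Yes"):
    """Return (true positives, false positives, false negatives, true negatives)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_from_counts(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (made up for illustration).
y_true = ["Yes", "No", "No", "Yes", "No", "No"]
y_pred = ["Yes", "No", "Yes", "No", "No", "No"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)              # 1 1 1 3
print(f1_from_counts(tp, fp, fn))  # 0.5
```

Note that accuracy on this toy data is 4/6 even though only half of the 'Yes' statements are found, which is why F1 is the more informative metric on imbalanced data.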
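For the data imbalance challenge noted above, one common mitigation is to weight each class inversely to its frequency during training. The write-up does not say which approach the project used, so the sketch below is only one possible option; the helper name `balanced_class_weights` and the 90/10 split are assumptions. The formula, n_samples / (n_classes * class_count), is the "balanced" heuristic also used by scikit-learn.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Toy label distribution: 0 = 'No', 1 = 'Yes' (assumed 90/10 split).
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights[1])  # 5.0, so each 'Yes' example counts nine times as much as a 'No' example
```

A dictionary like this can be passed to Keras through the `class_weight` argument of `model.fit`.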
Built With
- lstm
- nltk
- numpy
- python
- scikit-learn
- tensorflow