Items covered in the project:
- Harvesting of tweets related to cancer and breast cancer
- Text and sentiment analysis
- Dataset used to train the model: Breast Cancer Wisconsin (Diagnostic) Data Set
- Data acquisition and cleaning
- Identification of best predictor variables
- Construction of a logistic regression model to estimate the probability of a tumor being malignant
- Testing model fit: pseudo R-squared; individual variable significance (Wald test); overall model fit (Hosmer-Lemeshow goodness-of-fit (GOF) test); multicollinearity (variance inflation factor, VIF)
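
The modelling and diagnostic steps listed above can be sketched in base R. This is a minimal illustration, not the project's actual code: the data are simulated stand-ins for the Wisconsin data set, the column names `radius`, `texture` and `malignant` and the helper `hosmer_lemeshow()` are hypothetical, and each test is computed by hand so the example runs without extra packages. Packages such as pscl, aod, ResourceSelection and HH are commonly used for the same tests.

```r
# Synthetic stand-in data (NOT the Wisconsin data set); columns are hypothetical.
set.seed(42)
n <- 400
radius  <- rnorm(n, mean = 14, sd = 3.5)
texture <- rnorm(n, mean = 19, sd = 4)
# Assumed true relationship, chosen only for illustration.
p_true <- plogis(-13 + 0.7 * radius + 0.15 * texture)
tumors <- data.frame(radius, texture, malignant = rbinom(n, 1, p_true))

# Logistic regression: P(malignant = 1 | predictors), plus an intercept-only model.
fit  <- glm(malignant ~ radius + texture, data = tumors, family = binomial)
null <- glm(malignant ~ 1, data = tumors, family = binomial)

# Predicted probability of malignancy for a new (hypothetical) tumor.
new_case <- data.frame(radius = 18, texture = 22)
p_new <- predict(fit, newdata = new_case, type = "response")

# McFadden's pseudo R-squared: 1 - logLik(model) / logLik(null model).
mcfadden <- 1 - as.numeric(logLik(fit)) / as.numeric(logLik(null))

# Wald test per coefficient: z = estimate / standard error, as reported by summary().
wald <- summary(fit)$coefficients[, c("z value", "Pr(>|z|)")]

# Hosmer-Lemeshow test: compare observed and expected events within
# g groups formed from quantiles of the fitted probabilities.
hosmer_lemeshow <- function(y, p, g = 10) {
  breaks <- unique(quantile(p, probs = seq(0, 1, length.out = g + 1)))
  grp <- cut(p, breaks = breaks, include.lowest = TRUE)
  observed <- tapply(y, grp, sum)
  expected <- tapply(p, grp, sum)
  n_g <- tapply(y, grp, length)
  stat <- sum((observed - expected)^2 / (expected * (1 - expected / n_g)))
  df <- nlevels(grp) - 2
  c(statistic = stat, df = df, p.value = pchisq(stat, df, lower.tail = FALSE))
}
hl <- hosmer_lemeshow(tumors$malignant, fitted(fit))

# VIF for a predictor: 1 / (1 - R^2) from regressing it on the other predictors.
vif_radius <- 1 / (1 - summary(lm(radius ~ texture, data = tumors))$r.squared)

print(list(p_new = p_new, mcfadden = mcfadden, wald = wald,
           hosmer_lemeshow = hl, vif_radius = vif_radius))
```

When reading the output, a large Hosmer-Lemeshow p-value gives no evidence of poor fit, and VIF values close to 1 indicate little collinearity among the predictors.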
Tools used:
- R - dplyr, glmnet, ggplot2, popbio, aod, pscl, survey, caret, ResourceSelection, HH, RCurl, twitteR, wordcloud, tm, syuzhet, SnowballC, rtweet, stringr
Other tools used:
- Markdown
- LaTeX
© 2025 • All content within this project is strictly the property of Aadith Kumar and is not for public use without permission.