Song Genre Classification with Machine Learning

February 17, 2023 in Computer Science

Summary:
The project classifies songs into genres based on audio features provided by the Spotify API. It uses a Kaggle dataset containing 5,750 songs belonging to the top 5 genres by number of songs. Exploratory data analysis was used to determine the most useful features, including valence, danceability, instrumentalness, liveness, and loudness. Linear regression served as a benchmark model; logistic regression, support vector machines, and neural networks were then tested, and K-nearest neighbors was used to determine the genre of each song. Cross-validation was used for hyperparameter tuning, and the F1 score was used to compare models and hyperparameters. Beyond finding the highest-scoring model, the project's objective was to explore the underlying features that characterize each genre and the criteria that differentiate the genres.

Summary bullet points
• Led a team of 5 students to test various ML models, including kNN, SVM, Logistic Regression, and MLP, to classify the top 5 genres for a dataset of 131,580 songs featuring 20 columns (13 song features) from Spotify's API. Achieved a final F1 score of 74%.
• Conducted exploratory data analysis to reduce the number of features from 13 to 6, doubling accuracy from 30% to 60%. The removed features showed no sign of a Gaussian distribution, which was a core assumption for most of the models used.
• Utilized pandas for data filtering and scikit-learn for model selection and training, and applied evaluation metrics for optimization.
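The data-filtering step mentioned above, keeping only songs from the most common genres, might look roughly like the sketch below. The column name `genre`, the helper `keep_top_genres`, and the tiny synthetic table are assumptions made for illustration; the real dataset's schema is not given here.

```python
import pandas as pd


def keep_top_genres(df: pd.DataFrame, genre_col: str = "genre", n: int = 5) -> pd.DataFrame:
    """Keep only the rows whose genre is among the n most frequent genres."""
    # value_counts() counts songs per genre; nlargest(n) picks the n biggest.
    top_genres = df[genre_col].value_counts().nlargest(n).index
    return df[df[genre_col].isin(top_genres)].reset_index(drop=True)


# Synthetic stand-in for the real table (hypothetical column names and values).
songs = pd.DataFrame({
    "genre": ["rock"] * 4 + ["pop"] * 3 + ["jazz"] * 2 + ["folk"],
    "valence": [0.1 * i for i in range(10)],
})

# Keep the two most common genres in this toy example.
filtered = keep_top_genres(songs, n=2)
```

On the real data, the same call with `n=5` would restrict the table to the top 5 genres before any feature selection.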

Details
• Developed a program to classify songs into their correct genre using audio features from the Spotify API
• Conducted exploratory data analysis on the features and determined the most useful ones to include in the model to avoid overfitting
• Used a Kaggle dataset to obtain genre labels for the songs and compared multiple models to determine which one best fit these features
• Found that K-nearest neighbors using PCA and a k of 10 was the most successful model for determining genre
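A K-nearest neighbors model with PCA and k = 10, as described above, can be sketched with a scikit-learn pipeline. This is a minimal sketch and not the project's actual code: the data is synthetic (`make_classification`), and the scaling step, the number of PCA components (3), and macro-averaged F1 are all assumptions, since none of them is specified above.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 6 numeric features and 5 classes, mirroring the
# "6 features, 5 genres" setup. The values are not real song data.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, n_redundant=0,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# Scale, project onto principal components, then classify with k = 10.
# n_components=3 is an assumed value chosen only for illustration.
knn_pca = make_pipeline(
    StandardScaler(),
    PCA(n_components=3),
    KNeighborsClassifier(n_neighbors=10),
)

# 5-fold cross-validated macro F1 (the averaging scheme is an assumption).
scores = cross_val_score(knn_pca, X, y, cv=5, scoring="f1_macro")
mean_f1 = scores.mean()
```

Putting the scaler and PCA inside the pipeline means they are refit on each training fold, so no information from the held-out fold leaks into the transformation.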

Skills
• Feature selection and exploratory data analysis
• Knowledge of Spotify API and extracting low-level musical features from audio signals
• Model evaluation using cross-validation and F1-score
• Experience with logistic regression, support vector machines, neural networks, and linear regression
• Developed a music genre recognition model using machine learning techniques and Spotify API for feature extraction
• Investigated different models for classifying songs into genres based on their features
• Used cross-validation and F1-score for model evaluation and hyperparameter tuning
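The model comparison by cross-validated F1 score could be set up along the following lines. This is a sketch under stated assumptions: the data is synthetic, the hyperparameters shown (for example, the MLP's single hidden layer of 32 units) are placeholders rather than the values the project used, and macro averaging is assumed for the F1 score.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the song features and genre labels.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# Candidate models; every hyperparameter here is a placeholder.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "mlp": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=10),
}

# Use the same stratified folds for every model so the scores are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

mean_f1 = {}
for name, model in candidates.items():
    pipeline = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipeline, X, y, cv=cv, scoring="f1_macro")
    mean_f1[name] = scores.mean()

# The model with the highest mean macro F1 across folds.
best_model = max(mean_f1, key=mean_f1.get)
```

The same loop extends to hyperparameter tuning by replacing each fixed model with a `GridSearchCV` over its parameter grid.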