8/16/2023

Top hype songs from 2014

The general music corpus was formed using data from LyricFind. We filtered hip-hop artists by cross-referencing their primary genre on MusixMatch. For consistency, the hip-hop data was cleaned using the same script as the LyricFind corpus. This included efforts to standardize spelling, remove capitalization, and apply light lemmatization.

Most Hip Hop: To find the words most "characteristic" of hip hop, we computed the rate at which each word appeared in the hip-hop corpus vs. the general corpus. For the hip-hop corpus, this is the number of appearances of the word divided by the total number of words in the corpus. We then compared that to the same math for the general corpus. Some words that indexed high in hip hop vs. the general corpus were filtered from this list because they were still rather rare: each had fewer than 1,000 occurrences in the hip-hop corpus. For example, "lowrider" had a 255:1 ratio in hip hop vs. other genres, but was used only 116 times in 26 million words.

TF-IDF: To determine the words that characterize each hip-hop artist, we used a technique called term frequency-inverse document frequency (tf-idf). Each rapper gets assigned a tf-idf score for every word in the hip-hop corpus. For a given word, we count the number of times it occurs in one rapper's catalogue (its term frequency) and divide by the number of artists who use it across the hip-hop corpus (its document frequency). The ten words with the highest tf-idf scores for each artist were deemed the words "most unique" to him or her.

We made two slight modifications to the traditional formula:
1) We used sublinear scaling on the term frequencies, giving us a little more variation across our lists. You can read more about why you might want to do sublinear scaling here.
2) We also set a "cut-off" for document frequency of 0.1.
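The "Most Hip Hop" comparison can be sketched in a few lines of Python. This is a minimal sketch, not the authors' script: it assumes both corpora are already cleaned and tokenized into lists of words, and it skips words that never appear in the general corpus, since their ratio is undefined (the text does not say how such words were handled). The function name `characteristic_words` and its parameters are hypothetical; `min_count=1000` mirrors the rare-word filter described above.

```python
from collections import Counter


def characteristic_words(hiphop_tokens, general_tokens, min_count=1000, top_n=10):
    """Rank words by how much more often they appear in hip hop than in the general corpus."""
    hh_counts = Counter(hiphop_tokens)
    gen_counts = Counter(general_tokens)
    hh_total = sum(hh_counts.values())
    gen_total = sum(gen_counts.values())

    ratios = {}
    for word, hh_count in hh_counts.items():
        # Rare-word filter: a high ratio built on few occurrences (e.g. "lowrider") is dropped.
        if hh_count < min_count:
            continue
        gen_count = gen_counts.get(word, 0)
        # Assumption: words absent from the general corpus are skipped (ratio undefined).
        if gen_count == 0:
            continue
        hh_rate = hh_count / hh_total     # appearances / total words in the hip-hop corpus
        gen_rate = gen_count / gen_total  # the same math for the general corpus
        ratios[word] = hh_rate / gen_rate

    # Highest hip-hop-to-general ratio first.
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

For example, with `min_count=2`, a word that makes up 60% of a toy hip-hop corpus and 10% of a toy general corpus scores about 6, i.e. a 6:1 ratio, the same kind of figure as the 255:1 quoted for "lowrider".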
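The tf-idf step with its two modifications can be sketched in plain Python. Again, this is not the authors' script, and several choices here are our assumptions: each artist's catalogue is treated as one document, the inverse document frequency uses the textbook form log(N / df) where N is the number of artists, and the 0.1 "cut-off" is read as a maximum document frequency, so words used by more than 10% of artists are dropped (if it was meant as a minimum, flip that comparison). The function name `top_unique_words` is hypothetical. scikit-learn's `TfidfVectorizer` exposes similar `sublinear_tf` and `max_df` options, though its defaults differ (smoothed idf, L2 normalization).

```python
import math
from collections import Counter


def top_unique_words(catalogues, max_df=0.1, top_n=10):
    """Return each artist's top_n words by tf-idf.

    catalogues: dict mapping artist name -> list of already-cleaned tokens.
    """
    n_artists = len(catalogues)
    counts = {artist: Counter(tokens) for artist, tokens in catalogues.items()}

    # Document frequency: how many artists use each word at least once.
    df = Counter()
    for c in counts.values():
        df.update(c.keys())

    # Modification 2 (assumed to be an upper bound): drop words used by
    # more than max_df of all artists.
    vocab = {w for w, d in df.items() if d / n_artists <= max_df}

    result = {}
    for artist, c in counts.items():
        scores = {}
        for word, tf in c.items():
            if word not in vocab:
                continue
            sublinear_tf = 1 + math.log(tf)       # modification 1: sublinear tf scaling
            idf = math.log(n_artists / df[word])  # textbook inverse document frequency
            scores[word] = sublinear_tf * idf
        # Highest score first; ties broken alphabetically so output is deterministic.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[artist] = [w for w, _ in ranked[:top_n]]
    return result
```

Sublinear scaling replaces a raw count tf with 1 + log(tf), so a word an artist uses 100 times counts roughly 5.6 times as much as a word used once rather than 100 times, which is one way it can add variation across the per-artist lists.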