We're so focused on text-based content recommendation systems that it's easy to forget about other types of data, such as audio. Looking at such systems takes us into a world of concepts and entities that are new to me. This paper (16-page PDF) looks at audio similarity based on "mel-frequency cepstral coefficient features" (read about this in Wikipedia here). These audio features are used by speech recognition engines, which combine them to build up words and phrases. The recommendation system described here then adds other, more familiar elements, such as language, scene, genre and mood, to fine-tune the categorization. A question for future discussion is whether feature-based audio recognition and recommendation will be superseded by more general transformer neural network algorithms, which I talked about here.
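To make the idea of feature-based audio similarity a bit more concrete, here is a minimal sketch in Python. It is not the paper's method. It assumes each track has already been reduced to a list of per-frame coefficient vectors (the numbers below are made up for illustration, not real MFCCs), averages those frames into one vector per track, and compares tracks by cosine similarity. The function names `mean_vector`, `cosine_similarity` and `track_similarity` are hypothetical.

```python
import math


def mean_vector(frames):
    """Average per-frame feature vectors into a single track-level vector."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def track_similarity(frames_a, frames_b):
    """Compare two tracks by the cosine similarity of their mean feature vectors."""
    return cosine_similarity(mean_vector(frames_a), mean_vector(frames_b))


# Made-up 3-coefficient frames standing in for real MFCC output (assumption).
track_a = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
track_b = [[2.0, 4.0, 6.0], [2.0, 4.0, 6.0]]  # same direction as track_a
track_c = [[3.0, 0.0, -1.0]]                  # orthogonal to track_a

print(track_similarity(track_a, track_b))
print(track_similarity(track_a, track_c))
```

In a real system the frames would come from an audio library's MFCC routine, and metadata such as language, genre and mood could then be used to filter or re-rank the nearest matches.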