Creating the Yebr Music Platform
Yebr is a free music streaming and search service that aims to promote free and open-source software. Most of the music streaming industry runs on closed-source, proprietary software, so my team and I built Yebr in an attempt to change that.
This write-up omits details about many of the services that make Yebr work, such as the AI-based search engine and the Yebr design engine. More details can be found at research.kabeers.network.
Architecture Overview
We planned the architecture as tens of independent microservices, each of which could be scaled and tested separately. One of these is the sourcing engine, which, as the name implies, sources music and other audio content from around the web. We started with YouTube, indexing content and making temporary cached copies, and planned to add other sources progressively as the platform grew.
Yebr initially saw little success. Our user base was under 50 people at the time of our first stable release, but it grew rapidly, to almost six times that size after version 3 introduced music aggregation. I led this feature, studying and selecting algorithms that could recognize and process audio from anywhere on the web. I developed several algorithms of my own, but for reliability I settled on Chromaprint, the audio fingerprinting algorithm used in the MusicBrainz AcoustID project.
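To illustrate how fingerprint-based recognition can decide whether two audio files are the same recording, here is a minimal sketch. It assumes each fingerprint is already available as a list of 32-bit integers (the form Chromaprint-style fingerprints usually take) and compares them by counting differing bits. The function names and the 0.9 threshold are hypothetical choices for illustration, not Yebr's actual code, and the sketch does not call Chromaprint itself.

```python
def bit_similarity(fp_a, fp_b):
    """Fraction of matching bits over the overlapping part of two fingerprints."""
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 0.0
    # XOR marks the bits that differ; count them across the overlap.
    differing = sum(
        bin((x ^ y) & 0xFFFFFFFF).count("1") for x, y in zip(fp_a, fp_b)
    )
    return 1.0 - differing / (32 * n)


def is_same_recording(fp_a, fp_b, threshold=0.9):
    """Treat two fingerprints as the same recording above an assumed threshold."""
    return bit_similarity(fp_a, fp_b) >= threshold
```

A real matcher would also try different time offsets between the two fingerprints, since the same song may start at different points in different uploads.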
Building the Knowledge Graph
The idea was to slowly build a music index database by extracting music files, artist data, and other features, effectively combining them into a knowledge graph. This helps our users not only stream audio directly but also find it on other platforms, and helps us create the next big thing in the project: the recommendation engine.
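As a rough picture of what such a knowledge graph could look like, the sketch below stores facts as (subject, relation, object) triples. The class, the relation names, and the example identifiers are all made up for illustration and say nothing about how Yebr's index is actually stored.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny in-memory triple store: (subject, relation) -> set of objects."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, subject, relation, obj):
        self._edges[(subject, relation)].add(obj)

    def objects(self, subject, relation):
        # Use .get so that lookups do not create empty entries.
        return sorted(self._edges.get((subject, relation), set()))


graph = KnowledgeGraph()
graph.add("track:example-song", "performed_by", "artist:example-artist")
graph.add("track:example-song", "available_on", "youtube")
graph.add("track:example-song", "available_on", "other-platform")

# Where else can this track be found?
platforms = graph.objects("track:example-song", "available_on")
```

The same structure answers both kinds of question mentioned above: "where can I find this track?" for users, and "which tracks share an artist?" as input for recommendations.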
Recommendation System Evolution
At the start, Yebr was equipped with a basic TF-IDF matrix-based recommendation engine that used search and watch data such as titles and descriptions to recommend music on the platform. As fast and efficient as it was, we soon realized that the algorithm was not accurate enough, and its recommendations were generally ignored or disliked by users. So I went on to create a new AI-based model.
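For readers unfamiliar with the technique, a TF-IDF recommender of this kind can be sketched in a few lines. The version below is a simplified illustration, assuming whitespace tokenization and a smoothed IDF term; the function names and example titles are hypothetical and not taken from Yebr's code.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        if not tokens:
            vectors.append({})
            continue
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)  # smoothed IDF
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(index, docs, k=3):
    """Return indices of the k documents most similar to docs[index]."""
    vectors = tfidf_vectors(docs)
    others = [i for i in range(len(docs)) if i != index]
    others.sort(key=lambda i: cosine(vectors[index], vectors[i]), reverse=True)
    return others[:k]
```

Because it only compares words in titles and descriptions, this approach is fast but blind to actual listening behavior, which is consistent with the accuracy problems described above.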
I first looked into simpler models, such as those based on matrix factorization. However, given how quickly the rating data and the number of users and tracks on the platform grew every day, recomputing a matrix model would have become infeasible. So I turned to a deep learning-based approach. I was heavily inspired by YouTube’s 2010 paper about their algorithm and DailyMotion’s Medium blog where they described their deep-learning approach to the problem.
The solution I came up with was a mix of these two approaches. I created a multi-network system that made use of the matrix factorization model for new users and a deep learning model written in TensorFlow which incorporated over 70 different features such as age, region, and language to predict user taste. The network used a dozen hidden layers and a weighted sampled softmax function to rank items. A re-ranking network was added in a later update to further increase the accuracy of the predictions.
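The routing and ranking ideas in this design can be illustrated without TensorFlow. The sketch below picks a model based on how much history a user has and turns raw item scores into a ranked list with a plain softmax. The threshold of 20 interactions, the function names, and the example scores are assumptions for illustration only, and the plain softmax stands in for, rather than reproduces, the weighted sampled softmax used by the real network.

```python
import math


def pick_model(interaction_count, min_interactions=20):
    """Route new users to matrix factorization, others to the deep network.

    The cutoff is an assumed value, not one stated for Yebr.
    """
    if interaction_count < min_interactions:
        return "matrix_factorization"
    return "deep_network"


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_items(item_scores, k):
    """Return the top-k (item, probability) pairs from a dict of raw scores."""
    if not item_scores:
        return []
    items = list(item_scores)
    probs = softmax([item_scores[item] for item in items])
    ranked = sorted(zip(items, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

In a two-stage setup like the one described, a re-ranking network would take the top-k list produced here and reorder it using additional signals.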
Infrastructure and Features
Other parts of the Yebr service were just as interesting. We chose to host the microservices in Docker containers inside a Kubernetes cluster. I wrote the load balancers and domain mapping servers, which distributed request load across the servers, and the caching servers, which kept extra load from overwhelming origin servers such as YouTube or Spotify.
The domain mapping servers were our approach to SSO (Single Sign-On) and universal login, which worked with our custom OAuth implementation called Kabeer IDP. The system worked like a charm, providing SSO authentication and load balancing and simplifying the development of other features such as Yebr Cast. This feature uses WebSockets and WebRTC to create device rooms in which peers can securely send messages and events to the host device, such as Yebr running on a TV.
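The two ideas in this paragraph, spreading requests across servers and caching responses so the origin is hit less often, can be sketched as follows. This is a minimal in-process illustration with hypothetical class names, a simple round-robin policy, and a time-to-live cache; the source does not say which balancing policy or cache expiry strategy Yebr used.

```python
import itertools
import time


class RoundRobinBalancer:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def next_backend(self):
        return next(self._cycle)


class TTLCache:
    """Cache fetched values for ttl_seconds to spare the origin server."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expiry_time, value)

    def get_or_fetch(self, key, fetch):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no origin request needed
        value = fetch(key)  # expired or missing: go to the origin
        self._store[key] = (now + self._ttl, value)
        return value
```

The clock is injectable so that expiry can be tested without sleeping; in normal use the default monotonic clock is enough.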
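The device-room idea can be pictured with a small in-memory model: a room has one host, peers join it, and only joined peers may deliver events to the host. The sketch below leaves out the actual WebSocket and WebRTC transport entirely, and the class, method names, and event format are hypothetical rather than Yebr Cast's real protocol.

```python
class Room:
    """One host device plus the peers allowed to send it events."""

    def __init__(self, host_id):
        self.host_id = host_id
        self.peers = set()
        self.host_inbox = []  # events waiting to be handled by the host

    def join(self, peer_id):
        self.peers.add(peer_id)

    def leave(self, peer_id):
        self.peers.discard(peer_id)

    def send_to_host(self, peer_id, event):
        # Reject events from devices that never joined this room.
        if peer_id not in self.peers:
            raise PermissionError(f"{peer_id} is not a member of this room")
        self.host_inbox.append({"from": peer_id, "event": event})
```

For example, a phone that has joined a TV's room could send a "pause" event, while a device outside the room would be refused.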
Conclusion
Making Yebr was a lot of fun and a great learning experience for all of us. I was responsible not only for writing and testing code but also for managing cloud infrastructure, conducting AI and user experience research, managing teams, and coordinating updates. Check it out at music.kabeers.network.