Explore our latest research and insights
With over 250 million web pages hosted on the World Wide Web, it is virtually impossible to navigate the Internet.
Featured Article
All Articles
Using GPT-2 for language modeling and next-word prediction with Hugging Face and TensorFlow, including implementation details and a library for next-word prediction.
Okapi BM25 is one of the strongest “simple” scoring functions, and has proven a useful baseline for experiments and a useful feature for ranking.
Lifelong learning has recently attracted attention in building machine learning systems that continually accumulate and transfer knowledge to help future learning. Unsupervised topic modeling has been widely used to discover topics from document collections.
An open source tool that uses TensorFlow.js models for real-time human perception in the browser to animate SVG characters via motion capture.
A complex mix of audio and AI that will help Yebr recognise tracks found anywhere on the web.
Our story of creating Kabeer's Chats, the free chat messenger service, with architecture descriptions and more.
Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application
Dremel is a scalable, interactive ad-hoc query system for analysis of read-only nested data. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google. In this paper, we describe the architecture and implementation of Dremel, and explain how it complements MapReduce-based computing. We present a novel columnar storage representation for nested records and discuss experiments on few-thousand node instances of the system.
An explanation of how the Chromaprint algorithm works for audio fingerprinting, with details about spectrograms, chroma features, and the process of generating audio fingerprints.
An explanation of XSS security concerns when using Markdown, with examples of how to properly filter content for security after Markdown processing.
Semantic simultaneous localization and mapping (SLAM) is a popular technology enabling indoor mobile robots to sufficiently perceive and interact with the environment
A guide to creating a Base85 encoder/decoder in JavaScript, with implementation details and code examples.
Yebr is a free music streaming and search service that aims to promote free and open-source software services.
This paper presents a novel approach for real-time dance performance evaluation that emphasizes robust pose matching despite temporal and spatial challenges.
A guide on building a scheduler with function pointers, task lists, and state management for real-time systems.
Exploring the cognitive phenomenon that makes us perceive certain faces as matching certain names better than others, and what research reveals about this curious intersection of perception and expectation.
When I first encountered the Scrabble paper by Appel & Jacobson, I was intrigued and excited by the promise it held for the world of Scrabble solving.
ALPHA paper for the Kabeer Identity Platform: a service description and architecture research paper.
Internal documentation for the service login flow in Kabeer IDP, describing the process for first-party backend authentication.
As our services have grown in complexity, authorization and authentication are no longer simple.
In traditional token exchange, the resource server has no way to confirm whether a token has been stolen or actually belongs to the client presenting it. This system introduces a way to incorporate proof of possession into tokens using public key cryptography.
Mesh is an open-source, decentralized file storage system designed to be self-healing, secure, and cost-effective.
An analysis of the copyright implications of music fingerprinting technology, discussing patents, derivative works, and legal considerations.
LUCID (Lightweight, Usable CNN in DDoS Detection) is a lightweight Deep Learning-based DDoS detection framework that leverages Convolutional Neural Networks (CNNs) to detect DDoS attacks with low processing overhead.
Spinning up IDEs with an integrated runtime inside a browser window: a hybrid approach to an editor with MicroVMs and a WASI-based embedded operating system.
A survey of various active noise control algorithms with an eye towards eventual application in a user-implementable aftermarket ANC system on off-the-shelf hardware.
An explanation of synthesizer components including oscillators, filters, amplifiers, envelopes, and LFOs, with descriptions of different waveforms and parameters.
An explanation of the Shepard tone, an auditory illusion that creates the perception of a tone that continually ascends or descends in pitch yet gets no higher or lower.