Research

Explore our latest research and insights

Using Machine Learning to cluster news headlines

With over 250 million web pages hosted on the World Wide Web, navigating the Internet unaided is virtually impossible.

Next Word Prediction

Using GPT-2 for language modeling and next word prediction with Hugging Face and TensorFlow, including implementation details and a library for next word prediction.

gpt-2 nlp transformer next word prediction

TF-IDF and how the BM25 search algorithm works

Okapi BM25 is one of the strongest “simple” scoring functions, and has proven useful both as a baseline for experiments and as a feature for ranking.

TF-IDF BM25 algorithm text-analysis
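
For readers who want a feel for the scoring function before opening the article, here is a minimal Python sketch of BM25. It is an illustration under stated assumptions, not the article's own code: the function name bm25_scores, the parameter defaults k1=1.5 and b=0.75, the whitespace tokenization, and the always non-negative IDF variant are all choices made for this example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    `k1` controls term-frequency saturation and `b` controls length
    normalization; both defaults are common choices, assumed here.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # IDF variant with "+1" inside the log so it never goes negative
            # (an assumption; other BM25 formulations differ here).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Longer-than-average documents are penalized through `b`.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus, tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "dogs chase the cat".split(),
    "stock markets fell sharply".split(),
]
scores = bm25_scores("cat mat".split(), docs)
```

In this toy corpus the first document matches both query terms and ranks highest, the second matches only "cat", and the third scores zero because it shares no terms with the query.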

Neural Topic Modeling with Continual Lifelong Learning

Lifelong learning has recently attracted attention in building machine learning systems that continually accumulate and transfer knowledge to help future learning. Unsupervised topic modeling is widely used to discover topics from document collections.

topic-modeling clustering text-analysis

Teach Me To Dance: An open source tool to bring SVG characters to life in the browser via motion capture

An open source tool that uses TensorFlow.js models for real-time human perception in the browser to animate SVG characters via motion capture.

machine learning svg animation motion capture tensorflow.js

[BETA] Creating an Audio Based Indexing System in Yebr

A complex mix of audio and AI that will help Yebr recognise tracks found anywhere on the web.

BETA audio-fingerprinting audio analysis indexing

Chats White Paper

Our story on creating Kabeer's Chats, the free chat messenger service. Contains architecture descriptions and more.

Messenger Chats White Paper

PaLM: Scaling Language Modeling with Pathways

Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application.

BETA

Dremel: Interactive Analysis of Web-Scale Datasets

Dremel is a scalable, interactive ad-hoc query system for analysis of read-only nested data. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google. In this paper, we describe the architecture and implementation of Dremel, and explain how it complements MapReduce-based computing. We present a novel columnar storage representation for nested records and discuss experiments on few-thousand node instances of the system.

BETA

How does Chromaprint work?

An explanation of how the Chromaprint algorithm works for audio fingerprinting, with details about spectrograms, chroma features, and the process of generating audio fingerprints.

chromaprint audio fingerprinting algorithm

Markdown and XSS

An explanation of XSS security concerns when using Markdown, with examples of how to properly filter content for security after Markdown processing.

markdown security xss web development

Object-aware SLAM based on efficient quadric initialization and joint data association

Semantic simultaneous localization and mapping (SLAM) is a popular technology enabling indoor mobile robots to sufficiently perceive and interact with the environment.

object-recognition

Writing a Base85 encoder/decoder in pure JavaScript

A guide to creating a Base85 encoder/decoder in JavaScript, with implementation details and code examples.

javascript encoding base85 ascii85
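
As a rough preview of the idea, the encoding step can be sketched in a few lines. This is a Python illustration, not the article's JavaScript implementation; the function name ascii85_encode is hypothetical, and the sketch assumes the Ascii85 variant without the Adobe <~ ~> framing, using "z" as shorthand for an all-zero 4-byte group.

```python
import base64


def ascii85_encode(data: bytes) -> str:
    """Encode bytes as Ascii85 text (no <~ ~> framing, 'z' for zero groups)."""
    out = []
    for i in range(0, len(data), 4):
        chunk = data[i:i + 4]
        pad = 4 - len(chunk)  # number of zero bytes added to a short tail
        value = int.from_bytes(chunk + b"\x00" * pad, "big")
        if value == 0 and pad == 0:
            out.append("z")  # a full all-zero group is abbreviated
            continue
        digits = []
        for _ in range(5):
            # Peel off base-85 digits, least significant first.
            value, rem = divmod(value, 85)
            digits.append(chr(rem + 33))  # digits map to '!' through 'u'
        group = "".join(reversed(digits))
        # Drop one output character per padding byte in the final group.
        out.append(group[:5 - pad])
    return "".join(out)


encoded = ascii85_encode(b"Man ")  # "9jqo^"
# Cross-check against the standard library's Ascii85 encoder.
assert encoded == base64.a85encode(b"Man ").decode("ascii")
```

Each 4-byte group is read as one 32-bit big-endian integer and written as five base-85 digits, which is why Base85 output is about 25% larger than the input, compared with about 33% for Base64.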

Creating the Yebr Music Platform

Yebr is a free music streaming and search service which aims to promote free and open-source software services.

Yebr Beta

Using Machine Learning to Teach People How to Dance

This paper presents a novel approach for real-time dance performance evaluation that emphasizes robust pose matching despite temporal and spatial challenges.

Architecture Paper MoveNet Machine Learning DTW

How to Build a Scheduler

A guide on building a scheduler with function pointers, task lists, and state management for real-time systems.

scheduler embedded systems function pointers task management

The Face-Name Matching Effect: When Appearance Predicts Identity

Exploring the cognitive phenomenon that makes us perceive certain faces as matching certain names better than others, and what research reveals about this curious intersection of perception and expectation.

psychology cognitive-science research perception exclusive

Reflection paper based on The World’s Fastest Scrabble Program

When I first encountered the Scrabble paper by Appel & Jacobson, I was intrigued and excited by the promises it held for the world of Scrabble solving.

Scrabble AI Reflection Paper

Kabeer Identity Platform [ALPHA]

ALPHA paper for the Kabeer Identity Platform: service description and architecture research.

Authentication Identity Platform Security

Service Login Flow - Alpha Spec

Internal documentation for the service login flow in Kabeer IDP, describing the process for first-party backend authentication.

authentication service-login alpha specification

Service Logins and Service Tokens on First-Party Apps

As our services have grown in complexity, authorization and authentication are no longer simple.

archive authentication service-login spec

Revocable JWTs with Proof of Possession and counters

In a traditional token exchange, the resource server has no way to confirm whether a token was stolen or actually belongs to the client presenting it. This system introduces a way to incorporate proof of possession into tokens using public key cryptography.

Revoking JWT Identity Security

Inviting everyone to help us create the Mesh Project

Mesh is an open-source, decentralized file storage system designed to be self-healing, secure, and cost-effective.

networking storage blob mesh

Could music fingerprinting be copyright infringement?

An analysis of the copyright implications of music fingerprinting technology, discussing patents, derivative works, and legal considerations.

legal fingerprinting copyright audio

LUCID: A Practical, Lightweight Deep Learning Solution for DDoS Attack Detection

LUCID (Lightweight, Usable CNN in DDoS Detection) is a lightweight Deep Learning-based DDoS detection framework that leverages Convolutional Neural Networks (CNNs) to detect DDoS attacks with low processing overhead.

networking security deep learning ddos cnn

Running powerful developer environments inside a browser

Spinning up IDEs with an integrated runtime inside a browser window: a hybrid approach to an editor with MicroVMs and a WASI-based embedded operating system.

Architecture Paper Cloud IDE Kabeer Cloud Platform

A Review of Active Noise Control Algorithms Towards a User-Implementable Aftermarket ANC System

A survey of various active noise control algorithms with an eye towards eventual application in a user-implementable aftermarket ANC system on off-the-shelf hardware.

noise control algorithms ANC

Oscillator, Filter, Amplifier, Envelope, and LFO

An explanation of synthesizer components including oscillators, filters, amplifiers, envelopes, and LFOs, with descriptions of different waveforms and parameters.

synthesis audio oscillators filters

Shepard tone explained

An explanation of the Shepard tone, an auditory illusion that creates the perception of a tone that continually ascends or descends in pitch yet gets no higher or lower.

audio shepard tone auditory illusion