Crunch Conference -- Weak Supervision Workshop

I had the pleasure of running a workshop on Weak Supervision at Crunch Conference in Budapest on 5/10/2022. Let me share a summary of the workshop here:

With simple and efficient out-of-the-box machine learning APIs, finetuning and deploying machine learning models has never been easier. For many companies, the larger challenges are understanding the goalposts of machine learning projects and the lack of labelled data. Weak supervision can help with:

  • labelling data more efficiently
  • finetuning your models on noisily labelled data.

Weak supervision

The workshop used skweak, a spaCy-based weak supervision library, to demonstrate how to use labelling functions to generate noisily labelled data. Here's an example skweak labelling function:

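from skweak.base import SpanAnnotator

class MoneyDetector(SpanAnnotator):
    def __init__(self):
        # Register the labelling function under the name "money_detector"
        super(MoneyDetector, self).__init__("money_detector")

    def find_spans(self, doc):
        # Start at the second token so that tok.nbor(-1) always exists
        for tok in doc[1:]:
            if tok.text[0].isdigit() and tok.nbor(-1).is_currency:
                # Span covers the currency symbol and the number (end index is exclusive)
                yield tok.i-1, tok.i+1, "MONEY"

money_detector = MoneyDetector()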

This labelling function labels any token that starts with a digit and is preceded by a currency symbol, together with that symbol, as MONEY.

Money detector

skweak lets you write labelling functions using spaCy attributes or other methods, and combine the output of multiple labelling functions.
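One simple way to combine labelling functions is a per-token majority vote. The sketch below is plain Python and does not use skweak's own aggregation API. The three toy labelling functions, the `CURRENCY_SYMBOLS` and `CURRENCY_WORDS` sets, and `majority_vote` are hypothetical names I made up for illustration, and the input is assumed to be an already tokenised list of strings.

```python
from collections import Counter, defaultdict

# Hypothetical lookup sets, for illustration only.
CURRENCY_SYMBOLS = {"$", "€", "£"}
CURRENCY_WORDS = {"dollars", "euros", "pounds"}

def lf_symbol_then_number(tokens):
    """Currency symbol followed by a number -> MONEY (covers both tokens)."""
    for i in range(1, len(tokens)):
        if tokens[i][0].isdigit() and tokens[i - 1] in CURRENCY_SYMBOLS:
            yield i - 1, i + 1, "MONEY"

def lf_number_then_word(tokens):
    """Number followed by a currency word -> MONEY (covers both tokens)."""
    for i in range(len(tokens) - 1):
        if tokens[i][0].isdigit() and tokens[i + 1].lower() in CURRENCY_WORDS:
            yield i, i + 2, "MONEY"

def lf_year(tokens):
    """Four-digit number between 1900 and 2100 -> DATE."""
    for i, tok in enumerate(tokens):
        if tok.isdigit() and len(tok) == 4 and 1900 <= int(tok) <= 2100:
            yield i, i + 1, "DATE"

def majority_vote(tokens, labelling_functions):
    """Return {token_index: label}, keeping the most voted label per token.

    Ties go to the label that was voted first, so the order of the
    labelling functions matters when they disagree evenly.
    """
    votes = defaultdict(Counter)
    for lf in labelling_functions:
        for start, end, label in lf(tokens):
            for i in range(start, end):  # end index is exclusive
                votes[i][label] += 1
    return {i: counts.most_common(1)[0][0] for i, counts in sorted(votes.items())}

tokens = ["Paid", "$", "1999", "dollars", "in", "2021"]
labels = majority_vote(tokens, [lf_symbol_then_number, lf_number_then_word, lf_year])
print(labels)  # {1: 'MONEY', 2: 'MONEY', 3: 'MONEY', 5: 'DATE'}
```

Here "1999" gets two MONEY votes and one DATE vote, so the noisy year rule is outvoted. A plain vote treats every labelling function as equally reliable, which is a deliberate simplification.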

Using labelling functions has a number of advantages:

  1. 💪 Larger coverage: a single labelling function can cover many samples.
  2. 🤓 Involving experts: annotation by domain experts is expensive, whereas labelling functions written by domain experts are more economical because of their coverage.
  3. 🌬️ Adapting to changing domains: labelling functions and data assets can be adapted as the domain changes.
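To make the coverage point concrete, here is a minimal sketch that measures how many samples a single keyword rule labels. It is plain Python, not skweak code. The regular expressions, the example notes, and the `lf_smoking_status` and `coverage` helpers are hypothetical and only meant to illustrate the idea.

```python
import re

# Hypothetical keyword patterns, for illustration only.
NON_SMOKER_PATTERN = re.compile(
    r"\b(non-?smoker|never smoked|denies smoking)\b", re.IGNORECASE
)
SMOKER_PATTERN = re.compile(
    r"\b(smokes|smoker|cigarettes?|pack[- ]years?)\b", re.IGNORECASE
)

def lf_smoking_status(text):
    """Return a noisy label for a note, or None if the rule abstains."""
    # Check negations first, because "non-smoker" also contains "smoker".
    if NON_SMOKER_PATTERN.search(text):
        return "NON_SMOKER"
    if SMOKER_PATTERN.search(text):
        return "SMOKER"
    return None

def coverage(labelling_function, texts):
    """Fraction of texts for which the labelling function does not abstain."""
    labelled = sum(1 for text in texts if labelling_function(text) is not None)
    return labelled / len(texts)

# Made-up example notes.
notes = [
    "Patient smokes ten cigarettes a day.",
    "Non-smoker, no alcohol use.",
    "Denies smoking and drinking.",
    "Presents with a sprained ankle.",
    "20 pack-year history, quit in 2015.",
]

noisy_labels = [lf_smoking_status(note) for note in notes]
rule_coverage = coverage(lf_smoking_status, notes)
print(noisy_labels)
print(rule_coverage)  # 0.8
```

The last note shows why such labels are noisy: the rule tags a patient who has quit as SMOKER. Writing the rule once still labels four of the five notes, which is the trade-off labelling functions make.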

Workshop Slides

Example Kaggle Notebook applying skweak to smoker / non-smoker detection. To run the notebook, you will need to verify your phone number and enable the Internet connection setting on Kaggle.

skweak documentation
