I was reading the scispaCy paper and came across a comparison of the small and large models' performance on a specific biomedical NER task.
The only differences between the models are (see the sketch after this list):
- The larger model has a larger vocabulary than the small model
- The larger model ships with pretrained word vectors
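
A minimal sketch of how to verify these two differences yourself, assuming the scispaCy models from the paper (`en_core_sci_sm` and `en_core_sci_lg`) are installed:

```python
# Compare vocabulary size and word-vector availability between the
# small and large scispaCy models. Assumes en_core_sci_sm and
# en_core_sci_lg are installed.
import spacy

for name in ["en_core_sci_sm", "en_core_sci_lg"]:
    nlp = spacy.load(name)
    n_lexemes = len(nlp.vocab)                 # entries in the vocabulary
    n_vectors, dim = nlp.vocab.vectors.shape   # rows in the vector table
    print(f"{name}: {n_lexemes} lexemes, {n_vectors} vectors of dim {dim}")
```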
Previously I thought spaCy's NER did not use word vectors, but it turns out that when vectors are available, the model does use them.
I was wondering how spaCy's vocabulary size could impact NER performance.

My naive answer would be that the more words are out of vocabulary, the harder it is to learn an entity. For example, in a cancer-type classification task where all anatomy terms are out of vocabulary, the model would find it harder to distinguish them. Why do you think vocabulary size impacts NER performance?
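
One way to probe this intuition empirically is to check whether particular terms are backed by a pretrained vector in each model. The anatomy terms below are illustrative examples, not taken from the paper:

```python
# Check whether illustrative anatomy terms have a pretrained vector in
# each model. The small model ships no vectors, so every term is
# effectively out of vocabulary as far as the vector table is concerned.
import spacy

nlp_sm = spacy.load("en_core_sci_sm")
nlp_lg = spacy.load("en_core_sci_lg")

for term in ["pancreas", "duodenum", "mesothelium"]:
    print(
        f"{term!r}: vector in small={nlp_sm.vocab.has_vector(term)}, "
        f"vector in large={nlp_lg.vocab.has_vector(term)}"
    )
```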
Originally written: 04 Oct 2020