spaCy Entities Model

Training spaCy's NER Model to Identify Food Entities

As a side project, I'm building an app that makes nutrition tracking as effortless as having a conversation. To do this, I'll be making use of spaCy for natural language processing (NLP).

In this notebook, we'll be training spaCy to identify FOOD entities from a body of text - a task known as named-entity recognition (NER). If all goes well, we should be able to identify the foods from the following sentences:

  • I decided to have chocolate ice cream as a little treat for myself.
  • I had a hamburger and chips for lunch today.
  • I ordered basmati rice, leaf spinach and cheese from Tesco yesterday.

spaCy reports an NER accuracy of 85.85%, so something in that range would be nice for our FOOD entities.

Approach

We'll use the following approach:

  1. Generate sentences with FOOD entities.
  2. Generate sentences with existing spaCy entities to avoid the catastrophic forgetting problem.
  3. Train spaCy NER with the existing entities and the custom FOOD entities. Stir until good enough.

Results

Category            Results
Food Entities       94.14%
Existing Entities   71.23%

See the Evaluation and Results section for a full breakdown.

# download spacy language model
!python -m spacy download en_core_web_lg

Collecting en_core_web_lg==2.3.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.3.1/en_core_web_lg-2.3.1.tar.gz (782.7 MB)
Building wheels for collected packages: en-core-web-lg
  Building wheel for en-core-web-lg (setup.py) ... done
Successfully built en-core-web-lg
Installing collected packages: en-core-web-lg
Successfully installed en-core-web-lg-2.3.1
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_lg')

# import libraries
import en_core_web_lg
import pandas as pd
import re
import random
import spacy
from spacy.util import minibatch, compounding
import warnings
import matplotlib.pyplot as plt

Generating Food Data

We'll be using food data from the USDA's Branded Foods dataset.

Preparing the food data

# read in the food csv file
food_df = pd.read_csv("food.csv")

# print row and column information
food_df.head()
index  fdc_id  data_type     description                 food_category_id  publication_date
0      356425  branded_food  MOCHI ICE CREAM BONBONS     NaN               2019-04-01
1      356426  branded_food  CHIPOTLE BARBECUE SAUCE     NaN               2019-04-01
2      356427  branded_food  HOT & SPICY BARBECUE SAUCE  NaN               2019-04-01
3      356428  branded_food  BARBECUE SAUCE              NaN               2019-04-01
4      356429  branded_food  BARBECUE SAUCE              NaN               2019-04-01

You can see that the description column has the names for the foods we're interested in.

# print the size 
food_df["description"].size

354565

We have way more rows than we need, so let's:

  • Disqualify foods with special characters (+, &, !, and so on).
  • Filter out foods containing more than 3 words.

# disqualify foods with special characters, lowercase and extract results from the "description" column
foods = food_df[food_df["description"].str.contains("[^a-zA-Z ]") == False]["description"].apply(lambda food: food.lower())

# filter out foods with more than 3 words, drop any duplicates
foods = foods[foods.str.split().apply(len) <= 3].drop_duplicates()

# print the remaining size
foods.size

37596

Now, we need to think about how we want our data to be distributed. By limiting foods to at most three words, we effectively have food entities that look like this:

  • hamburger | 1-worded
  • grilled cheese | 2-worded
  • chocolate ice cream | 3-worded

When feeding our training data into spaCy, we need to think about which biases we want it to avoid picking up.

# find one-worded, two-worded and three-worded foods
one_worded_foods = foods[foods.str.split().apply(len) == 1]
two_worded_foods = foods[foods.str.split().apply(len) == 2]
three_worded_foods = foods[foods.str.split().apply(len) == 3]

# create a bar plot
fig, ax = plt.subplots(figsize=(10, 6))
ax.bar([1, 2, 3], [one_worded_foods.size, two_worded_foods.size, three_worded_foods.size])

# label the x-axis instances
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(["one", "two", "three"])

# set the title and the xy-axis labels
plt.title("Number of Words in Food Entities")
plt.xlabel("Number of Words")
plt.ylabel("Food Entities")

# display the plot
plt.show()

Because the majority of our food entities are multi-worded, spaCy would develop a bias for multi-worded foods. If we look back to our example of grilled cheese, it's not a big deal if spaCy identifies cheese instead of grilled cheese. It is a big deal if spaCy fails to identify cheese at all.

As an aside, I ran this experiment, and spaCy only had a 10% accuracy for classifying single-worded FOOD entities, failing with foods such as hamburger and cheese.

So let's filter the dataset further, such that 45% are one-worded foods, 30% are two-worded foods, and 25% are three-worded foods.

# total number of foods
total_num_foods = round(one_worded_foods.size / 45 * 100)

# shuffle the 2-worded and 3-worded foods since we'll be slicing them
two_worded_foods = two_worded_foods.sample(frac=1)
three_worded_foods = three_worded_foods.sample(frac=1)

# append the foods together 
foods = one_worded_foods.append(two_worded_foods[:round(total_num_foods * 0.30)]).append(three_worded_foods[:round(total_num_foods * 0.25)])

# print the resulting sizes
for i in range(3):
    print(f"{i+1}-worded food entities:", foods[foods.str.split().apply(len) == i + 1].size)

1-worded food entities: 1258
2-worded food entities: 839
3-worded food entities: 699

Split train and test food data

At this point, we want to create sentence templates with placeholders that we can insert our food entities into. I'll come back and update these once I'm testing the project with real users.

food_templates = [
    "I ate my {}",
    "I'm eating a {}",
    "I just ate a {}",
    "I only ate the {}",
    "I'm done eating a {}",
    "I've already eaten a {}",
    "I just finished my {}",
    "When I was having lunch I ate a {}",
    "I had a {} and a {} today",
    "I ate a {} and a {} for lunch",
    "I made a {} and {} for lunch",
    "I ate {} and {}",
    "today I ate a {} and a {} for lunch",
    "I had {} with my husband last night",
    "I brought you some {} on my birthday",
    "I made {} for yesterday's dinner",
    "last night, a {} was sent to me with {}",
    "I had {} yesterday and I'd like to eat it anyway",
    "I ate a couple of {} last night",
    "I had some {} at dinner last night",
    "Last night, I ordered some {}",
    "I made a {} last night",
    "I had a bowl of {} with {} and I wanted to go to the mall today",
    "I brought a basket of {} for breakfast this morning",
    "I had a bowl of {}",
    "I ate a {} with {} in the morning",
    "I made a bowl of {} for my breakfast",
    "There's {} for breakfast in the bowl this morning",
    "This morning, I made a bowl of {}",
    "I decided to have some {} as a little bonus",
    "I decided to enjoy some {}",
    "I've decided to have some {} for dessert",
    "I had a {}, a {} and {} at home",
    "I took a {}, {} and {} on the weekend",
    "I ate a {} with {} and {} just now",
    "Last night, I ate an {} with {} and {}",
    "I tasted some {}, {} and {} at the office",
    "There's a basket of {}, {} and {} that I consumed",
    "I devoured a {}, {} and {}",
    "I've already had a bag of {}, {} and {} from the fridge"
]

We'll break up our food sentences (which contain our entities) into a training set and a test set. We also need the data to be in a specific format for training:

data = [
    ("I love chicken", {"entities": [(7, 14, "FOOD")]}),
    ... 
]
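
As a quick sanity check (a minimal sketch of my own, not part of the pipeline), the two integers are character offsets into the sentence, so slicing with them should return the entity text exactly:

# sanity check: the start/end values are character offsets into the sentence,
# so slicing the sentence with them should give back the entity text
sentence, annotations = ("I love chicken", {"entities": [(7, 14, "FOOD")]})
for start, end, label in annotations["entities"]:
    print(sentence[start:end], label)  # -> chicken FOOD
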
# create dictionaries to store the generated food combinations. Do note that one_food != one_worded_food. one_food == "barbecue sauce", one_worded_food == "sauce"
TRAIN_FOOD_DATA = {
    "one_food": [],
    "two_foods": [],
    "three_foods": []
}

TEST_FOOD_DATA = {
    "one_food": [],
    "two_foods": [],
    "three_foods": []
}

# one_food, two_food, and three_food combinations will be limited to 167 sentences
FOOD_SENTENCE_LIMIT = 167

# helper function for deciding what dictionary and subsequent array to append the food sentence on to
def get_food_data(count):
    return {
        1: TRAIN_FOOD_DATA["one_food"] if len(TRAIN_FOOD_DATA["one_food"]) < FOOD_SENTENCE_LIMIT else TEST_FOOD_DATA["one_food"],
        2: TRAIN_FOOD_DATA["two_foods"] if len(TRAIN_FOOD_DATA["two_foods"]) < FOOD_SENTENCE_LIMIT else TEST_FOOD_DATA["two_foods"],
        3: TRAIN_FOOD_DATA["three_foods"] if len(TRAIN_FOOD_DATA["three_foods"]) < FOOD_SENTENCE_LIMIT else TEST_FOOD_DATA["three_foods"],
    }[count]

# the pattern to replace from the template sentences
pattern_to_replace = "{}"

# shuffle the data before starting
foods = foods.sample(frac=1)

# the count that helps us decide when to break from the while loop
food_entity_count = foods.size - 1

# start the while loop, ensure we don't get an index out of bounds error
while food_entity_count >= 2:
    entities = []

    # pick a random food template
    sentence = food_templates[random.randint(0, len(food_templates) - 1)]

    # find out how many braces "{}" need to be replaced in the template
    matches = re.findall(pattern_to_replace, sentence)

    # for each brace, replace with a food entity from the shuffled food data
    for match in matches:
        food = foods.iloc[food_entity_count]
        food_entity_count -= 1

        # replace the pattern, but then find the match of the food entity we just inserted
        sentence = sentence.replace(match, food, 1)
        match_span = re.search(food, sentence).span()

        # use that match to find the index positions of the food entity in the sentence, append
        entities.append((match_span[0], match_span[1], "FOOD"))

    # append the sentence and the position of the entities to the correct dictionary and array
    get_food_data(len(matches)).append((sentence, {"entities": entities}))
# print the number of food sentences, as well as an example sentence
for key in TRAIN_FOOD_DATA:
    print("{} {} sentences: {}".format(len(TRAIN_FOOD_DATA[key]), key, TRAIN_FOOD_DATA[key][0]))

167 one_food sentences: ('Last night, I ordered some bueno', {'entities': [(27, 32, 'FOOD')]})
167 two_foods sentences: ('I ate a tokyo style ramen and a bunuelos dessert for lunch', {'entities': [(8, 25, 'FOOD'), (32, 48, 'FOOD')]})
167 three_foods sentences: ('Last night, I ate an edam with strawberry lozenges and springhill strawberry jam', {'entities': [(21, 25, 'FOOD'), (31, 50, 'FOOD'), (55, 80, 'FOOD')]})

Nice, we now have ~500 training sentences, with each sentence either containing 1, 2, or 3 FOOD entities.

for key in TEST_FOOD_DATA:
    print("{} {} items: {}".format(len(TEST_FOOD_DATA[key]), key, TEST_FOOD_DATA[key][0]))

876 one_food items: ("I've already eaten a eggnog", {'entities': [(21, 27, 'FOOD')]})
191 two_foods items: ('I made a deli chicken salad and organic nutrition bar for lunch', {'entities': [(9, 27, 'FOOD'), (32, 53, 'FOOD')]})
178 three_foods items: ("There's a basket of tahinibar, gumballs and smoothies that I consumed", {'entities': [(20, 29, 'FOOD'), (31, 39, 'FOOD'), (44, 53, 'FOOD')]})

We also have plenty of test data. It doesn't matter too much that it's not evenly distributed.

Generating Revision Data

As mentioned in the overview, we also need to generate sentences that contain spaCy entities. This helps us avoid the situation where the NER model is able to identify the FOOD entities, but forgets how to classify entities like ORG or PERSON.

While ORG and PERSON aren't important for nutrition tracking, other entities like QUANTITY and CARDINAL will help us associate foods with their quantities later on:

  • I ate two slices of toast.
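
As a quick illustration (my own sketch; the exact labels can vary between model versions), the pre-trained model already tags the number in that sentence:

# minimal sketch: inspect what the pre-trained model predicts for a
# quantity-style sentence (labels may differ slightly between model versions)
nlp_pretrained = en_core_web_lg.load()
doc = nlp_pretrained("I ate two slices of toast.")
for ent in doc.ents:
    print(ent.text, ent.label_)  # expect something like: two CARDINAL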

Preparing the revision data

# read in the revision data (just used a random article dataset from a different course I had taken)
npr_df = pd.read_csv("npr.csv")

# print row and column information
npr_df.head()
index
Article
index  Article
0      In the Washington of 2016, even when the polic...
1      Donald Trump has used Twitter — his prefe...
2      Donald Trump is unabashedly praising Russian...
3      Updated at 2:50 p. m. ET, Russian President Vl...
4      From photography, illustration and video, to d...

We'll keep sentences of a similar length to our generated food sentences.

# create an nlp object as we'll use this to separate the sentences and identify existing entities
nlp = en_core_web_lg.load()
revision_texts = []

# convert the articles to spaCy docs to better identify the sentences,
# disabling unneeded pipeline components (takes ~4 minutes)
for doc in nlp.pipe(npr_df["Article"][:6000], batch_size=30, disable=["tagger", "ner"]):
    for sentence in doc.sents:
        if 40 < len(sentence.text) < 80:
            # some of the sentences had excessive whitespace in between words, so we're trimming that
            revision_texts.append(" ".join(re.split(r"\s+", sentence.text, flags=re.UNICODE)))

This takes a while. Unfortunately we need a lot of sentences because the entities we'll be able to identify aren't evenly distributed as we'll see later.

revisions = []

# Use the existing spaCy model to predict the entities, then append them to revision
for doc in nlp.pipe(revision_texts, batch_size=50, disable=["tagger", "parser"]):
    
    # don't append sentences that have no entities
    if len(doc.ents) > 0:
        revisions.append((doc.text, {"entities": [(e.start_char, e.end_char, e.label_) for e in doc.ents]}))

Split train and test revision data

In the previous step, we filtered out sentences that were too short or too long. We also used spaCy to predict the entities in the filtered sentences.

# print an example of the revision sentence
print(revisions[0][0])

# print an example of the revision data
print(revisions[0][1])

And in that sense, this year shows little sign of ending on Dec. 31.
{'entities': [(19, 28, 'DATE'), (60, 67, 'DATE')]}

When splitting the train and test data, we'll aim for roughly 100 examples of each entity type in the revision training data (a soft limit, since some entity types are rarer than others).

# create arrays to store the revision data
TRAIN_REVISION_DATA = []
TEST_REVISION_DATA = []

# create dictionaries to keep count of the different entities
TRAIN_ENTITY_COUNTER = {}
TEST_ENTITY_COUNTER = {}

# This will help distribute the entities (i.e. we don't want 1000 PERSON entities, but only 80 ORG entities)
REVISION_SENTENCE_SOFT_LIMIT = 100

# helper function for incrementing the revision counters
def increment_revision_counters(entity_counter, entities):
    for entity in entities:
        label = entity[2]
        if label in entity_counter:
            entity_counter[label] += 1
        else:
            entity_counter[label] = 1

random.shuffle(revisions)
for revision in revisions:
    # get the entities from the revision sentence
    entities = revision[1]["entities"]

    # simple hack to make sure spaCy entities don't get too one-sided
    should_append_to_train_counter = 0
    for _, _, label in entities:
        if label in TRAIN_ENTITY_COUNTER and TRAIN_ENTITY_COUNTER[label] > REVISION_SENTENCE_SOFT_LIMIT:
            should_append_to_train_counter -= 1
        else:
            should_append_to_train_counter += 1

    # simple switch for deciding whether to append to train data or test data
    if should_append_to_train_counter >= 0:
        TRAIN_REVISION_DATA.append(revision)
        increment_revision_counters(TRAIN_ENTITY_COUNTER, entities)
    else:
        TEST_REVISION_DATA.append(revision)
        increment_revision_counters(TEST_ENTITY_COUNTER, entities)

Here are the entities and their counts that were captured in our revision training sentences:

TRAIN_ENTITY_COUNTER

{'DATE': 212, 'GPE': 164, 'CARDINAL': 195, 'PERSON': 254, 'LANGUAGE': 85, 'ORG': 192, 'WORK_OF_ART': 103, 'TIME': 108, 'ORDINAL': 110, 'PERCENT': 101, 'NORP': 115, 'LOC': 106, 'MONEY': 102, 'QUANTITY': 101, 'EVENT': 101, 'PRODUCT': 101, 'LAW': 95, 'FAC': 101}

Here are the entities and counts captured in our revision test sentences. This just shows that our initial sentences had a large number of examples for PERSON but very few for LAW.

TEST_ENTITY_COUNTER

{'PERSON': 14027, 'ORG': 10360, 'DATE': 7153, 'GPE': 5661, 'NORP': 2739, 'CARDINAL': 5397, 'QUANTITY': 171, 'PERCENT': 441, 'TIME': 794, 'FAC': 152, 'LOC': 559, 'ORDINAL': 1151, 'MONEY': 560, 'WORK_OF_ART': 592, 'PRODUCT': 119, 'EVENT': 104, 'LANGUAGE': 24, 'LAW': 12}

Training the NER Model

For every food sentence, we end up with roughly three revision sentences (1,490 revision sentences to 501 food sentences). I haven't actually seen guidance on what this ratio should be, so this is one of those "stir until good enough" moments.

# combine the food training data
TRAIN_FOOD_DATA_COMBINED = TRAIN_FOOD_DATA["one_food"] + TRAIN_FOOD_DATA["two_foods"] + TRAIN_FOOD_DATA["three_foods"]

# print the length of the food training data
print("FOOD", len(TRAIN_FOOD_DATA_COMBINED))

# print the length of the revision training data
print("REVISION", len(TRAIN_REVISION_DATA))

# join and print the combined length
TRAIN_DATA = TRAIN_REVISION_DATA + TRAIN_FOOD_DATA_COMBINED
print("COMBINED", len(TRAIN_DATA))

FOOD 501
REVISION 1490
COMBINED 1991

# add NER to the pipeline and the new label
ner = nlp.get_pipe("ner")
ner.add_label("FOOD")

# get the names of the components we want to disable during training
pipe_exceptions = ["ner", "trf_wordpiecer", "trf_tok2vec"]
other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]

# start the training loop, only training NER
epochs = 30
optimizer = nlp.resume_training()
with nlp.disable_pipes(*other_pipes), warnings.catch_warnings():
    warnings.filterwarnings("once", category=UserWarning, module='spacy')
    sizes = compounding(1.0, 4.0, 1.001)
    
    # batch up the examples using spaCy's minibatch
    for epoch in range(epochs):
        examples = TRAIN_DATA
        random.shuffle(examples)
        batches = minibatch(examples, size=sizes)
        losses = {}
        
        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer, drop=0.35, losses=losses)

        print("Losses ({}/{})".format(epoch + 1, epochs), losses)

Losses (1/30) {'ner': 16073.626560282552}
Losses (2/30) {'ner': 14584.542186669307}
Losses (3/30) {'ner': 14548.422105267644}
Losses (4/30) {'ner': 14300.272813861375}
Losses (5/30) {'ner': 14186.099578664172}
Losses (6/30) {'ner': 14101.325425518793}
Losses (7/30) {'ner': 14000.43209932954}
Losses (8/30) {'ner': 13994.203212552093}
Losses (9/30) {'ner': 14022.144056472462}
Losses (10/30) {'ner': 13901.634234594181}
Losses (11/30) {'ner': 13745.580805012956}
Losses (12/30) {'ner': 13903.877741001314}
Losses (13/30) {'ner': 13817.979523293063}
Losses (14/30) {'ner': 13830.568540076929}
Losses (15/30) {'ner': 13860.916276267613}
Losses (16/30) {'ner': 13778.726491773035}
Losses (17/30) {'ner': 13602.03644076937}
Losses (18/30) {'ner': 13822.509093627625}
Losses (19/30) {'ner': 13501.720368203663}
Losses (20/30) {'ner': 13691.972102675238}
Losses (21/30) {'ner': 13686.256363266613}
Losses (22/30) {'ner': 13478.341214498476}
Losses (23/30) {'ner': 13622.263516337844}
Losses (24/30) {'ner': 13655.980778428377}
Losses (25/30) {'ner': 13632.509101368283}
Losses (26/30) {'ner': 13548.80638261576}
Losses (27/30) {'ner': 13608.923688475392}
Losses (28/30) {'ner': 13581.22338909845}
Losses (29/30) {'ner': 13501.752293019905}
Losses (30/30) {'ner': 13605.513101932593}

Evaluating the Model

# display sentence involving original entities
spacy.displacy.render(nlp("Apple is looking at buying U.K. startup for $1 billion"), style="ent")
Apple ORG is looking at buying U.K. GPE startup for $1 billion MONEY
# display sentences involving target entity
spacy.displacy.render(nlp("I had a hamburger and chips for lunch today."), style="ent")
spacy.displacy.render(nlp("I decided to have chocolate ice cream as a little treat for myself."), style="ent")
spacy.displacy.render(nlp("I ordered basmati rice, leaf spinach and cheese from Tesco yesterday"), style="ent")
I had a hamburger FOOD and chips FOOD for lunch today.
I decided to have chocolate ice cream FOOD as a little treat for myself.
I ordered basmati rice FOOD , leaf spinach FOOD and cheese FOOD from Tesco ORG yesterday

Initial results seem pretty good, so let's evaluate on a wider scale.

Evaluating Food Entities

# dictionary to hold our evaluation data
food_evaluation = {
    "one_food": {
        "correct": 0,
        "total": 0,
    },
    "two_foods": {
        "correct": 0,
        "total": 0
    },
    "three_foods": {
        "correct": 0,
        "total": 0
    }
}

word_evaluation = {
    "1_worded_foods": {
        "correct": 0,
        "total": 0
    },
    "2_worded_foods": {
        "correct": 0,
        "total": 0
    },
    "3_worded_foods": {
        "correct": 0,
        "total": 0
    }
}

# loop over data from our test food set (3 keys in total)
for key in TEST_FOOD_DATA:
    foods = TEST_FOOD_DATA[key]

    for food in foods:
        # extract the sentence and correct food entities according to our test data
        sentence = food[0]
        entities = food[1]["entities"]

        # for each entity, use our updated model to make a prediction on the sentence
        for entity in entities:
            doc = nlp(sentence)
            correct_text = sentence[entity[0]:entity[1]]
            n_worded_food =  len(correct_text.split())

            # if we find that there's a match for predicted entity and predicted text, increment correct counters
            for ent in doc.ents:
                if ent.label_ == entity[2] and ent.text == correct_text:
                    food_evaluation[key]["correct"] += 1
                    if n_worded_food > 0:
                        word_evaluation[f"{n_worded_food}_worded_foods"]["correct"] += 1
                    
                    # this break is important, ensures that we're not double counting on a correct match
                    break
            
            #  increment total counters after each entity loop
            food_evaluation[key]["total"] += 1
            if n_worded_food > 0:
                word_evaluation[f"{n_worded_food}_worded_foods"]["total"] += 1
for key in word_evaluation:
    correct = word_evaluation[key]["correct"]
    total = word_evaluation[key]["total"]

    print(f"{key}: {correct / total * 100:.2f}%")

food_total_sum = 0
food_correct_sum = 0

print("---")
for key in food_evaluation:
    correct = food_evaluation[key]["correct"]
    total = food_evaluation[key]["total"]
    
    food_total_sum += total
    food_correct_sum += correct

    print(f"{key}: {correct / total * 100:.2f}%")

print(f"\nTotal: {food_correct_sum/food_total_sum * 100:.2f}%")

1_worded_foods: 91.10%
2_worded_foods: 96.69%
3_worded_foods: 96.88%
---
one_food: 91.44%
two_foods: 94.76%
three_foods: 98.13%

Total: 94.14%

These results are really positive. We're stumbling a little with 1_worded_foods accuracy, though that's potentially because we had more testing data for 1_worded_foods. Perhaps with more test examples for 2_worded_foods and 3_worded_foods, we'd also see their accuracy trend towards ~91%.

Evaluating Existing Entities

# dictionary which will be populated with the entities and result information
entity_evaluation = {}

# helper function to update the entity_evaluation dictionary
def update_results(entity, metric):
    if entity not in entity_evaluation:
        entity_evaluation[entity] = {"correct": 0, "total": 0}
    
    entity_evaluation[entity][metric] += 1

# same as before, see if entities from test set match what spaCy currently predicts
for data in TEST_REVISION_DATA:
    sentence = data[0]
    entities = data[1]["entities"]

    for entity in entities:
        doc = nlp(sentence)
        correct_text = sentence[entity[0]:entity[1]]

        for ent in doc.ents:
            if ent.label_ == entity[2] and ent.text == correct_text:
                update_results(ent.label_, "correct")
                break

        update_results(entity[2], "total")

sum_total = 0
sum_correct = 0

for entity in entity_evaluation:
    total = entity_evaluation[entity]["total"]
    correct = entity_evaluation[entity]["correct"]

    sum_total += total
    sum_correct += correct
    
    print("{} | {:.2f}%".format(entity, correct / total * 100))

print()
print("Overall accuracy: {:.2f}%".format(sum_correct / sum_total * 100))

PERSON | 80.18%
ORG | 50.84%
DATE | 68.61%
GPE | 82.56%
NORP | 83.61%
CARDINAL | 70.11%
QUANTITY | 79.53%
PERCENT | 88.44%
TIME | 50.88%
FAC | 56.58%
LOC | 68.69%
ORDINAL | 94.53%
MONEY | 84.11%
WORK_OF_ART | 58.78%
PRODUCT | 42.86%
EVENT | 63.46%
LANGUAGE | 91.67%
LAW | 75.00%

Overall accuracy: 71.23%

These results are a little harder to interpret. After all, we're testing against entities that the original spaCy model predicted for us. Those predicted entities may well be wrong, since spaCy's accuracy is around 86%. If 14% of the entities we're using to verify the accuracy of our new model are wrong, then where does that leave us?

A better comparison would be to load spaCy's original model, use it to predict against this same test set, and compare that accuracy to the 71% above. We could then use that as a benchmark for measuring how much introducing FOOD entities degrades the model.

Note: The online notebook I used keeps crashing as I attempt to do this, so I'll do this on my local computer.
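
For reference, a rough sketch of that benchmark could reuse the evaluation loop above with a freshly loaded copy of the original model (untested here for the reason above; baseline_nlp and the counter names are my own):

# rough sketch: score an untouched copy of en_core_web_lg on the same revision
# test set, as a baseline for how much the FOOD training degraded the model
baseline_nlp = spacy.load("en_core_web_lg")

baseline_correct, baseline_total = 0, 0
for sentence, annotations in TEST_REVISION_DATA:
    predicted = {(ent.start_char, ent.end_char, ent.label_) for ent in baseline_nlp(sentence).ents}
    for start, end, label in annotations["entities"]:
        baseline_total += 1
        if (start, end, label) in predicted:
            baseline_correct += 1

print("Baseline accuracy: {:.2f}%".format(baseline_correct / baseline_total * 100))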

Saving the model

nlp.meta["name"] = "food_entity_extractor_v2"
nlp.to_disk("./models/v2")
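
For completeness, here's a quick sketch of loading the saved model back from disk; spacy.load accepts a path to a model directory, so the saved pipeline can be used like any other model:

# load the saved model back by path and run it on a new sentence
nlp_v2 = spacy.load("./models/v2")
doc = nlp_v2("I had a hamburger and chips for lunch today.")
print([(ent.text, ent.label_) for ent in doc.ents])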

Results

We arrived at the following results for our FOOD entities:

Category                    Results
One-worded foods            91.10%
Two-worded foods            96.69%
Three-worded foods          96.88%
Sentences with one food     91.44%
Sentences with two foods    94.76%
Sentences with three foods  98.13%
Overall accuracy            94.14%

The results for our existing entities:

Category          Results
PERSON            80.18%
ORG               50.84%
DATE              68.61%
GPE               82.56%
NORP              83.61%
CARDINAL          70.11%
QUANTITY          79.53%
PERCENT           88.44%
TIME              50.88%
FAC               56.58%
LOC               68.69%
ORDINAL           94.53%
MONEY             84.11%
WORK_OF_ART       58.78%
PRODUCT           42.86%
EVENT             63.46%
LANGUAGE          91.67%
LAW               75.00%
Overall accuracy  71.23%

I'm pretty happy with the accuracy of the FOOD entities, though it'd be worthwhile increasing the number of test sentences. For the revision entities, I'll first need to compare this accuracy against spaCy's original language model. If the results are still poor, I'll experiment with different revision datasets and adjust the ratio of food to revision sentences.

Hope you've enjoyed reading this as much as I've enjoyed creating it! This provides an honest look into how I problem solve. If you want to see what I'm up to in future, you can find me on Twitter. I'll keep on posting personal changelogs of how I'm doing.