This tutorial demonstrates how to use the pre-trained food classification models for Danish food text classification. You can find the pretrained models here: https://huggingface.co/arnorsig/danish-food-classification
✏️ Note: The tutorial currently only supports Linux/macOS.

Two pre-trained models are available:

- majorcateg_run3jseq: Classifies food into 17 major categories (Alcohol, CoffeeTea, Dairy, Egg, Fastfood, Fish, Fruit, Grains, MeatPoultry, MixedDishes, etc.)
- subcateg_run3jseq: Classifies food into 63 detailed subcategories (CultMilk, UnflavMilk, RedMeat, WhiteMeat, SafeFish, Wholegrain, etc.)

Both models use a Danish RoBERTa-based sequence model (DDSC/roberta-base-danish) for text encoding.
Each model is approximately 497-543 MB and has the following structure:

```
model_name/
└── training_output/
    ├── meta/
    │   └── eir_version.txt
    ├── model_info.txt
    ├── saved_models/
    │   └── output_model_*.pt
    └── serializations/
        ├── configs_stripped/
        ├── sequence_input_serializations/
        ├── tabular_output_serializations/
        └── transformers/
```
If you don't already have uv installed:

```shell
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Verify the installation:

```shell
uv --version
```

Create a Python 3.12 virtual environment and install EIR:

```shell
uv venv --python 3.12
source .venv/bin/activate
uv pip install git+https://github.com/arnor-sigurdsson/EIR.git@0.13.x-maintenance
```

Note: We install from the GitHub maintenance branch due to a dependency issue with the PyPI release.

Verify the installation:

```shell
eirpredict --help
```

Clone the GitHub repository for scripts:

```shell
git clone https://github.com/LoosTeam/DNBC_PregnancyFoodChoices.git
```

Navigate to the prediction_tutorial directory and create a models folder:

```shell
cd prediction_tutorial
mkdir models
```

Download the model files from Hugging Face:

```shell
uvx hf download arnorsig/danish-food-classification --local-dir models/
```
Create a CSV file with two columns: ID and Sequence. The Sequence column should contain Danish food text descriptions.

An example file (example_inputs.csv) is included in the package:

```
ID,Sequence
sample_1,Skyr med blåbær og granola
sample_2,Ribeye bøf med kartofler og grøntsager
sample_3,Laks filet med citron
sample_4,Æblemost uden sukker
sample_5,Hvidt brød med smør og ost
```
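If you build the input file programmatically rather than by hand, a minimal pandas sketch might look like the following (the my_inputs.csv file name and the sample rows are placeholders, not part of the package):

```python
import pandas as pd

# Two-column layout the models expect: an ID plus Danish food text.
rows = [
    {"ID": "sample_1", "Sequence": "Skyr med blåbær og granola"},
    {"ID": "sample_2", "Sequence": "Ribeye bøf med kartofler og grøntsager"},
]
df = pd.DataFrame(rows)

# Sanity checks before handing the file to eirpredict.
assert list(df.columns) == ["ID", "Sequence"]
assert df["ID"].is_unique

df.to_csv("my_inputs.csv", index=False)
```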
Three configuration files are included in the package:

- predict_global_config.yaml: Global settings (batch size, device, etc.)
- predict_input_config.yaml: Input data configuration
- predict_output_config.yaml: Output categories configuration

The package includes a predict_input_config.yaml template.
To use your own data, update the input_source field:
```yaml
input_info:
  input_inner_key: null
  input_name: foodtext
  input_source: example_inputs.csv # Change this to your CSV file path
  input_type: sequence
input_type_info:
  adaptive_tokenizer_max_vocab_size: null
  max_length: 81
  min_freq: 10
  mixing_subtype: mixup
  modality_dropout_rate: 0.0
  sampling_strategy_if_longer: uniform
  split_on: ' '
  tokenizer: null
  tokenizer_language: null
  vocab_file: null
interpretation_config:
  interpretation_sampling_strategy: first_n
  manual_samples_to_interpret: null
  num_samples_to_interpret: 10
model_config:
  embedding_dim: 64
  freeze_pretrained_model: false
  model_init_config: {}
  model_type: DDSC/roberta-base-danish
  pool: avg
  position: embed
  position_dropout: 0.1
  pretrained_model: true
  window_size: 0
pretrained_config: null
tensor_broker_config: null
```

Important: Update input_source to point to your CSV file (use an absolute path).
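If you prefer to set input_source from a script rather than by hand, a small PyYAML sketch can do it (the file names here are hypothetical stand-ins; in practice you would edit the predict_input_config.yaml shipped with the package):

```python
from pathlib import Path

import yaml  # PyYAML

# Stand-in config so the snippet is self-contained; normally this file
# already exists in the package's configs/ folder.
config_path = Path("predict_input_config.yaml")
config_path.write_text(
    "input_info:\n"
    "  input_name: foodtext\n"
    "  input_source: example_inputs.csv\n"
)

cfg = yaml.safe_load(config_path.read_text())
# Use an absolute path, as the tutorial recommends.
cfg["input_info"]["input_source"] = str(Path("my_foods.csv").resolve())
config_path.write_text(yaml.safe_dump(cfg, sort_keys=False))
```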
You can copy this directly from the model’s serialization folder. For
the subcategory model, you can use
subcateg_run3jseq/training_output/serializations/configs_stripped/output_configs.yaml.
The key difference for prediction is setting
output_source: null (no labels needed):
```yaml
output_info:
  output_inner_key: null
  output_name: foodcateg_output
  output_source: null
  output_type: tabular
```

Create predict_global_config.yaml:
```yaml
attribution_analysis:
  compute_attributions: false
basic_experiment:
  batch_size: 64
  dataloader_workers: 0
  device: cpu
  memory_dataset: true
visualization_logging:
  log_level: info
  no_pbar: false
```

Note: Set device: cuda if you have a GPU available for faster predictions.
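If you are unsure whether a GPU is available, a rough stdlib-only check is to look for nvidia-smi on the PATH; note this is only a heuristic of my own, and with the EIR environment active, torch.cuda.is_available() is the authoritative check:

```python
import shutil

# Heuristic: nvidia-smi on PATH suggests an NVIDIA driver is installed.
device = "cuda" if shutil.which("nvidia-smi") else "cpu"
print(f"Set 'device: {device}' in predict_global_config.yaml")
```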
Run prediction with the major category model:

```shell
eirpredict \
    --global_configs configs/predict_global_config.yaml \
    --input_configs configs/predict_input_config.yaml \
    --output_configs configs/predict_output_majorcateg_config.yaml \
    --model_path models/majorcateg_run3jseq/training_output/saved_models/output_model_5400_perf-average=0.9686.pt \
    --output_folder predictions_output
```

And with the subcategory model:

```shell
eirpredict \
    --global_configs configs/predict_global_config.yaml \
    --input_configs configs/predict_input_config.yaml \
    --output_configs configs/predict_output_subcateg_config.yaml \
    --model_path models/subcateg_run3jseq/training_output/saved_models/output_model_6300_perf-average=0.9312.pt \
    --output_folder predictions_output
```

Predictions are organized in a hierarchical folder structure:
```
predictions_output/
└── foodcateg_output/
    ├── CultMilk/
    │   └── predictions.csv
    ├── RedMeat/
    │   └── predictions.csv
    ├── SafeFish/
    │   └── predictions.csv
    └── ... (one folder per category)
```
Each predictions.csv contains binary classification logits:

```
ID,0,1
sample_1,4.7698545,-5.208173
sample_2,-1.7498194,2.2772772
```

- 0: Logit for category absence
- 1: Logit for category presence

A sample is predicted positive for a category when the logit in column 1 exceeds the logit in column 0.

To get a readable summary, use this Python script:
```python
import pandas as pd
from pathlib import Path

predictions_dir = Path("predictions_output/foodcateg_output")
categories = [d.name for d in predictions_dir.iterdir() if d.is_dir()]

all_predictions = {}
for category in sorted(categories):
    csv_path = predictions_dir / category / "predictions.csv"
    df = pd.read_csv(csv_path)
    df['prediction'] = (df['1'] > df['0']).astype(int)
    df['confidence'] = df['1'] - df['0']
    for _, row in df.iterrows():
        sample_id = row['ID']
        if sample_id not in all_predictions:
            all_predictions[sample_id] = {}
        all_predictions[sample_id][category] = {
            'predicted': row['prediction'],
            'confidence': row['confidence'],
        }

for sample_id in sorted(all_predictions.keys()):
    print(f"\n{sample_id}:")
    predictions = all_predictions[sample_id]
    positive_predictions = [
        (cat, info['confidence'])
        for cat, info in predictions.items()
        if info['predicted'] == 1
    ]
    positive_predictions.sort(key=lambda x: x[1], reverse=True)
    print(f"  Predicted categories ({len(positive_predictions)}):")
    for cat, conf in positive_predictions:
        print(f"    - {cat}: {conf:.2f}")
```

Example output:

```
sample_1:
  Predicted categories (3):
    - TropFruit: 1.78
    - StoneFruit: 0.46
    - SoftCheese: 0.32

sample_2:
  Predicted categories (2):
    - RootTuber: 6.54
    - RedMeat: 4.03

sample_3:
  Predicted categories (2):
    - SafeFish: 3.29
    - TropFruit: 1.76
```
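If you prefer probabilities over raw logit differences, a softmax over the two per-category logits gives the probability that a category is present. This is a small sketch of my own (the helper name presence_probability is not from the package), assuming the two-logit CSV layout shown above:

```python
import math

def presence_probability(logit_absent: float, logit_present: float) -> float:
    """Softmax over the two per-category logits -> P(category present)."""
    m = max(logit_absent, logit_present)  # subtract max for numerical stability
    ea = math.exp(logit_absent - m)
    ep = math.exp(logit_present - m)
    return ep / (ea + ep)

# Values from the sample rows shown above:
print(presence_probability(4.7698545, -5.208173))   # near 0: category absent
print(presence_probability(-1.7498194, 2.2772772))  # near 1: category present
```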
These models were trained on Danish food text data using EIR version 0.13.9.
You may see warnings about version mismatches (e.g., trained on 0.13.9, running on 0.13.11). This is expected when using the maintenance branch and should not cause issues.
If you run into memory issues on CPU, reduce batch_size in the global config. If you have a GPU, set device: cuda for faster predictions.
If you encounter issues installing from GitHub, ensure you have git installed and network access to GitHub.
You can deploy the model as a REST API using eirserve:

```shell
eirserve --model-path models/subcateg_run3jseq/training_output/saved_models/output_model_6300_perf-average=0.9312.pt
```

The server will start on http://localhost:8000 with:

- OpenAPI documentation: http://localhost:8000/docs (interactive API explorer)
- Model info: http://localhost:8000/info (input/output specifications)
- ReDoc: http://localhost:8000/redoc (alternative API documentation)
Here's a Python example for sending requests:

```python
import requests

url = "http://localhost:8000/predict"
payload = [{
    "foodtext": "Ribeye bøf med kartofler og grøntsager"
}]

response = requests.post(url, json=payload)
predictions = response.json()
print(predictions)
```

Example response:

```
{
  "result": [
    {
      "foodcateg_output": {
        "RedMeat": 0.982,
        "RootTuber": 0.998,
        "ProcVeg": 0.156,
        ...
      }
    }
  ]
}
```

The OpenAPI interface for the /predict endpoint provides:

- Interactive testing
- Request/response examples
- Schema documentation
- Easy integration testing
Send multiple samples in one request:

```python
payload = [
    {"foodtext": "Skyr med blåbær"},
    {"foodtext": "Laks filet"},
    {"foodtext": "Kaffe"}
]
response = requests.post("http://localhost:8000/predict", json=payload)
```

This is ideal for:

- Integrating with web applications
- Real-time predictions
- Building food logging apps
- API-based microservices
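When integrating with an application, it can help to wrap the request in a small client with a timeout and explicit HTTP error handling. A sketch under the assumptions above (the URL and the foodtext key follow the earlier examples; build_payload and classify are hypothetical helper names):

```python
import requests

def build_payload(texts):
    # One record per description, keyed by the model's input name.
    return [{"foodtext": t} for t in texts]

def classify(texts, url="http://localhost:8000/predict", timeout=30):
    """Send food descriptions to a running eirserve instance."""
    resp = requests.post(url, json=build_payload(texts), timeout=timeout)
    resp.raise_for_status()  # surface HTTP errors instead of failing silently
    return resp.json()["result"]
```

With the server running, classify(["Skyr med blåbær"]) would return the per-category scores for each input.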
If you use these models, please cite the paper:
Food choices during pregnancy: evidence from 63,405 Danish women by Erica Elizabeth Eberl, Anne Ahrendt Bjerregaard, Arnór Ingi Sigurdsson, Siddhi Yash Jain, Ann-Marie Hellerung Christiansen, Charlotta Granström, Matthew Paul Gillum, Thorhallur Ingi Halldórsson, Simon Rasmussen, Sjurdur Frodi Olsen, Ruth J.F. Loos, and Marta Guasch-Ferré.

And the EIR framework:
EIR Framework: https://github.com/arnor-sigurdsson/EIR
Model trained using EIR version 0.13.9