AI & ML interests

Historical Media Analysis and Enrichment

Organization Card

Impresso - Media Monitoring of the Past is an interdisciplinary research project using machine learning to transform how historical media are processed, enriched, explored, and studied across modalities, languages, time periods, and national borders.

We develop the 🚀 Impresso Web App and the 🔬 Impresso Datalab, providing access to a large multilingual corpus of historical newspapers and radio broadcasts.

🤖 Models and 📚 datasets
  • 🤖 Impresso models for historical multilingual documents, including language identification, OCR quality assessment, topic inference, NER, and NEL.
  • 📚 Impresso datasets curated from digitized historical media sources for ML development and evaluation. Upcoming releases include NER and NEL benchmarks from the HIPE evaluation campaign, an image type classification dataset, and more.

🏛️ Partners and funding

Impresso gratefully acknowledges the continued support of its cultural heritage partners, as well as funding from the SNSF (Grant Nos. CRSII5_173719 and CRSII5_213585) and the FNR (Grant No. 17498891).