proycon/folia

FoLiA: Format for Linguistic Annotation. FoLiA is a rich XML-based annotation format for representing language resources (including corpora) with linguistic annotations. It supports a wide variety of linguistic annotation types, which makes it a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use that library, as well as the full documentation, validation schemas, and set definitions.

Score: 55 / 100 (Established)

This project provides FoLiA, a standardized XML-based format for storing and exchanging language resources with rich linguistic annotations. It takes raw text or existing annotated corpora and represents them as structured FoLiA XML documents that record their linguistic features. Linguists, computational linguists, and other researchers working with annotated text will find it useful for managing and sharing their datasets.
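To give a feel for what such a document looks like, here is a minimal sketch that reads word tokens and their part-of-speech classes from a small FoLiA-style snippet using only Python's standard library. The snippet is hand-written and simplified for illustration: it omits the metadata and annotation declarations a schema-valid FoLiA file requires, the namespace URI is assumed, and the `words_with_pos` helper is hypothetical. It is not the PyNLPl API that this page refers to.

```python
import xml.etree.ElementTree as ET

# FoLiA namespace URI (assumed here; check the FoLiA documentation).
FOLIA_NS = "http://ilk.uvt.nl/folia"
NS = {"f": FOLIA_NS}
# ElementTree expands the reserved xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hand-written, simplified FoLiA-style document: one sentence with two
# word tokens, each carrying a text element and a POS annotation.
# Not schema-valid: metadata and annotation declarations are omitted.
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="example" version="2.0">
  <text xml:id="example.text">
    <s xml:id="example.s.1">
      <w xml:id="example.s.1.w.1"><t>Hello</t><pos class="INTJ"/></w>
      <w xml:id="example.s.1.w.2"><t>world</t><pos class="NOUN"/></w>
    </s>
  </text>
</FoLiA>"""


def words_with_pos(xml_text):
    """Hypothetical helper: return (word id, text, POS class) triples."""
    root = ET.fromstring(xml_text)
    result = []
    # Walk every <w> element in the FoLiA namespace, in document order.
    for w in root.iter(f"{{{FOLIA_NS}}}w"):
        text = w.findtext("f:t", namespaces=NS)
        pos = w.find("f:pos", NS)
        pos_class = pos.get("class") if pos is not None else None
        result.append((w.get(XML_ID), text, pos_class))
    return result


for word_id, text, pos_class in words_with_pos(SAMPLE):
    print(word_id, text, pos_class)
```

Each annotation is attached to the token it describes (here, `<pos>` nested inside `<w>`), which is what lets one document carry many annotation layers over the same text.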

Used by 1 other package. Available on PyPI.

Use this if you need a flexible and highly expressive format to represent diverse linguistic annotations in your language resources or corpora.

Not ideal if you primarily work with plain, unstructured text that needs no linguistic markup.

linguistic-annotation corpus-linguistics natural-language-processing language-resource-management text-analysis
Maintenance 6 / 25
Adoption 9 / 25
Maturity 25 / 25
Community 15 / 25
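The four sub-scores above add up to the overall 55 / 100 (6 + 9 + 25 + 15 = 55), which is consistent with the total being a simple sum of four categories worth 25 points each. The page does not state the formula, so treat this as an assumption; the snippet below only checks the arithmetic.

```python
# Sub-scores as listed on this page, each out of 25.
SUBSCORES = {"Maintenance": 6, "Adoption": 9, "Maturity": 25, "Community": 15}


def total_score(subscores):
    """Sum the sub-scores (assumes they combine by simple addition)."""
    return sum(subscores.values())


print(f"{total_score(SUBSCORES)} / {25 * len(SUBSCORES)}")  # prints "55 / 100"
```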


Stars: 65
Forks: 10
Language: Python
License: GPL-3.0
Last pushed: Dec 09, 2025
Commits (30d): 0
Dependencies: 3
Reverse dependents: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/proycon/folia"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
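For scripted access, the curl call above can be sketched in Python with the standard library. This is an illustration under assumptions: the `{category}/{owner}/{repo}` path pattern is inferred from the single example URL, the response is assumed to be JSON, and the helper names `quality_url` and `fetch_quality` are hypothetical. The page does not say how an API key is sent, so key handling is left out.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL, following the path pattern of the curl example."""
    # Percent-encode each path segment so unusual names cannot break the path.
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category, owner, repo, timeout=10.0):
    """GET the endpoint and decode the body as JSON (response format assumed)."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("nlp", "proycon", "folia")` performs the same unauthenticated request as the curl command and counts against the 100 requests/day limit.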