Corpus SenSem Español

Full Name
Corpus SenSem Español
Compiler
Grupo de Investigación Interuniversitario en Aplicaciones Lingüísticas (GRIAL), with members from four universities (Universidad Autónoma de Barcelona, Universitat de Barcelona, Universidad de Lleida, Universitat Oberta de Catalunya)
Language
Spanish
Iberian Spanish
Register
Written
Genre
Newspaper
Poetry
Style
Formal
Period
2000-2100 AD
1900-2000 AD
Number of words
500,000 - 1,000,000
Annotation
Lemmatisation
POS tagging
Semantic annotation
Annotation remarks

The Corpus SenSem Spanish (formerly Corpus Grail) and the Corpus SenSem Spanish - Semantic annotation of nouns and adjectives are manually annotated corpora designed for semantic-syntactic searches (Alonso et al., 2007). The merging of the two corpora into a single resource is pending. The first consists of 30,000 individual sentences, a random sample of 125 sentences for each of the 250 most frequent Spanish verbs. The sentences are drawn from the newspaper El Periódico and from a number of Spanish literary texts. The corpus is extensively annotated for verb sense, the syntactic category and function of the participants, their semantic roles, and sentence-level semantics (aspectual information, mood, polarity and constructional information; see, inter alia, Vázquez / Fernández 2008, 2010). The web query interface supports searches on these parameters as well as queries for single words. The Corpus SenSem Spanish - Semantic annotation of nouns and adjectives, in turn, is lemmatized and carries semantic tags following the EuroWordNet system (http://adimen.si.ehu.es/web/MCR).
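
As a rough illustration of how the annotation layers described above could be represented and searched, here is a minimal Python sketch. It does not reflect the actual SenSem file format or query interface: the class names, field names, tag values and the two example sentences are all hypothetical, chosen only to mirror the layers named in the description (verb sense, syntactic category and function, semantic role, aspect, polarity).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Participant:
    # Hypothetical fields mirroring the participant-level annotation layers.
    category: str   # syntactic category, e.g. "NP"
    function: str   # syntactic function, e.g. "subject"
    role: str       # semantic role, e.g. "agent"


@dataclass
class AnnotatedSentence:
    # Hypothetical record for one sampled sentence; not the real corpus schema.
    text: str
    verb_lemma: str
    verb_sense: str
    participants: List[Participant] = field(default_factory=list)
    aspect: str = ""
    polarity: str = ""


def search(sentences, **criteria):
    """Return sentences whose sentence-level fields equal every given criterion."""
    return [
        s for s in sentences
        if all(getattr(s, name) == value for name, value in criteria.items())
    ]


def with_role(sentences, role):
    """Return sentences in which at least one participant bears the given role."""
    return [s for s in sentences if any(p.role == role for p in s.participants)]


# Invented toy data, not taken from the corpus.
sample = [
    AnnotatedSentence(
        text="El gobierno aprobó la ley.",
        verb_lemma="aprobar",
        verb_sense="aprobar_1",
        participants=[
            Participant("NP", "subject", "agent"),
            Participant("NP", "direct object", "theme"),
        ],
        aspect="event",
        polarity="positive",
    ),
    AnnotatedSentence(
        text="La ley no gustó a nadie.",
        verb_lemma="gustar",
        verb_sense="gustar_1",
        participants=[
            Participant("NP", "subject", "theme"),
            Participant("PP", "indirect object", "experiencer"),
        ],
        aspect="state",
        polarity="negative",
    ),
]

negative_states = search(sample, aspect="state", polarity="negative")
experiencer_hits = with_role(sample, "experiencer")
```

In this sketch, `search` filters on sentence-level parameters and `with_role` on participant-level ones, loosely mirroring the distinction between the two kinds of annotation in the description; a single-word query would simply be a substring or token match on `text`.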

 

Format remarks

Available for download at https://grial.uab.es/sensem/download/main,es