Saturday, July 15, 2017

Language Resources and Evaluation

The 200 most cited articles




1.            Annotating expressions of opinions and emotions in language  
2.            The waCky wide web: A collection of very large linguistically processed web-crawled corpora  
3.            IEMOCAP: Interactive emotional dyadic motion capture database  
4.            Unleashing the killer corpus: Experiences in creating the multi-everything AMI Meeting Corpus  
5.            How variable may a constant be? Measures of lexical richness in perspective  
6.            The MUMIN coding scheme for the annotation of feedback, turn management and sequencing phenomena  
7.            A large-scale classification of English verbs  
8.            Authorship attribution in the wild  
9.            A multidimensional approach for detecting irony in Twitter  
10.          I don't believe in word senses  
11.          Cross-language plagiarism detection  
12.          FactBank: A corpus annotated with event factuality  
13.          Lexical association measures and collocation extraction  
14.          Computer-based authorship attribution without lexical measures  
15.          Intrinsic plagiarism analysis  
16.          The English lexical substitution task  
17.          Temporal and event information in natural language text  
18.          The TempEval challenge: Identifying temporal relations in text  
19.          The challenge of optical music recognition  
20.          Framework and results for English SENSEVAL  
21.          Neural network applications in stylometry: The federalist papers  
22.          Multilingual and cross-domain temporal tagging  
23.          Interchanging lexical resources on the Semantic Web  
24.          A semantic network of English: The mother of all WordNets  
25.          A web-based Bengali news corpus for named entity recognition  
26.          Developing a corpus of plagiarised short answers  
27.          Language resources for Hebrew  
28.          An annotation scheme for conversational gestures: How to economically capture timing and form  
29.          The CHIL audiovisual corpus for lecture and meeting analysis inside smart rooms  
30.          Comparative evaluation of text classification techniques using a large diverse Arabic dataset  
31.          Perspectives on crowdsourcing annotations for natural language processing  
32.          Microblog language identification: Overcoming the limitations of short, unedited and idiomatic text  
33.          Lightweight methods to estimate influenza rates and alcohol sales volume from Twitter messages  
34.          The TORGO database of acoustic and articulatory speech from speakers with dysarthria  
35.          AnCora-CO: Coreferentially annotated corpora for Spanish and Catalan  
36.          The NXT-format Switchboard Corpus: A rich resource for investigating the syntax, semantics, pragmatics and prosody of dialogue  
37.          Classification of semantic relations between nominals  
38.          The NITE XML toolkit: Data model and query language  
39.          A multimodal annotated corpus of consensus decision making meetings  
40.          The ACL anthology network corpus  
41.          A multilingual ontology for infectious disease surveillance: Rationale, design and challenges  
42.          Introduction to EuroWordNet  
43.          Virtual agent multimodal mimicry of humans  
44.          Unsupervised morphological parsing of Bengali  
45.          Automatic keyphrase extraction from scientific articles  
46.          Creating a live, public short message service corpus: The NUS SMS corpus  
47.          Creating a system for lexical substitutions from scratch using crowdsourcing  
48.          Thesaurus or logical ontology, which one do we need for text mining?  
49.          The bible as a parallel corpus: Annotating the "book of 2000 tongues"  
50.          WordNet then and now  
51.          MULTEXT-East: Morphosyntactic resources for Central and Eastern European languages  
52.          Corpus-based generation of head and eyebrow motion for an embodied conversational agent  
53.          Guidelines for word alignment evaluation and manual alignment  
54.          On the evaluation and improvement of Arabic WordNet coverage and usability  
55.          Lessons from building a Persian written corpus: Peykare  
56.          Japanese/English cross-language information retrieval: Exploration of query translation and transliteration  
57.          SpatialML: Annotation scheme, resources, and evaluation  
58.          Compositionality and lexical alignment of multi-word terms  
59.          TimeBank evolution as a community resource for TimeML parsing  
60.          The chicken-and-egg problem in wordnet design: Synonymy, synsets and constitutive relations  
61.          The Corpus DIMEx100: Transcription and evaluation  
62.          Temporal closure in an annotation environment  
63.          Hierarchical decision lists for word sense disambiguation  
64.          Balanced corpus of contemporary written Japanese  
65.          DanNet: The challenge of compiling a wordnet for Danish by reusing a monolingual dictionary  
66.          Getting to the heart of the matter: Speech as the expression of affect; rather than just text or language  
67.          Do word meanings exist?  
68.          The top-down strategy for building EuroWordNet: Vocabulary coverage, base concepts and top ontology  
69.          The state of authorship attribution studies: Some problems and solutions  
70.          ECO and Onto.PT: A flexible approach for creating a Portuguese wordnet automatically  
71.          IPLR: An online resource for Greek word-level and sublexical information  
72.          Improving English verb sense disambiguation performance with linguistically motivated features and clear sense distinction boundaries  
73.          Evaluation of machine learning-based information extraction algorithms: Criticisms and recommendations  
74.          A novel approach for ranking spelling error corrections for Urdu  
75.          And then there were none: Winnowing the Shakespeare claimants  
76.          GATE Teamware: A web-based, collaborative text annotation framework  
77.          Glissando: A corpus for multidisciplinary prosodic studies in Spanish and Catalan  
78.          Alcohol language corpus: The first public corpus of alcoholized German speech  
79.          Annotating expressions of Appraisal in English  
80.          A survey of methods to ease the development of highly multilingual text mining applications  
81.          The Hamburg Metaphor Database project: Issues in resource creation  
82.          A corpus for studying addressing behaviour in multi-party dialogues  
83.          Stephen Crane and the New-York Tribune: A case study in traditional and non-traditional authorship attribution  
84.          SALDO: A touch of yin to WordNet's yang  
85.          Coupling an annotated corpus and a lexicon for state-of-the-art POS tagging  
86.          FrameNet, current collaborations and future goals  
87.          Challenges for a multilingual wordnet  
88.          Data and models for metonymy resolution  
89.          Multilingual resources for NLP in the lexical markup framework (LMF)  
90.          Combining linguistic resources to create a machine-tractable Japanese-Malay dictionary  
91.          The analysis of embodied communicative feedback in multimodal corpora: A prerequisite for behavior simulation  
92.          Yule's characteristic K revisited  
93.          Archaeological data models and web publication using XML  
94.          HamleDT: Harmonized multi-language dependency treebank  
95.          Supervised collaboration for syntactic annotation of Quranic Arabic  
96.          Is singular value decomposition useful for word similarity extraction?  
97.          Alignment-based extraction of multiword expressions  
98.          Multilingual collocation extraction with a syntactic parser  
99.          Copy detection in Chinese documents using Ferret  
100.        Cross-lingual sense determination: Can it work?  
101.        Access to pictorial material: A review of current research and future prospects  
102.        Spontaneous speech and opinion detection: Mining call-centre transcripts  
103.        Phonetically rich and balanced text and speech corpora for Arabic language  
104.        Methodology and construction of the Basque WordNet  
105.        Multiword expressions: Hard going or plain sailing?  
106.        Product named entity recognition in Chinese text  
107.        Dimensionality of dialogue act tagsets: An empirical analysis of large corpora  
108.        Reader-based exploration of lexical cohesion  
109.        Fact distribution in Information Extraction  
110.        Gore galore: Literary theory and computer games  
111.        The linguistic design of the EuroWordNet database  
112.        An Estonian morphological analyser and the impact of a corpus on its development  
113.        An overview of the European Union’s highly multilingual parallel corpora  
114.        Evaluating word sense induction and disambiguation methods  
115.        WHAD: Wikipedia historical attributes data: Historical structured data extraction and vandalism detection from the Wikipedia edit history  
116.        Classifying unlabeled short texts using a fuzzy declarative approach  
117.        A real time Named Entity Recognition system for Arabic text mining  
118.        Resources for Turkish morphological processing  
119.        Annotation of multiword expressions in the Prague dependency treebank  
120.        Valence extraction using EM selection and co-occurrence matrices  
121.        Normalization of Chinese chat language  
122.        LTAG-spinal and the Treebank: A new resource for incremental, dependency and semantic parsing  
123.        The GOLD Community of Practice: An infrastructure for linguistic data on the Web  
124.        The Linguistic Annotation Framework: a standard for annotation interchange and merging  
125.        Fine-grained Dutch named entity recognition  
126.        InSight Interaction: a multimodal and multifocal dialogue corpus  
127.        Evaluating and automating the annotation of a learner corpus  
128.        Analyzing the capabilities of crowdsourcing services for text summarization  
129.        Is there a language of sentiment? An analysis of lexical resources for sentiment analysis  
130.        EmoTales: Creating a corpus of folk tales with emotional annotations  
131.        Constructing and utilizing wordnets using statistical methods  
132.        Question answering at the cross-language evaluation forum 2003-2010  
133.        Overcoming statistical machine translation limitations: Error analysis and proposed solutions for the Catalan-Spanish language pair  
134.        Compilation of an idiom example database for supervised idiom identification  
135.        Multilingual language resources and interoperability  
136.        Automatic building of an ontology on the basis of text corpora in Thai  
137.        From the field to the web: Implementing best-practice recommendations in documentary linguistics  
138.        Tagging Icelandic text: An experiment with integrations and combinations of taggers  
139.        Adaptation of an automotive dialogue system to users' expertise and evaluation of the system  
140.        Applying EuroWordNet to cross-language text retrieval  
141.        Traditional and emotional stylometric analysis of the songs of Beatles Paul McCartney and John Lennon  
142.        A qualitative comparison method for rhetorical structures: identifying different discourse structures in multilingual corpora  
143.        Building the essential resources for Finnish: the Turku Dependency Treebank  
144.        Is it possible to create a very large wordnet in 100 days? An evaluation  
145.        Large, huge or gigantic? Identifying and encoding intensity relations among adjectives in WordNet  
146.        The Spanish DELPH-IN grammar  
147.        Semi-automatic enrichment of crowdsourced synonymy networks: The WISIGOTH system applied to Wiktionary  
148.        The MATCH corpus: A corpus of older and younger users' interactions with spoken dialogue systems  
149.        Irony in a judicial debate: Analyzing the subtleties of irony while testing the subtleties of an annotation scheme  
150.        Automatic induction of language model data for a spoken dialogue system  
151.        Can we talk? Methods for evaluation and training of spoken dialogue systems  
152.        Digital facsimiles: Reading the William Blake archive  
153.        Peeling an onion: The lexicographer's experience of manual sense-tagging  
154.        Computers and resource-based history teaching: A UK perspective  
155.        Using the right tools: Enhancing retrieval from marked-up documents  
156.        CityU corpus of essay drafts of English language learners: a corpus of textual revision in second language writing  
157.        The good, the bad and the implicit: a comprehensive approach to annotating explicit and implicit sentiment  
158.        The Chinese Discourse TreeBank: a Chinese corpus annotated with discourse relations  
159.        Bucking the trend: Improved evaluation and annotation practices for ESL error detection systems  
160.        The Romanian wordnet in a nutshell  
161.        Twitter n-gram corpus with demographic metadata  
162.        Collective intelligence and language resources: Introduction to the special issue on collaboratively constructed language resources  
163.        Multiplicity and word sense: Evaluating and learning from multiply labeled word sense annotations  
164.        Annotation of sentence structure: Capturing the relationship between clauses in Czech sentences  
165.        Collecting and evaluating speech recognition corpora for 11 South African languages  
166.        Statistical unicodification of African languages   
167.        DuELME: A Dutch electronic lexicon of multiword expressions  
168.        WOZ acoustic data collection for interactive TV  
169.        Exploring interoperability of language resources: The case of cross-lingual semi-automatic enrichment of wordnets  
170.        Lexical systems: Graph models of natural language lexicons  
171.        The Hinoki syntactic and semantic treebank of Japanese (Language Resources and Evaluation DOI: 10.1007/s10579-007-9036-6)  
172.        The importance of gaze and gesture in interactive multimodal explanation  
173.        Urdu in a parallel grammar development environment  
174.        Annotating discourse markers in spontaneous speech corpora on an example for the Slovenian language  
175.        Automatically learning semantic knowledge about multiword predicates  
176.        The Hinoki syntactic and semantic treebank of Japanese  
177.        Complex predicates in Indian languages and wordnets  
178.        Automatically generating related queries in Japanese  
179.        Detecting Japanese idioms with a linguistically rich dictionary  
180.        How to measure the meanings of words? amour in Corneille's work  
181.        The role of inference in the temporal annotation and analysis of text  
182.        Some of my best friends are linguists  
183.        Statistical morphological disambiguation for agglutinative languages  
184.        Pattern processing in melodic sequences: Challenges, caveats and prospects  
185.        Wag the dog? Online conferencing and teaching  
186.        Senseval: The CL research experience  
187.        Discovering Buffalo story robes: A case for cross-domain information strategies  
188.        Cross-linguistic alignment of WordNets with an inter-lingual-index  
189.        Developing a successful SemEval task in sentiment analysis of Twitter and other social media texts  
190.        A massively parallel corpus: the Bible in 100 languages  
191.        Multimodal corpus of multiparty conversations in L1 and L2 languages and findings obtained from it  
192.        Automatic dialogue act recognition with syntactic features  
193.        Text simplification resources for Spanish  
194.        Capturing divergence in dependency trees to improve syntactic projection  
195.        TypeCraft collaborative databasing and resource sharing for linguists  
196.        Introduction to the special issue: On wordnets and relations  
197.        Tailoring the automated construction of large-scale taxonomies using the web  
198.        Beyond sentence-level semantic role labeling: Linking argument structures in discourse  
199.        Coreference resolution: An empirical study based on SemEval-2010 shared Task 1  
200.        An open diachronic corpus of historical Spanish  

