NLPCS 2012 Abstracts


Full Papers
Paper Nr: 1
Title:

Natural Language Processing based Shape Grammar

Authors:

Arus Kunkhet, Bernadette Sharp and Len Noriega

Abstract: Currently, shape grammars are designed for static models and applied in limited domains. They demand extensive user skills and cannot guarantee aesthetic results. Although current approaches to shape grammar produce infinite designs, the final designs lack context and harmony. The aim of this paper is to address the contextual and harmonisation issues in shape grammar by proposing a shape grammar framework inspired by the field of natural language processing. The new shape grammar framework makes use of four levels of analysis, namely the lexical, syntactic, semantic, and pragmatic levels, to enhance the overall design process. By satisfying these semantic and pragmatic well-formedness constraints, the generated shapes can be contextual and harmonious.

Paper Nr: 3
Title:

Integrating a Model for Visual Attention into a System for Natural Language Parsing

Authors:

Christopher Baumgärtner and Wolfgang Menzel

Abstract: We present a system for integrating knowledge about complex visual scenes into the process of natural language comprehension. The implemented system is able to choose a scene of reference for a natural language sentence from a large set of scene descriptions. This scene is then used to influence the analysis of the sentence generated by a broad-coverage language parser. In addition, objects and actions referred to by the sentence are visualized by a saliency map which is derived from the bi-directional influence of top-down and bottom-up information on a model of visual attention, highlighting the regions with the highest probability of attracting the attention of an observer.

Paper Nr: 4
Title:

Feature-based Ontology Mapping from an Information Receivers’ Viewpoint

Authors:

Fumiko Kano Glückstad and Morten Mørup

Abstract: This paper compares four algorithms for computing feature-based similarities between concepts, each possessing a distinctive set of features. The eventual purpose of comparing these feature-based similarity algorithms is to identify a candidate term in a Target Language (TL) that can optimally convey the original meaning of a culturally-specific Source Language (SL) concept to a TL audience by aligning two culturally-dependent domain-specific ontologies. The results indicate that the Bayesian Model of Generalization [1] performs best, not only for identifying candidate translation terms, but also for computing the probability that an information receiver successfully infers the meaning of an SL concept from a given TL translation.

Paper Nr: 6
Title:

On Operative Creation of Lexical Resources in Different Languages

Authors:

Svetlana Sheremetyeva

Abstract: Cognitive modeling is to a large extent mediated by lexicons, thus bringing into focus the operative creation of high-quality lexical resources. This paper presents a methodology and a tool for the automatic extraction of lexical data from textual sources. The methodology combines n-gram extraction with a filtering algorithm that operates on blocks of shallow linguistic knowledge. The specificity of the approach is three-fold: (i) it allows dynamic extraction of lexical resources and does not rely on a pre-constructed corpus; (ii) it does not miss low-frequency units; (iii) it is portable between different lexical types, domains and languages. The methodology has been implemented in a tool that can be used in a wide range of text processing tasks useful for cognitive modeling, from ontology acquisition to automatic annotation, multilingual information retrieval, machine translation, etc.

Paper Nr: 9
Title:

PurePos: An Open Source Morphological Disambiguator

Authors:

György Orosz and Attila Novák

Abstract: This paper presents PurePos, a new open source Hidden Markov model based morphological tagger that has an interface to an integrated morphological analyzer and thus performs fully disambiguated morphological analysis, including lemmatization of words both known and unknown to the analyzer. The tagger is implemented in Java and has a permissive LGPL license; thus it is easy to integrate and modify. It is fast to train and use while having an accuracy on par with slow-to-train Maximum Entropy or Conditional Random Field based taggers. Full integration with morphology and an incremental training feature make it well suited for integration into web-based applications. We show that the integration with morphology boosts our tool’s accuracy in every respect – especially in full morphological disambiguation – when used for morphologically complex agglutinating languages. We evaluate PurePos on Hungarian data, demonstrating its state-of-the-art performance in terms of tagging precision and accuracy of full morphological analysis.

Paper Nr: 11
Title:

Progressive Use of Metrical Cues: A Cross-linguistic Study

Authors:

Sandrien van Ommen and René Kager

Abstract: Within the framework of a larger project on metrical segmentation, this study presents the first results of a cross-linguistic experiment with Dutch (penultimate word stress) and Turkish (word-final stress) listeners. Previous studies have shown that listeners interpret stressed or strong (non-reduced) syllables as potential beginnings of words in, among other languages, English [4] and Dutch [13], [22]. This is interpreted as evidence for the Metrical Segmentation Hypothesis, which predicts that listeners have and use a parsing ability based on edge-aligned stress. However, evidence for a facilitatory effect of right-edge-aligned stress is sparse (but see [6]). The current non-word spotting experiment was designed to find out whether listeners can anticipate a word boundary using language-specific stress patterns. The results show that this is partly the case: Dutch listeners are quicker to spot the ‘word’ when it is preceded by their native penultimate pattern; Turkish listeners are aided by their native final stress pattern as well as by penultimate stress. Turkish listeners, furthermore, make regressive use of metrical cues.

Paper Nr: 12
Title:

A General Theory of Tempo-logical Connectives and Its Application to Spatiotemporal Reasoning in Natural Language Understanding

Authors:

Masao Yokota

Abstract: Mental Image Directed Semantic Theory (MIDST) has proposed the knowledge representation language Lmd in order to facilitate language-centered multimedia communication between ordinary people and home robots in daily life. Lmd employs ‘tempo-logical connectives (TLCs)’ to represent both temporal and logical relations between two events, and the ‘temporal conjunctions’, a subset of TLCs, have already been applied to formulating natural event concepts, namely, event concepts represented in natural language. This paper presents the theory of TLCs extended for formalizing human intuitive spatiotemporal knowledge, and its application to automatic reasoning about space and time expressed in natural language.

Paper Nr: 14
Title:

Continue or Stop Reading? Modeling Decisions in Information Search

Authors:

Francisco López-Orozco, Anne Guérin-Dugué and Benoît Lemaire

Abstract: This paper presents a cognitive computational model of the way people read a paragraph with the task of quickly deciding whether it is more closely related to a given goal than another paragraph processed previously. In particular, the model attempts to predict the time at which participants decide to stop reading the current paragraph because they have enough information to make their decision. We propose a two-variable linear threshold to account for that decision, based on the rank of the fixation and the difference in semantic similarity between each paragraph and the goal. Our model’s performance is compared to the eye-tracking data of 22 participants.

Paper Nr: 17
Title:

Enrichment of Inflection Dictionaries: Automatic Extraction of Semantic Labels from Encyclopedic Definitions

Authors:

Pawel Chrzaszcz

Abstract: Inflection dictionaries are widely used in many natural language processing tasks, especially for inflecting languages. However, they lack semantic information, which could increase the accuracy of such processing. This paper describes a method for extracting semantic labels from encyclopedic entries. Adding such labels to an inflection dictionary could eliminate the need to use ontologies and similar complex semantic structures for many typical tasks. A semantic label is either a single word or a sequence of words that describes the meaning of a headword; hence it is similar to a semantic category. However, no taxonomy of such categories is known prior to extraction. Encyclopedic articles consist of headwords and their definitions, so the definitions are used as sources for semantic labels. The described algorithm has been implemented for extracting data from the Polish Wikipedia. It is based on definition-structure analysis, heuristic methods, and word-form recognition and processing using the Polish Inflection Dictionary. This paper contains a description of the method and test results, as well as a discussion of possible further development.

Paper Nr: 18
Title:

Learn to Speak Like Normal People Do: The Case of Object Descriptions

Authors:

Michael Zock, Guy Lapalme and Mehdi Yousfi-Monod

Abstract: Successful communication requires many kinds of knowledge, including people's habits of expressing their thoughts. We deal here with the description of objects composing a scene. This is called 'reference generation', a task very frequently performed in language production. The skill of language production requires at least two kinds of competencies: one dealing with the mapping of concepts to forms (linguistic competency), the other dealing with the choice among the various resources (communicative competency). Messages can be expressed in many ways and at various levels: lexical, grammatical, morphological, etc. Since different means tend to produce different effects, students have to learn when to use which. This kind of knowledge, commonly called 'pragmatic knowledge' or 'communicative competency', can rarely be formalized as solid rules. It is largely based on experience. People learn on the basis of correlations, that is, they realize that changes in the situation may be reflected in language: different inputs (ideas, objects of a scene) yield different outputs, i.e. linguistic forms. We present here a setting that allows for this kind of learning. It is a web-based application that generates a scene and various descriptions of its components. Users can change the scene and watch how these choices affect (or do not affect) the linguistic form. The descriptions are produced in two languages (English and French), and they are rated in terms of communicative adequacy. This should allow students not only to learn how to produce correct sentences, but also help them realize which of these is, communicatively speaking, the most adequate form.

Paper Nr: 21
Title:

Automatic Extraction of Part-whole Relations

Authors:

Debela Tesfaye and Michael Zock

Abstract: The Web has become a universal repository of knowledge, allowing information to be shared on a scale never seen before. Yet, with the growing size of the resource, the difficulty of accessing information has grown as well. In this paper we present a system that automatically extracts Part-Whole relations among nouns. The approach is unsupervised, quasi-language-independent, and does not require a huge resource like WordNet. The results show that the patterns used to extract Part-Whole relations can be learned from N-grams.

Posters
Paper Nr: 16
Title:

A Dictionary based Stemming Mechanism for Polish

Authors:

Michał Korzycki

Abstract: In this paper we present and evaluate a robust stemming mechanism for Polish. We use the Polish Inflection Dictionary to build a Rule Based Stemmer and a Generative Reversed Rule Stemmer. Combining both stemmers in the form of the described Hybrid Stemmer provides us with a high-precision stemming mechanism that is able to match human performance. This claim is supported by an experiment, the results of which are presented.