Publications not assigned to a project

Natalja Friesen, Jörg Kindermann, Doris Maassen and Stefan Rüping:

Data Mining in Data-Intensive and Cognitively-Complex Settings: Lessons Learned from the Dicode Project

This book reports on cutting-edge research carried out within the context of the EU-funded Dicode project, which aims at facilitating and augmenting collaboration and decision making in data-intensive and cognitively complex settings.

(Published in Mastering Data-Intensive Collaboration and Decision Making, Springer, 2014)

Jürgen Bross, Heiko Ehrig:

Automatic Construction of Domain and Aspect Specific Sentiment Lexicons for Customer Review Mining

Automatically analyzing the opinions expressed in customer reviews is of high relevance in many application scenarios, e.g., market research, trend analysis, or reputation management. A great share of current sentiment analysis approaches makes use of special-purpose lexicons that provide information about the polarity (e.g., positive or negative) of individual words and phrases. One major challenge is that the actual sentiment polarity of a specific expression is often context dependent (e.g., "long" is positive in "long battery life" but negative in "long flash recycle time"). However, the vast majority of existing approaches focus on creating general-purpose lexicons. Especially in the context of mining customer review data, the use of such lexicons is suboptimal, as they fail to adequately reflect domain-specific lexical usage. We propose a novel method that automatically adapts and extends existing lexicons to a specific product domain. We follow a corpus-based approach and exploit the fact that many customer reviews exhibit some form of semi-structure. The method is fully automatic and thus scales well across different product domains. Our experiments show that the extracted lexicons are highly accurate and significantly improve the performance in a sentiment classification scenario.

(Published in Proceedings of CIKM, 2013)

Jürgen Bross, Heiko Ehrig:

Terminology Extraction Approaches for Product Aspect Detection in Customer Reviews

In this paper, we address the problem of identifying relevant product aspects in a collection of online customer reviews. Being able to detect such aspects represents an important subtask of aspect-based review mining systems, which aim at automatically generating structured summaries of customer opinions. We cast the task as a terminology extraction problem and examine the utility of varying term acquisition heuristics, filtering techniques, variant aggregation methods, and relevance measures. We evaluate the different approaches on two distinct datasets (hotel and camera reviews). For the best configuration, we find significant improvements over a state-of-the-art baseline method.

(Published in Proceedings of CoNLL, 2013)

Pablo N. Mendes, Max Jakob, Christian Bizer:

DBpedia: A Multilingual Cross-domain Knowledge Base

The DBpedia project extracts structured information from Wikipedia editions in 97 different languages and combines this information into a large multilingual knowledge base covering many specific domains and general world knowledge. The knowledge base contains textual descriptions (titles and abstracts) of concepts in up to 97 languages. It also contains structured knowledge that has been extracted from the infobox systems of Wikipedias in 15 different languages and is mapped onto a single consistent ontology by a community effort. The knowledge base can be queried using a structured query language, and all its data sets are freely available for download. In this paper, we describe the general DBpedia knowledge base and extended data sets that specifically aim at supporting computational linguistics tasks. These tasks include Entity Linking, Word Sense Disambiguation, Question Answering, Slot Filling and Relationship Extraction. These use cases are outlined, pointing at the added value that the structured data of DBpedia provides.

(Published in Proceedings of LREC, 2012)

Pablo N. Mendes, Max Jakob, Andres Garcia-Silva, Christian Bizer:

DBpedia Spotlight: Shedding Light on the Web of Documents

Interlinking text documents with Linked Open Data enables the Web of Data to be used as background knowledge within document-oriented applications such as search and faceted browsing. As a step towards interconnecting the Web of Documents with the Web of Data, we developed DBpedia Spotlight, a system for automatically annotating text documents with DBpedia URIs. DBpedia Spotlight allows users to configure the annotations to their specific needs through the DBpedia Ontology and quality measures such as prominence, topical pertinence, contextual ambiguity and disambiguation confidence. We compare our approach with the state of the art in disambiguation, and evaluate our results in light of three baselines and six publicly available annotation systems, demonstrating the competitiveness of our system. DBpedia Spotlight is shared as open source and deployed as a Web Service freely available for public use.

(Published in Proceedings of the 7th International Conference on Semantic Systems, 2011)

Publications on Alexandria

Matthias Wendt, Martin Gerlach, and Holger Düwiger:

Linguistic Modeling of Linked Open Data for Question Answering

While more and more semantic data is published on the Web, the question of how typical Web users can access this body of knowledge becomes crucially important. Therefore, there is a growing amount of research on interaction paradigms that allow end users to profit from the expressive power of Semantic Web standards while at the same time hiding the complexity behind an intuitive and easy-to-use interface.

(Published in Workshop co-located with the 9th Extended Semantic Web Conference, Heraklion, Greece, 2012)

Publications on MIA

Johannes Kirschnick, Torsten Kilias, Holmer Hemsen, Alexander Löser, Peter Adolphs, Heiko Ehrig, and Holger Düwiger:

A Marketplace for Web Scale Analytics and Text Annotation Services

In this paper, we present MIA, a cloud-based platform and data marketplace with the goal of enabling massively parallel processing of data from the German-language Web. End users can describe their analytical task in a structured query language called MIAQL. We describe the functionality of the platform to gather relevant text data, extract information, join, aggregate, group and return results as database tables. In addition to this powerful functionality, MIA offers many cost savings through sharing annotations, text data, built-in analytical functions and third-party text mining functions.

(Published in Proceedings of COLING 2014, Dublin, Ireland, 2014)


Steffen Kemmerer, Benjamin Großmann, Christina Müller, Peter Adolphs, Heiko Ehrig:

The Neofonie NERD System at the ERD Challenge 2014

This paper describes Neofonie NERD, our named entity recognition and disambiguation system submitted to the ERD Challenge 2014. The system consists of precomputed lexica and statistics from Wikipedia and Freebase, an efficient spotting component, and a context-based disambiguation step. It was originally developed for the German language and has now been adapted for English for the first time. We achieved a 70.0% F1-score in the final evaluation, which is 5.7 percentage points above the average of all participating teams.

(Published in ACM SIGIR 2014 Workshop, Gold Coast, Australia, 2014)
