This paper is part of a project that aims to discover weak signals in different streams of information, possibly sent by whistleblowers through a platform such as GlobalLeaks. The study presented here tackles the particular problem of clustering topics at multiple levels from multiple documents, and then extracting meaningful descriptors, such as weighted lists of words, for document representations in a multidimensional space. In this context, we present a novel idea that combines Latent Dirichlet Allocation and Word2Vec (providing a consistency metric for the partitioned topics) as a potential method for limiting the need to fix “a priori” the number of clusters k, as classical partitioning approaches usually require. We propose two implementations of this idea, which respectively (1) find the best k for LDA in terms of topic consistency and (2) gather the optimal clusters from different levels of clustering. We also propose a non-traditional visualization approach based on a multi-agent system that combines dimension reduction and interactivity.
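As a minimal sketch of the first implementation mentioned above, selecting k by topic consistency could look like the following. It assumes that the top words of each topic have already been obtained from LDA runs for several candidate values of k, and that word vectors (for example from Word2Vec) are available as a plain dictionary. The function names, the toy vectors, and the choice of mean pairwise cosine similarity as the consistency metric are illustrative assumptions, not the implementation used in the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def topic_consistency(words, vectors):
    """Mean pairwise cosine similarity of a topic's top words.

    Words without a vector are ignored; a topic with fewer than two
    known words gets a score of 0.0.
    """
    known = [vectors[w] for w in words if w in vectors]
    pairs = list(combinations(known, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def best_k(topics_by_k, vectors):
    """Return the candidate k whose topics have the highest mean consistency.

    Assumes every candidate k maps to a non-empty list of topics.
    """
    def mean_score(k):
        topics = topics_by_k[k]
        return sum(topic_consistency(t, vectors) for t in topics) / len(topics)
    return max(topics_by_k, key=mean_score)


# Toy, hand-made 2-D vectors standing in for Word2Vec embeddings (hypothetical).
toy_vectors = {
    "leak": [1.0, 0.0], "document": [0.9, 0.1],
    "flood": [0.0, 1.0], "storm": [0.1, 0.9],
}
# Hypothetical top words per topic, as if produced by LDA with k=1 and k=2.
toy_topics = {
    1: [["leak", "document", "flood", "storm"]],
    2: [["leak", "document"], ["flood", "storm"]],
}
selected_k = best_k(toy_topics, toy_vectors)
```

In this toy setting the two-topic partition is preferred, because the single topic mixes words whose vectors point in unrelated directions. With real data, the vectors would come from a trained embedding model and the topic word lists from the fitted LDA models.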
Open Data gives access to plentiful data with broad coverage, but none of it offers a structured database built around news. Through DataNews, our goal is to collect such data automatically so that they can be reused. To do so, we first defined an event typology in the specific context of deaths reported in AFP wires. Then, restricting ourselves to natural disasters, we clustered these wires by event in order to identify the events. The goal of the last step is to build extraction patterns that collect the values corresponding to the death toll, as well as the context associated with these values. The results of our evaluations support the strong potential of our method, which could lead to several applications.
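To illustrate what an extraction pattern of this kind might look like, here is a minimal sketch based on a regular expression. It assumes English text and counts written in digits; the pattern, the list of trigger words, the `window` parameter, and the sample sentence are all hypothetical and are not the patterns built in the study.

```python
import re

# Hypothetical pattern: a number followed, within at most two words,
# by a death-related term.
DEATH_PATTERN = re.compile(
    r"(?P<count>\d[\d,]*)\s+(?:\w+\s+){0,2}?(?P<term>dead|deaths|killed|victims)",
    re.IGNORECASE,
)


def extract_death_counts(text, window=30):
    """Return (count, context) pairs for each match found in text.

    The context is the matched span extended by `window` characters
    on each side, clipped to the bounds of the text.
    """
    results = []
    for m in DEATH_PATTERN.finditer(text):
        count = int(m.group("count").replace(",", ""))
        start = max(0, m.start() - window)
        end = min(len(text), m.end() + window)
        results.append((count, text[start:end]))
    return results


# Invented sample sentence, not taken from an AFP wire.
sample = "The earthquake left 1,200 people dead and 300 injured, officials said."
found = extract_death_counts(sample)
```

Here the pattern picks up the figure attached to "dead" and ignores the number of injured. A realistic system would need many more patterns (numbers written in words, ranges, provisional tolls) and a way to link each value to its event.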
This paper focuses on influencers, defined as individuals who succeed in having an impact on the decision process of other individuals simply through interaction. The success of social networks over the last decade has led to an increasing interest in detecting such profiles. In this context, we present a new influencer model based on the observation of real influence processes. We first define the theoretical framework in which we model the influence process. Then, we describe our empirical approach, based on the observation of influencers in forum discussions, which allows us to characterise each component of our model with linguistic features. Finally, we conclude by presenting, as a perspective, the implementation of the model, together with the linguistic feature annotation organised to acquire gold data.