Poster Title:  Building data stream summaries
Poster Abstract: 
Many real-world applications rely on the analysis of high-speed data streams. Traditional data mining techniques may fall short in this setting, as data streams impose strict processing constraints and are constantly changing. Data stream mining aims to learn from streams in real time. Previous contributions to the field have studied ways of constructing compact representations of passing data, as well as extensions of supervised and unsupervised learning techniques to the data stream setting. However, many of these summaries are not general-purpose, and some require strong assumptions about the nature of the data.
The focus of this thesis is therefore to find a new approach to the unsupervised analysis of data streams. Specifically, our goal is to build, under time and memory constraints, a compact, high-quality summary of the underlying data distribution and its evolution over time. Our working hypothesis is that this could be achieved by adapting MODL co-clustering, an information-theoretic approach to statistical inference, to the streaming setting.
Poster ID:  D-19
Poster File:  poster_stream_summaries_D-19.pdf
Poster Image: 
Poster URL: