{youtube}sOMiowrff0Y{/youtube}
“I will survive” @ math department!!
LOL!!!
{youtube}P9dpTTpjymE{/youtube}
PhD Course: “Data Stream Management Systems – Supporting Stream Mining Applications” by Prof. Carlo Zaniolo
The Dipartimento di Elettronica ed Informazione of Politecnico di Milano organizes the PhD-level course “Data Stream Management Systems – Supporting Stream Mining Applications”, taught by Professor Carlo Zaniolo (UCLA).
The course is open to PhD students and researchers from outside Politecnico di Milano. Please contact me (use the Contact Me link) for further information and to RSVP.
Title: Data Stream Management Systems–Supporting Stream Mining Applications
Teacher: Carlo Zaniolo -University of California at Los Angeles, CS Dept
Periods:
23/6 : 14.30-17.30, Seminar room
24/6 : 14.30-17.30, Seminar room
25/6 : 14.00-18.00, Seminar room
30/6 : 14.30-17.30, Seminar room
1/7 : 14.00-18.00, Seminar room
2/7 : 14.30-17.30, Aula Alfa
Content of the Course:
In the age of the Internet, massive amounts of information are continuously exchanged as data streams that are then processed by on-line applications of increasing complexity. For such advanced applications, a store-now and process-later approach cannot be used because of real time (or quasi real-time) requirements and excessive data rates. Therefore, current research seeks to develop a new generation of information management systems, called Data Stream Management Systems (DSMS), that can support complex applications on massive data streams with Quality of Service (QoS) guarantees. This work has produced novel techniques, research prototypes, startup companies, and the successful deployment of DSMS in many applications, including network traffic analysis, transaction log analysis, intrusion detection, credit-card fraud detection, click stream analysis, and algorithmic trading.
Since many such applications involve both streaming data and stored data, the approach taken by most DSMS consists in expressing continuous queries on data streams using extensions of SQL. But significant changes in the language and its implementation are needed, since DSMS must support persistent queries on ordered streams of transient tuples—instead of the transient queries on unordered sets of persistent tuples of relational DBMS. In particular, only monotonic queries and non-blocking operators can be used. Also, the unbounded streams must be represented by synopses, such as windows containing the most recent tuples in the streams. Thus the semantics of basic operators such as joins and aggregates must be revised for windows. At the implementation level, new query optimization techniques seek to minimize response time and memory utilization. Load shedding techniques based on samples and sketches are used to achieve QoS under overload conditions. The first part of the course will cover these techniques and the architectures of the main DSMS systems.
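The revised window semantics for blocking aggregates can be pictured with a small sketch. The class below is not taken from any actual DSMS; it is a minimal, hypothetical illustration of how a count-based window synopsis turns a blocking aggregate (AVG over an unbounded stream) into a non-blocking, incremental operator:

```python
from collections import deque

class SlidingWindowAvg:
    """Non-blocking incremental average over the last `size` tuples.

    A window synopsis bounds the state kept for an unbounded stream:
    each arriving tuple updates a running sum in O(1) and immediately
    emits the current answer, instead of waiting for the stream to end.
    (Illustrative sketch only, not an actual DSMS operator.)
    """
    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.total = 0.0

    def push(self, value):
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:      # evict the oldest tuple
            self.total -= self.window.popleft()
        return self.total / len(self.window)  # answer emitted per tuple
```

Because every arriving tuple triggers a constant-time update and an immediate answer, the operator never blocks on the (unbounded) input, which is exactly the property the continuous-query semantics requires.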
The second part of the course will focus on the data stream mining problem, which represents a vibrant area of new research. Past work concentrated on devising data mining algorithms that (i) are fast and light enough for online applications, and (ii) can cope with the concept shifts and drifts that are often present in data streams. However, integrating mining primitives into an SQL-based environment represents a very difficult problem on its own, as demonstrated by the very slow progress made by DBMS on this issue. Thus, we first discuss efficient algorithms proposed for the mining tasks of classification, association, clustering and sequential pattern detection on data streams; then, we explore alternative approaches to integrate them into DSMS.
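As a toy illustration of the concept-drift issue mentioned above (a hypothetical sketch, not one of the published algorithms covered in the course), a stream classifier can monitor an exponentially decayed error rate and flag drift when it rises past a threshold:

```python
class DriftMonitor:
    """Toy concept-drift signal based on a decayed error rate.

    Recent prediction errors are weighted more heavily (factor `alpha`),
    so a sustained rise in the rate suggests the underlying concept has
    shifted and the model should be retrained. Illustration only.
    """
    def __init__(self, alpha=0.1, threshold=0.5):
        self.alpha = alpha
        self.threshold = threshold
        self.rate = 0.0

    def record(self, error):
        # error is 1 if the last prediction was wrong, else 0
        self.rate = (1 - self.alpha) * self.rate + self.alpha * error
        return self.rate > self.threshold  # True => drift suspected
```

The exponential decay keeps the state to a single number, in the same spirit as the lightweight synopses used elsewhere in stream processing.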
2008: “Managing the History of Metadata in support for DB Archiving and Schema Evolution”
To appear: “Managing the History of Metadata in support for DB Archiving and Schema Evolution”, Carlo A. Curino, Hyun J. Moon, Carlo Zaniolo, ER International Workshop on Evolution and Change in Data Management (ECDM), 2008.
Modern information systems, and web information systems in particular, are faced with frequent database schema changes, which generate the necessity to manage such evolution and preserve their history. In this paper, we describe the Panta Rhei Framework designed to provide powerful tools that: (i) facilitate schema evolution and guide the Database Administrator in planning and evaluating changes, (ii) support automatic rewriting of legacy queries against the current schema version, (iii) enable efficient archiving of the histories of data and metadata, and (iv) support complex temporal queries over such histories. We then introduce the Historical Metadata Manager (HMM), a tool designed to facilitate the process of documenting and querying the schema evolution itself. We use the schema history of the Wikipedia database as a telling example of the many uses and benefits of HMM.
For more information: http://yellowstone.cs.ucla.edu/schema-evolution/index.php/Prima
2008 VLDB: “Managing and querying transaction-time databases under schema evolution”
“Managing and querying transaction-time databases under schema evolution”, H. J. Moon, C. A. Curino, A. Deutsch, C.-Y. Hou, and C. Zaniolo. Very Large Data Bases (VLDB), 2008.
The old problem of managing the history of database information is now made more urgent and complex by fast-spreading web information systems. Indeed, systems such as Wikipedia are faced with the challenge of managing the history of their databases in the face of intense database schema evolution. Our PRIMA system addresses this difficult problem by introducing two key pieces of new technology. The first is a method for publishing the history of a relational database in XML, whereby the evolution of the schema and its underlying database are given a unified representation. This temporally grouped representation makes it easy to formulate sophisticated historical queries on any given schema version using standard XQuery. The second key piece of technology provided by PRIMA is that schema evolution is transparent to the user: she writes queries against the current schema while retrieving the data from one or more schema versions. The system then performs the labor-intensive and error-prone task of rewriting such queries into equivalent ones for the appropriate versions of the schema. This feature is particularly relevant for historical queries spanning potentially hundreds of different schema versions. The latter is realized by (i) introducing Schema Modification Operators (SMOs) to represent the mappings between successive schema versions and (ii) an XML integrity constraint language (XIC) to efficiently rewrite the queries using the constraints established by the SMOs. The scalability of the approach has been tested against both synthetic data and real-world data from the Wikipedia DB schema evolution history.
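A temporally grouped representation can be pictured roughly as follows; the element and attribute names here are invented for illustration and are not PRIMA's actual schema. The idea is that each attribute of a tuple groups the full timeline of its values:

```xml
<!-- Hypothetical sketch of a temporally grouped history:
     every value carries the interval during which it held -->
<employee id="1077">
  <name>
    <version tstart="2005-01-01" tend="now">J. Smith</version>
  </name>
  <title>
    <version tstart="2005-01-01" tend="2006-06-30">Engineer</version>
    <version tstart="2006-07-01" tend="now">Senior Engineer</version>
  </title>
</employee>
```

Grouping the versions under each attribute, rather than snapshotting whole tuples, is what makes historical queries such as “when did the title change?” straightforward to express in XQuery.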
For more information on this project visit: http://yellowstone.cs.ucla.edu/schema-evolution/index.php/Prima
My Erdős Number: 3
I just discovered that my Erdős number is at most 3 (I haven’t checked every path, so it could be even lower)!!
The path is Carlo A. Curino (3) – Alin Deutsch (2) – Victor Vianu (1) – Noga M. Alon (0) – Paul Erdos.
2008 VLDB: “Graceful database schema evolution: the prism workbench”
“Graceful database schema evolution: the prism workbench”, Carlo A. Curino, Hyun J. Moon, and Carlo Zaniolo. Very Large Data Bases (VLDB), 2008. PDF
Supporting graceful schema evolution represents an unsolved problem for traditional information systems that is further exacerbated in web information systems, such as Wikipedia and public scientific databases: in these projects, based on multiparty cooperation, the frequency of database schema changes has increased while tolerance for downtime has nearly disappeared. As of today, schema evolution remains an error-prone and time-consuming undertaking, because the DB Administrator (DBA) lacks the methods and tools needed to manage and automate this endeavor by (i) predicting and evaluating the effects of the proposed schema changes, (ii) rewriting queries and applications to operate on the new schema, and (iii) migrating the database.
Our PRISM system takes a big first step toward addressing this pressing need by providing: (i) a language of Schema Modification Operators (SMO) to express concisely complex schema changes, (ii) tools that allow the DBA to evaluate the effects of such changes, (iii) optimized translation of old queries to work on the new schema version, (iv) automatic data migration, and (v) full documentation of intervened changes as needed to support data provenance, database flash back, and historical queries. PRISM solves these problems by integrating recent theoretical advances on mapping composition and invertibility into a design that also achieves usability and scalability. Wikipedia and its 170+ schema versions provided an invaluable testbed for validating PRISM tools and their ability to support legacy queries.
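To give a flavor of what an SMO-based description of a schema change looks like (the syntax below is paraphrased for illustration and should not be read as the exact PRISM grammar):

```sql
-- Illustrative SMO sequence: split a wide table and rename a column.
-- Because each operator has a well-defined inverse, tools in the
-- PRISM style can derive both the forward data migration and the
-- backward rewriting of legacy queries.
DECOMPOSE TABLE employee INTO emp_basic(id, name),
                              emp_job(id, title, dept);
RENAME COLUMN dept IN emp_job TO department;
```

Expressing the change as a sequence of small, invertible operators, instead of raw DDL plus ad hoc scripts, is what makes the automated query rewriting and migration described above tractable.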
For more details and an on-line DEMO of the system visit: http://yellowstone.cs.ucla.edu/schema-evolution/index.php/Prism
Bibtex:
@INPROCEEDINGS{curino-vldb2008a,
author = {Carlo A. Curino and Hyun J. Moon and Carlo Zaniolo},
title = {Graceful database schema evolution: the prism workbench},
booktitle = {Very Large Data Bases (VLDB)},
year = {2008}
}
2008: “Information Systems Integration and Evolution: Ontologies at Rescue”
“Information Systems Integration and Evolution: Ontologies at Rescue”, Carlo A. Curino, Letizia Tanca, Carlo Zaniolo International Workshop on Semantic Technologies for System Maintenance (STSM) 2008
The life of a modern Information System is often characterized by (i) a push toward integration with other systems, and (ii) the evolution of its data management core in response to continuously changing application requirements. Most of the current proposals dealing with these issues from a database perspective rely on the formal notions of mapping and query rewriting. This paper presents the research agenda of ADAM (Advanced Data And Metadata Manager); by harvesting the recent theoretical advances in this area into a unified framework, ADAM seeks to deliver practical solutions to the problems of automatic schema mapping and assisted schema evolution. The evolution of an Information System (IS) reflects the changes occurring in the application reality that the IS is modelling: thus, ADAM exploits ontologies to capture such changes and provide traceability and automated documentation for such evolution. Initial results and immediate benefits of this approach are presented.
Paper available at: curino-STSM08-CR.pdf
FlyMake Emacs: continuous compilation
This YouTube video shows Emacs with a “continuous” compilation feature (FlyMake) turned on… the video is painfully slow, since the guys who made it do not talk but only type 🙁 so feel free to skip all the way to 4′30″ (out of 5′) to see the interesting part. Sounds good to me…
{youtube}F5Cc2W6PbL8{/youtube}