Workshop focus

Data lifecycle management has become an increasingly important and interesting topic in data management. Solutions that adapt to changing requirements on data access, storage, and cost throughout the lifecycle of data are relevant to many areas, including scientific databases, digital libraries, and enterprise data. The Dalí workshop focuses on methods for improving and automating the management of data throughout its complete lifecycle.

A small selection of the best papers will be recommended for inclusion in a special issue of Information Systems dedicated to this workshop.

Motivation

As a consequence of the data deluge, researchers and practitioners now more than ever face the problem of managing data and metadata so as to retrieve information from them under evolving performance and cost constraints. The goal of data lifecycle management is to provide a high-level vantage point from which to capture long-term trends in the management of given data, and thus to anticipate and automate, in a cost-effective manner, important operations such as data migration, data integration, data archival, and provenance tracking.

Consider the lifecycle of transactional data within an enterprise. The lifecycle begins with data creation and passes through various active phases in which the data capture the current state of the world. It then moves into phases in which the data are often aggregated for business intelligence purposes. Next, the data are archived in queryable storage for long-term trend analysis, and are finally either destroyed or archived offline to meet accountability obligations. Throughout the data's lifespan, privacy and security control, data quality and curation, scalability and high availability, integration with other data sources, migration across multiple systems, and many other issues contribute to the many facets of this problem.

Corporate enterprises, web companies, and large scientific projects have developed strategies, systems, tools, best practices, and guidelines to tackle each of the above tasks. While many of these problems can be, and have been, tackled individually, a global view of the problem is required to achieve optimality and to guarantee global properties. Examples of such efforts include the use of data provenance in scientific workflows that aim to manage data flows in their entirety, the introduction of service-oriented architectures in enterprises, and, following the success of mashups, the study of novel solutions that make it easier to correlate different modules. Recent research in information preservation, data and workflow provenance, and data curation has already broadened the scope in an attempt to capture complex data manipulation processes, but more work is needed.

Topics of Interest

Dalí focuses on the issues that arise when considering the data lifecycle in its entirety, and on automating, accelerating, and securing the process. Topics of interest include:

  • Data lineage / provenance in workflows
  • Data maintenance / curation
  • Security and privacy preservation in data lifecycle
  • Data integration (schema mapping, entity resolution, data fusion, mashups)
  • Schema and data evolution
  • Information preservation
  • Data retirement (data archival / destruction)
  • Data migration / data exchange
  • Master data management
  • Physical data placement to meet performance and high availability requirements
  • Monitoring and auditing in data management
  • Data lifecycle management on novel data management architectures (cloud, streaming data, ...)
  • Applications of data lifecycle management in specific domains (scientific data, multimedia data, ...)

Why Dalí?

Salvador Dalí contributed directly to all aspects of art: painting, sculpture, theatre, fashion, photography, and more.