Journal of Big Data is launching a special Collection entitled 'Emergent architectures and technologies for big data management and analysis', with papers accepted from the Big Data in Emergent Distributed Environments (BiDEDE 2024) workshop. BiDEDE 2024 is held in conjunction with the 2024 ACM SIGMOD Conference (https://www.ifis.uni-luebeck.de/~groppe/bidede/2024).
In recent years, new forms of distributed environments beyond cloud computing have emerged. They enable new kinds of applications but also pose new challenges for data management. Recent efforts in serverless computing aim to simplify code deployment in cloud-based production systems by hiding scaling, capacity planning, and maintenance from the developer or administrator. Other work focuses on minimizing network traffic to the cloud by deploying and running data processing environments (1) near data sources in Internet-of-Things scenarios (e.g., fog and edge computing) and (2) near applications (e.g., cloudlets for mobile applications and offline-first technologies for web applications).
Research on distributed data management is evolving to address challenges specific to these new environments (called emergent), which go beyond traditional cloud computing. Emergent distributed environments are characterized by properties such as the computation and memory capabilities of nodes, communication bandwidth, battery lifetime of nodes, reliability of nodes, and the data types that nodes produce and store, as well as by the functionality of nodes, e.g., fault tolerance, replication capabilities, resource provisioning, buffer management, query processing and optimization, transaction management, safety, and security.
Furthermore, approaches to integrating distributed and highly heterogeneous data (such as federated architectures based on the principles of data mesh/data fabric, polystores, data lakes, and data lakehouses) span several emergent distributed environments. They also raise research challenges arising from the need to combine these different distributed environments into a single distributed runtime environment that makes it easy to handle big data in different models, and to optimize data management tasks globally across these environments.
In addition, current trends in research and technology apply various artificial intelligence techniques and tools to manage such complex distributed and heterogeneous systems. This work uses machine learning (ML) to build query optimizers and performance models of larger systems, or even to support the design and maintenance of data integration processes. Complex ML models are built to tune systems automatically (as in so-called self-driving database management systems) and to manage complex federated architectures.
This Collection aims to cover advances in the aforementioned technologies from both the research and the application perspectives.