Data Science

Motivating Questions: How can we effectively empower decision makers, including USACE as well as other local water infrastructure managers, to find nature-based solutions that make the most sense for their local context? How can computational models inform the operation of built and natural infrastructure to increase the resilience of a water infrastructure network? How can we achieve transparent and efficient decision making via causal learning and transfer learning? Can these models provide accurate operational plans and, ultimately, holistic mitigation actions against droughts and floods?

The ability to leverage data science, engineering, AI, and machine learning to understand natural and built water infrastructure (NWI and BWI) and make it more resilient at different scales is critical in many urgent contexts, as exemplified by the first three PMPs. Data technologies offer unmatched potential to explain human behavior and to transform our lives by enabling smarter, better-informed decisions. While the promise is apparent, the core technologies needed to deliver on it for the project's NWI and BWI goals are still in their early stages and lack a framework to help realize their potential.

Water systems and simulations generate massive time series data representing complex, dynamic physical processes operating at varying spatial and temporal scales. Natural and built water systems involve heterogeneous, multi-modal (temporal, spatial, networked) data and models with hundreds of interdependent parameters, spanning multiple layers and geospatial frames and affected by complex dynamic processes operating at different scales and resolutions. Many of these evolve over time, as ecosystems develop and as individuals and public agencies take preventive and reactive actions, requiring continuous adaptation and re-modeling. Scientists, planners, and decision makers need extensive time and labor to manually sort through these data to understand how these systems are functioning. Both human decision making and model discovery can be significantly improved if these massive data can be analyzed for key causal features, revealing the underlying latent structure and dependencies that are critical for modeling, optimization, and control. If effectively leveraged, data and physically based models can together support advanced control, commissioning, or retrofitting of existing systems. However, they face three key challenges that limit their widespread use and their potential impact:
Cost of Modeling: The simulation model for a typical NWI or BWI system can be complex, and creating such a model from scratch requires significant manpower.
Forecasting Accuracy: Due to the complexity, tight coupling, and varied temporal scales at which natural and built water systems operate, calibrating a simulation model at the granularity needed for control and fault diagnosis is very challenging. Using such a model for simulation is even more complicated, because its outcomes often depend heavily on a specific region's characteristics.
Cost of Simulations: Given the unpredictability of natural and human factors and of the actions of various independent agencies, decision makers need to generate many thousands of simulations, each with different parameters corresponding to a different but plausible scenario. Running and interpreting these simulations to produce timely, actionable results is computationally costly; consequently, data and simulations are inherently sparse.
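The scenario-generation step behind the cost of simulations can be made more sample-efficient than purely random draws. As one illustration, and not a method this proposal commits to, the sketch below uses Latin hypercube sampling, which splits each parameter's range into equal strata and uses every stratum exactly once, so even a modest ensemble covers each parameter's full range. The function name `latin_hypercube` and the example parameters (a rainfall multiplier and a reservoir release fraction) are hypothetical.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Draw n_samples scenarios; `bounds` is a list of (low, high) pairs,
    one per uncertain parameter (hypothetical parameters in this sketch)."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        # One uniform draw inside each of the n_samples equal-width strata.
        col = [low + (i + rng.random()) * width for i in range(n_samples)]
        # Shuffle so strata of different parameters are paired at random.
        rng.shuffle(col)
        columns.append(col)
    # Transpose: one row (parameter vector) per scenario.
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    # Hypothetical parameters: rainfall multiplier and reservoir release fraction.
    for scenario in latin_hypercube(5, [(0.5, 1.5), (0.0, 1.0)], seed=42):
        print(scenario)
```

Each row is one scenario's parameter vector to hand to a simulator. The per-parameter shuffle is the key design choice: without it, every scenario would pair the lowest stratum of one parameter with the lowest stratum of every other.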

Therefore, tackling the key domain challenges necessitates a novel framework built on computational advances in big data and model integration; causal learning and discovery; large-scale data- and model-driven simulation, emulation, and forecasting; data-driven and model-centric operational recommendations; and effective visualization and explanation. This includes:

1)    developing spatio-temporal causal discovery algorithms and causally informed, generalizable data-driven and physically based impact models for characterizing complex natural and built systems;

2)    developing a scalable and modular cloud-based platform for data and model integration and complex system simulation and emulation;

3)    developing spatio-temporal multi-objective and high-dimensional optimization frameworks;

4)    developing methods to apply transfer learning and emulation modeling techniques to simulate and understand processes driving and controlling integrated NWI and BWI systems; and

5)    implementing, testing, and monitoring the performance of the proposed data frameworks within the context of the domain projects.
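Item 1 centers on causal discovery from time series. The proposal does not specify its algorithms, so as a deliberately simple stand-in the sketch below implements a classical pairwise Granger-causality F-test with NumPy. It asks whether past values of one series (say, upstream flow) reduce the error in predicting another (say, downstream stage) beyond what the target's own past already explains. The function names and synthetic data are hypothetical, and this linear, bivariate test ignores spatial structure and unobserved confounders, which the spatio-temporal methods in item 1 would need to handle.

```python
import numpy as np


def lagged_design(series_list, lags):
    """Design matrix with an intercept and lags 1..`lags` of each series."""
    n = len(series_list[0])
    # Row t (t = lags..n-1) holds s[t-1], ..., s[t-lags] for every series s.
    cols = [s[lags - k : n - k] for s in series_list for k in range(1, lags + 1)]
    return np.column_stack([np.ones(n - lags)] + cols)


def residual_sum_of_squares(X, y):
    """Ordinary least squares fit; returns the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def granger_f(cause, effect, lags=2):
    """F statistic testing whether past `cause` helps predict `effect`
    beyond the past of `effect` alone (larger = stronger evidence)."""
    cause = np.asarray(cause, dtype=float)
    effect = np.asarray(effect, dtype=float)
    target = effect[lags:]
    X_restricted = lagged_design([effect], lags)
    X_full = lagged_design([effect, cause], lags)
    rss_r = residual_sum_of_squares(X_restricted, target)
    rss_f = residual_sum_of_squares(X_full, target)
    df_num = lags                           # number of added `cause` lags
    df_den = len(target) - X_full.shape[1]  # residual degrees of freedom
    return ((rss_r - rss_f) / df_num) / (rss_f / df_den)


if __name__ == "__main__":
    # Synthetic example: x drives y with a one-step delay; y does not drive x.
    rng = np.random.default_rng(0)
    n = 500
    x = rng.normal(size=n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()
    print("x to y:", granger_f(x, y))
    print("y to x:", granger_f(y, x))
```

The statistic can be compared against an F(lags, df_den) distribution (for example with `scipy.stats.f.sf`) to obtain a p-value. On the synthetic data, the x-to-y statistic should be far larger than the y-to-x one.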