Using Dictionaries to Manage Data Within a Modeling Framework System
Background

In the past, modeling systems typically managed data by directly connecting specific models (i.e., hardwiring the models to each other). Each connection reflected the data needs of the consuming model, resulting in an efficient transfer of data and dynamic feedback between models. Figure 1 illustrates this traditional approach for linking models and managing data.

Figure 1. Traditional Approach to Managing Data in Modeling Systems

Unfortunately, as the complexity of the modeled problem increases, so does the complexity of the data management, and this approach quickly becomes unmanageable. It also prevents the user from adding new models, parameters, data requirements, databases, or other components without modifying the entire system and revamping older models.

As modeling systems evolved, some developers took advantage of advances in "object-oriented" modeling to allow models entering the system to agree on a data transfer protocol (Figure 2). This nontraditional approach identified system data specifications to which models had to adhere when passing information between model types and databases. Pre- and post-processors allowed older models to be connected directly into the system without being modified. Because the original code was not changed, the entire legacy of testing and validation for each model and database remained applicable, which enhanced quality control. This approach also simplified the management and modification of multiple models (Whelan et al. 2001. PNNL-13453).

Figure 2. Nontraditional Approach to Managing Data in Modeling Systems

Unfortunately, ensuring that all model developers followed a rigid set of data specifications, while attractive in theory, proved difficult in practice.
Following a file specification required each developer to implement both the format and the data content, which in effect had every developer recode the specification. This approach, while effective, increased code maintenance and made the interpretation of the file specification critical: many problems can arise when different individuals interpret the specification slightly differently.

In addition, many modeling frameworks required that the specification be built on the needs of upstream models (that is, models that run earlier in the analysis process). These upstream models might produce three output files, while only two would be needed by models running later (downstream models). For example, a chemical database might provide information on 17 different constituents, but the health effects model that runs later in the analysis needs information on only the three constituents that could be linked to cancer. This specification approach, then, results in an excess of data moving through the system, slowing processing and analysis.

FRAMES Version 2.0 gives model developers the flexibility to use standard requirements (i.e., dictionaries, or DICs) to minimize the data produced and consumed between models. If a model developer wants a model to produce additional data that are not consumed by a specific downstream model, that can be accomplished; a different model might require these data in future assessments. By giving model developers this flexibility to produce a wide range of output datasets, FRAMES allows models to link to and communicate with a larger set of other models and to be applied in a wider set of scenarios.

FRAMES 2.0 also uses an Application Programming Interface (API) to manage data within the system. This API coordinates and manages the input and output between components. In FRAMES, "what" is being stored is kept separate from "how" it is stored.
Thus, the API can provide the following functions:
The key to successful data management within a modeling system, then, is to ensure that the producing component provides information that meets the needs of the consuming component, in a form that both components recognize, and to provide a mechanism for ensuring the compatibility of that form. Figure 3 illustrates this approach to managing data within a modeling system, with the arrows representing the transfer of data through one or many DICs.

Figure 3. Using Dictionaries (indicated by arrows) to Manage Data in Modeling Systems

In FRAMES, this shared responsibility for managing data is based on datasets, whose describing information is characterized in DIC files. These files can be created by hand or through the use of the API.

Figure 4 provides a more detailed example of the use of DICs by FRAMES. The model shown in the figure can accept information from three different source types: user-defined input provided through a user interface, data supplied from a database, and data provided by a model that runs earlier in the analysis process. When the output dataset (i.e., Model DICs as Output in Figure 4) from an upstream model provides enough information to satisfy the input required by a downstream model (e.g., Model 1 in Figure 4), the data from the upstream model can be transferred seamlessly to the downstream model.

Figure 4. How Dictionaries Manage Datasets

The following sections explain how to understand and how to create DIC files.
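The compatibility check described for Figure 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the FRAMES API or the DIC file format: the function `missing_inputs` and every dictionary name below are hypothetical, and each dictionary is reduced to just its name.

```python
def missing_inputs(required, sources):
    """Return the required dictionary names that no connected source provides.

    `required` lists the dictionaries a consuming model needs as input.
    `sources` maps each connected source (user interface, database,
    upstream model) to the dictionaries it produces.
    An empty result means the connection is complete.
    """
    provided = set()
    for produced in sources.values():
        provided |= set(produced)
    return sorted(set(required) - provided)


# Hypothetical dictionary names, invented for illustration only.
sources = {
    "user_interface": ["SiteLayout"],
    "chemical_database": ["ChemicalProperties"],
    # The upstream model produces more than this consumer needs;
    # the extra dataset stays available for other models.
    "upstream_model": ["WaterConcentrations", "AirConcentrations"],
}
model_1_inputs = ["SiteLayout", "ChemicalProperties", "WaterConcentrations"]

print(missing_inputs(model_1_inputs, sources))
print(missing_inputs(model_1_inputs + ["SoilConcentrations"], sources))
```

In this sketch the first call reports nothing missing, so the consuming model could run, while the second call names the one dictionary that none of the three source types supplies. Comparing names alone is a simplification; a fuller check would presumably also compare the variables and units each dictionary describes.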