I'm trying to define a database model that:
- Persists simulations results, which are represented as values of different variables for a time instant.
- Persists information about the variables such as their description, unit, category or more.
If stored in CSVs, this would mean that I can have one CSV to represent the variables' information as in the following example:
Name | Description | Unit | Category V1 | <description 1> | $/1000 | Economy V2 | <description 2> | persons | Demography V3 | <description 3> | Boolean | Index ...
And one CSV per simulation as the following:
Year | V1 | V2 | V3 2030 | 142.43 | 5 | False 2031 | 143.33 | 5 | True 2032 | 144.99 | 6 | True ...
In the "CSV world", if I run
n simulations I would then end up with
1 CSV of the first type and
n CSVs of the second resulting in
Relational Database World
When trying to adapt this problem to the "relational database world", I couldn't find a "perfect" solution. These are the ones that I came up with.
"Impossible" solution with dynamic typing
Because relational database columns have a static type (as far as I know), I can't store floats, booleans and integers in the same column. That means that the following model is not possible.
This would seem the most "theoretically pure" solution (at least to me) if we ignore the fact that column
Result.value is storing Floats, Integers and Booleans. Nevertheless, we can't ignore that fact so this is not really a solution.
Using the column name as an "implicit FK"
Another alternative is to have one column per variable in the table "Result" and "implicitly" link the column name to a row in the "Variable" table. That is, the DBMS would have no knowledge of the connection between these 2 tables and it would have to be "hardcoded" in the code that uses them (Java, PHP, Python, etc), which seems like a code smell to me. Nevertheless this seems like the best option of the 3. This would result in the following model.
As you can see, the "Variable" table is "isolated".
One table per variable result
The problem seems to be that we need to model the "variable result" on its own. Therefore, why don't we just create a table for each variable result? This would result in the following model (for clarity I didn't include the links between the rows in the results tables and their respective row in the "Variable" table. You'll have to use your imagination).
I don't like it. It seems overkill for the following reasons:
- There would be an explosion of tables, one per variable. If we have
nvariables, then we would have
n*yearsrelationships per simulation and "aggregating" the results would probably end up being too cumbersome.
- The column "V_ID" would be the same for each row in each variable result table. This redundancy just tastes icky.
This seems like a "simple" problem but I can't find a "good solution" by myself or googling for it. Am I overthinking it and I should just stick with the "column name as implicit FK" solution (the second one)?