How to properly model a table that persists another table’s columns information?

Problem

I'm trying to define a database model that:

  • Persists simulations results, which are represented as values of different variables for a time instant.
  • Persists information about the variables such as their description, unit, category or more.

CSV World

If stored in CSVs, this would mean that I can have one CSV to represent the variables' information as in the following example:

Name | Description     | Unit         | Category
V1   | <description 1> | $/1000       | Economy
V2   | <description 2> | persons      | Demography
V3   | <description 3> | Boolean      | Index
...

And one CSV per simulation as the following:

Year | V1     | V2  | V3
2030 | 142.43 | 5   | False
2031 | 143.33 | 5   | True
2032 | 144.99 | 6   | True
...

In the "CSV world", if I run n simulations I would then end up with 1 CSV of the first type and n CSVs of the second resulting in n+1 CSVs.

Relational Database World

When trying to adapt this problem to the "relational database world", I couldn't find a "perfect" solution. These are the ones that I came up with.

"Impossible" solution with dynamic typing

Because relational database columns have a static type (as far as I know), I can't store floats, booleans and integers in the same column. That means that the following model is not possible.

dynamic typing example

This would seem the most "theoretically pure" solution (at least to me) if we ignore the fact that column Result.value is storing Floats, Integers and Booleans. Nevertheless, we can't ignore that fact so this is not really a solution.

Using the column name as an "implicit FK"

Another alternative is to have one column per variable in the table "Result" and "implicitly" link the column name to a row in the "Variable" table. That is, the DBMS would have no knowledge of the connection between these 2 tables and it would have to be "hardcoded" in the code that uses them (Java, PHP, Python, etc), which seems like a code smell to me. Nevertheless this seems like the best option of the 3. This would result in the following model.

implicit FK example

As you can see, the "Variable" table is "isolated".

One table per variable result

The problem seems to be that we need to model the "variable result" on its own. Therefore, why don't we just create a table for each variable result? This would result in the following model (for clarity I didn't include the links between the rows in the results tables and their respective row in the "Variable" table. You'll have to use your imagination).

one table per variable result example

I don't like it. It seems overkill for the following reasons:

  • There would be an explosion of tables, one per variable. If we have n variables, then we would have n*years relationships per simulation and "aggregating" the results would probably end up being too cumbersome.
  • The column "V_ID" would be the same for each row in each variable result table. This redundancy just tastes icky.

Conclusion

This seems like a "simple" problem but I can't find a "good solution" by myself or googling for it. Am I overthinking it and I should just stick with the "column name as implicit FK" solution (the second one)?



Read more here: https://stackoverflow.com/questions/64398527/how-to-properly-model-a-table-that-persists-another-tables-columns-information

Content Attribution

This content was originally published by Alechan at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.

%d bloggers like this: