Cells
Cells in databases store values. Each cell stores one value. A cell in a database table is identified by the row and column. When you have multiple tables, you need the table id, row id, and column. For multiple databases, you need database id, table id, row id, and column.
I will refer to database, table, and column id/names as names and row identifiers as the row ID. Most database software follows this pattern as row identifiers can be complex, way beyond names.
Two databases can be in different directories and have the same name. Give every database a unique name.
Within a database, tables have to have unique names but the same table name can be used in a different database.
A table abc in database xyz might be referred to as table xyz.abc.
A column has to have a unique name in a table but the same column name can be used in other tables.
Assume we are looking for column col in table xyz.
We could use SQL similar to the following incomplete example.
select col from xyz
Think about a table containing a list of shoes for a store. The table might have columns named brand, model, size, colour, and quantity.
The unique identifier of a row in the table might be the combination of brand, model, size, and colour. The four columns might be used in a primary key.
The primary key might be named product. You select a row by supplying the four column values as shown in the following incomplete example.
select quantity from xyz where brand = 'roadrunner', model = 'superdeluxe', size = '15', colour = 'burnt orange'
Values
What sort of values can be stored in cells. In our previous example, size might contain a string because some sizes are 10AAA.
Another shop might never sell widths and store everything as integers, 10, or decimals to allow for sizes like 10.5.
That model column might be limited to 10 characters while the colour might allow 20 characters. Why limit the size of a string? In some databases, fixed length strings are faster than variable length strings.
If there is a description column, the description string might be unlimited in your logical design. The physical design might have a length limit of 255 bytes because the length field can then be restricted to 8 bits. The occasional longer description field would require a 16 bit length, giving you a length limit of 65,000 bytes. In a good database, there is no logical difference.
Down at the physical level, the database might change the way the data is stored and that may reduce performance. Your logical design might limit the description to 300 characters because that looks good on the product display page. The database administrator might ask you to reduce the description field from 300 bytes to 255 bytes.
PostgreSQL can store the following data types.
- Boolean
- Strings including char, varchar, and text
- Integer
- Floating-point
- Date, time, timestamp, and interval
- UUID (Universally Unique Identifier)
- Array for a list of strings, integers, or anything else
- JSON for Javascript format objects
- key-value pairs
- Network address
- Geometric data
Those specialised data formats only work if all the incoming data is an exact fit. The instant you have something slightly different, you have to switch to a less restrictive type of value. If your UUID data turns out to be non standard, you might need a generic string column.
Moving to a different database?
Take the example of a developer using PostgreSQL and all those specialised column types. No merge that company/developer/database into a bigger company where they refuse to switch to PostgreSQL. You have to extract all the data from PostgreSQL, convert some columns to another type, then import the data into the other brand of database. At this point you need to understand the data, both the existing type and the new type. The new type might allow values that do not work.
When you put data in the old type, you might be allowed to use null for missing values.
The new database might not let you use null so you decide to use zero.
After wasting days importing a hundred million rows, you discover zero is a valid value that has to be separate from null.
Some database designers refuse to use null and have a separate column containing a boolean column named something like value_missing.
I tend to use null when available and document the usage carefully so that the logic is understood when the data is exported.
Performance
Some types of values save disk space but slow down access. Some types of data make searches slow. Take the example of a long test field. One database might refuse to build an index for the text field. Another database might let you build an index for the first few characters. You might choose to use the first 20 characters.
In situations like these, you might increase performance by building a keyword index. There are many other tricks you could use. The important thing is to study the options in your brand of database. The logical design should not change just for minor performance improvements. You only change the logical design when there is a massive performance difference between two approaches. The previous example was reducing a description field from 300 characters to 255 characters for cases where it makes a big difference.
Good design
Good design saves the user time. Forget the computer. Forget memory usage. You design to help the user do what they want to do in the absolute minimum time. Good designs are only altered in the few cases where the organisation cannot cope with the changes required to provide instant response in your design.
On one project I produced a beautiful fast responsive system that could easily be expanded to handle all their data without significant overheads. They decided to keep some of their data in a slow old Java based system. Part of their "logic" was the recent purchase of a new server. I suggested a server that would be many times faster for only double the cost. They purchase a server that was only double the speed and many times the cost. Moving that data out of the Java system into the modern system would mean upgrading the server which, in turn, would mean admitting a mistake. They continued with the Java bottleneck. The new system had to downgrade data and processing to something that the Java system could handle.
Politics can interfere with the best design. Good design requires a way to bypass politics and go straight to the users to find out exactly what they do and what they really need.