Structure

What is the structure of a database? The logical structure can be compared to spreadsheets, tables, rows, columns, and cells. The physical structure behind the database can be anything the database software developer feels like using.

Logical structure

You can start with a database then drill down to a cell or work the other way up. we will start at the top.

Databases

You can access as many databases as your computer can handle. Every database can have a different structure and unrelated data. Accessing a large number of databases will create duplication of overheads, will chew up memory, and slow down processing.

When you have several brands of database software, you can benefit from one off migrations to a single brand of database. This is a first step in reducing overheads.

When all your databases are the same brand but are still physically separate, some brands of database software can perform complicated accesses across multiple databases, things like joins, but there are still overheads. When you have frequent accesses combining multiple databases, you can benefit from merging databases. If the result is too big for your disk, you have options to place tables on different disks, "sharding", and using disk arrays.

Look at the options in your brand of database software for the available options, discuss them with people who use the same brand to manage big central databases. This is a point where, in the past, you might have changed from MySQL to PostgreSQL. Today most of the big brands have options for clusters and similar big data storage.

Tables

A database table contains a defined number of columns of data with an variable number of rows. Within a row, a column is a cell containing a value of the type defined for that column. You can add columns to a table, delete columns, and with some database software, change the type of data in a column.

Each table has a unique name within a database. A database may have one file containing all tables or a separate file for every table.

Tables can have indexes. Indexes can be based on one or more columns. An index may be in the same file or a separate file. To be a relational table, the table has to have a primary index and every row has to be unique within the primary index.

Columns

A column has one cell, or value, in every row. Some databases let you have empty cells defined as null or some other unique value. Within a table, a column has a unique name. The column might be used in some of the indexes. Values within a column might be unique or not.

As an example, you might set up a table to record car crashes. Every crash has a unique report number. The crash might also have a date but you can have more than one crash on a date. The crash report might also have a time but you could have several crashes at the same time, something that happens in Sydney when there is heavy rain after a couple of weeks of dry weather.

Rows

A row in a table works the same as a row in a spreadsheet. To be relational, the row has to have a unique identification in a primary index. There are no other limitations.

Each cell has to fit the data definition for that column. If the column is a date, the corresponding cell has to be a date. Spreadsheets let you define a default format for a cell but then let you enter data in a different format. Some brands of database also let you enter data that is different from the column definition, a nice option for some types of data but the result is not a relational database.

Cells

A cell is a column within a row. In relational databases, the row has to be unique within the table primary index, making the cell unique. The value in the call does not have to be unique.

If the table has just one column as a primary index, each cell has to have a unique value. Many tables have more than one column in their primary index. With more than one column, there is no requirement of have unique cells, you only need the combinations of columns to be unique.

If we go back to our car crash table, a unique report number can be a single column unique index. What happens when reports come in from different locations and the report number happens to be just a page number in a report book? You can have page 10 arriving from several books. One solution would be to add a unique number to every book and make the book number another column in the primary key.

Values (column types)

One column might be integer while another is decimal or string or a GPS coordinate. The column might allow null values to indicate there is no value supplied in the row. The column may have a length limiting the size of a value. You need to experiment with real values to find out what works.

Think about a dollar amount. You store them using a decimal number set to two decimal places. You then find out taxes have to be calculated to three decimal places.

A good approach is often to start with very wide settings then narrow down the settings after you load real data. You might have a name field set to 20 characters then have to adjust it to 25 then to 30 then to 40. You could start with a length of 100 then trim the size after a few months of entering data. Shrinking the size once would be easier than repeatedly expanding the size plus rerunning data entry processes to add the values that failed on previous attempts.

Physical structure

The physical structure behind the database software is more varied than what you need to store the data. There are other considerations added into physical designs, mostly for performance.

One idea is to never delete a row. You leave the row in place and mark the row as deleted. New rows are always added at the end of a table to reduce processing time. Some databases perform updates by adding the update as a new row and marking the previous row as deleted. This approach requires an index to find the row without having to read the whole table for every access.

Do you have large strings in one column? One brand might store that column as a separate file or store every value as a separate file. This reduces the problem of inserting a big value into a small row during an update.

You can also change things during your design. MySQL has several storage engines. You can mix some of those storage engines at the table level. A long time ago there was a fashion where images were stored in databases as long binary strings. The database developers then had to use a database brand that would store the huge binary strings as separate files. Now we are back to the sensible option of leaving the images as separate files on disk and using the column to store only an identifier for the image file. This approach also lets you add the one file to multiple rows.

Flat file databases often have no index and build indexes in memory the first time they read a table. The index might be stored in a temporary file and rebuild after you restart the database software or database server.

When you have automatic index rebuilding, database backups might leave out the indexes. The indexes are then rebuild after you recover a broken database. Leaving indexes out will make your backup faster and the rebuild slower. Hopefully you backup frequently and rarely have to recover.