Database

You want to store data. A database is an excellent choice and there are other options. You need to know a little bit about all the options before choosing a database then you need some information about databases to choose the right database option. Here is everything I found from using all the options.

You need to answer all the who, what, when, where, why, way (how), and worth (cost) questions before choosing your data storage.

Who

Who will supply the data? Who will use the data? When you store data in a file, you can control who can create and access the file but not the data within the file. Replace the file with a database. You can split the data into tables and columns and rows. You control who can create and access each table. With some databases, you can control who can access each column. Your PHP code can control who accesses each row. You have complete control over the "who".

What

Files and similar data storage require a format to fit your data. There are many formats. What you store determines the file format. The file is then stuck with that format for all data. A database gives you the flexibility of storing data in different formats in each column. A database solves the "what" problem for small to medium chunks of data.

When you have large chunks of data, you will store each chunk in a separate file. Photographic images are an example. Each image can be JPEG or RAW, depending on the camera and processing. The problem with a file store is finding the right file without viewing every file.

Enter the database. Create a database to list all the files. You can then add descriptions, keywords for searches, lists of the people in the images, locations, and creation dates. The database becomes the index for the files. The database solves the "what" problem for large chunks of data.

When

You create a program to read data, perform a conversion on the data, then pass the converted data on to someone else. Your data storage only has to be temporary. You do not need a database. Right?

You could be wrong. Temporary data storage disappears when you restart a server. You could lose the data you are processing when the server stops. Data stored temporarily in a database will survive a server restart.

A database also solves the problem of a network delay. When you finish processing the data, you want to send the data through a network but the network might be broken. A database can store the processed data until the data the network is fixed.

A database can solve many timing issues and can record the status of transactions to create a log you can audit.

Where

Where is your data used? Frequent use of data from several locations is a problem you need to solve before you are hit by network delays and massive network charges. Databases can be distributed to solve "where" problems.

Images used in Web pages can be distributed using a CDN, a Content Delivery Network. CDNs work for image files and similar files. CDNs do not work for frequently updated data or data that needs processing before presentation.

A database can be replicated across many locations, giving you data where your customers need it. In a simple replication system, there is a master database you update and several slave databases that receive copies of every change from the master. Think of it as Write Once, Read Anywhere.

With some more software and configuration, you can have multiple masters sharing updates. This is WARA, Write Anywhere, Read Anywhere. Your customers can update their data locally and the updates will be shared around the world.

Why

Why are you going to do what you are planning? Will the reasons change? Databases add the flexibility to make changes easily. Databases are not the best choice for every change. Look at the reasons for choosing your data storage.

Your project specifies local storage of data for security. You ask about backup and the customer says a daily copy offsite is all they need. You then look further into the project and see payment transactions mentioned in phase two of the project. A daily backup will not protect the data transaction by transaction. After a data recovery, the system might be missing thousands of transactions for millions of dollars.

Why was a daily backup chosen instead of a database replication or some other protection at the transaction level? There will be other questions about why someone chose a file instead of a database or temporary storage instead of permanent storage.

Always write down the questions. When you have an answer, record the answer and who made the decision. You can return to the questions in later phases of a project.

Storing something permanently then throwing it away later is safer than using purely temporary storage. Saving temporary data also helps you with testing. When you find some output is wrong, you can look the temporary data to find out where the data is corrupted.

Way (how)

During development, you will run into many questions about the way you store and access data. Everyone will ask, or should ask, how they can access the data. After choosing to use a database, there are still choices of database type and database column types. Replication has many options. I will add some pages on each choice to help you decide.

PHP can handle any type of database plus many alternatives to databases. There are few decisions at the code level when you change to a different type of database. Some databases automatically increment integer primary indexes while others require special code. While there are different specialised data types across databases, there are usually common data types that work in every database.

When you choose MySQL for your database type, you then choose a database storage engine because MySQL has several. In MySQL, one of the most popular storage engines is InnoDB. InnoDB lets you create a database with everything in one file or every database table in a separate file, another "how" decision to make.

Down at the individual data level, you can store strings of data in several different column types. The column types set the maximum length of the string they can store. Databases and database storage engines vary in how they store large strings, which may limit the data you can store in a table or may reduce performance.

Worth (cost)

The MySQL/MariaDB pair are the default choices for databases in Web sites. Microsoft's SQL Server and Oracle's Oracle RDBMS are corporate choices you may be forced to use on some projects. Huge Web sites may choose the MongoDB with or without an SQL processor.

Your choice of brand will lock in the purchase cost, the development cost, the ongoing maintenance cost, and upgrade costs. You need to look at all the costs over a 3 ~ 5 year period.

As an example of unexpected cost, one project started with an Oracle development license at a very low cost. Right at the end of the development, the money ran out, the product was still a long way from generating a profit, and Oracle sent a huge bill for the commercial licence need to put the database into productive use. Someone forgot the big fee from Oracle or they forgot the fee would be huge for the large server needed to run the application.

MySQL

MySQL was the default choice for Web site projects. MySQL is now often replaced by MariaDB. Both MySQL and MariaDB are open source which means you can database specialists can read the source code to diagnose problems. Anyone can run as many copies of MySQL as they like, performing as much testing as they want.

MySQL was sold to Oracle then the team that developed MySQL started again with MariaDB. There is one version of MariaDB using the same open source code as MySQL and a later version with improvements. Oracle charge a lot of money for an "enterprise" version of MySQL while the MariaDB people offer a similar enhancement as an open source project.

You can easily switch from MySQL to MariaDB. When you start using the extra features of MariaDB, you will not switch back.

Now you have to choose which storage engine you will use inside of MySQL/MariaDB. Both offer multiple storage engines with different performance profiles. That is a topic for the future.

Oracle

Oracle is the most expensive database software. You buy Oracle for the brand, the same as buying an iWatch. If you followed the Apple watch saga, you would know that versions 1 and 2 of the Apple watch were an expensive waste of time compared to almost every other watch on the market. The Apple watch version 3 has, for the first time, added something not available in many other watches. Oracle is the same.

Oracle started as an alternative to databases from IBM. Years ago Oracle had a slight lead but PostgreSQL removed the need for Oracle in small to medium projects. Oracle occasionally released new products for very large projects but is now replaced by NoSQL projects.

I had to use Oracle on one project because they customer advertised a partnership with Oracle. That release of Oracle had many problems. The project was delayed by at least three months due to restrictions in that version of the Oracle database. In a comparison with MySQL or PostgreSQL, the project would work with two experienced developers for three months using MySQL, two experienced developers for four months using PostgreSQL, and three experienced developers for five months using Oracle. You would not have the same problems with a recent release of Oracle.

Oracle required a full time expert to handle all the weirdness in Oracle plus the performance problems. PostgreSQL takes more time to set up than MySQL but would have performed better than MySQL or Oracle on that project due the better choices available.

SQL Server

SQL Server is Microsoft's brand of database and is commonly called MSSQL by developers. MSSQL is popular because it is cheaper than Oracle. The corporate world often looks at Oracle first then chooses MSSQL because they have Microsoft software already installed, making MSSQL the lowest cost compatible option for them.

MSSQL is easy to adopt when you are using Windows in your development environment. You rarely have problems at the start because the software works straight out of the box. When you do run into difficult problems, you cannot look at the source code. Microsoft certified experts are supposed to help but most of them have the same problem, they cannot look at the source code.

MongoDB

MongoDB is a document storage system with no direct SQL processor. You manually perform the SQL work in your code or you add a front end processor to give your SQL style access. MongoDB requires far more setup work and may perform only a little bit faster on a medium size project. On a very large specialised project, MongoDB would be equivalent to using something like PostgreSQL with intensive data partitioning, with both requiring extensive configuration work to fit the most common types of access.

Back before the invention of SQL, most databases were similar to MongoDB. IBM sold IMS for essentially the same use as MongoDB. SQL was invented to make access easier and compatible across brands of database. Oracle exists because IBM announced DB2 with SQL access and Oracle could sell their product as an alternative.

MongoDB could also be compared to Apache Solr if you have used Solr. You might remember the problems with manually defining indexes in Solr. MongoDB requires similar work.

Read more