Many web developers also design their own databases. I don't know how many times I've heard something like "I'm a designer...I don't know how to add a user" or "I'm only the web site administrator...I don't know how to write SQL." There are a few simple things that someone designing a database should know. I'll try to cover them for the developer/designer who also is in charge of handling the database implementation.
- Database tables handle one "thing", and that thing is a list. The list might be "products", "orders", "customers", or "blog items". In all cases, the database table should have a plural name and should only contain items that pertain to that particular thing. You don't want to start adding in categories to a products table, or adding country information to a customers table. Keep the different items separated into different tables, and use foreign keys and joins to relate the data.
- Database fields should contain one thing that pertains to the table, and one thing only. A database field should never hold more than one thing. You shouldn't list categories in a field like "12,13,22" or put a name in a field like "Jack Schplatzen". This will only serve to create complexities later. Even if you think your web application will never need to separate first and last names, it probably will at some point and you'll be sorry you didn't separate them from the beginning.
- Use the appropriate data type for a field. This might sound like it is self-explanatory, but it isn't. Many people insist on storing phone numbers as numbers, or zip codes as numbers, or order counts as text, or yes/no questions as text, or monetary values as doubles/floats. There are many rules to consider, but the following should be the most basic -- numeric fields are meant to be calculated. It's not necessary or wise to use numeric fields for zip codes, phone numbers, or any other type of data that is not meant to be calculated. As soon as you try to store a postal code as a number, someone will type in W4Z-182 or 22031-0241. A yes/no type question has only two possible answers, so should go into a boolean or bit field. If a numeric (such as a category id) is stored as text, joining the data will be slower and sorting will be inaccurate. Finally, anything related to money should be kept in the appropriate monetary field for your database. There are rounding errors when using double or float data types and doing any kind of calculations that would not look good in your accounting. Have you ever seen an order that adds up to $29.850000000002? It will if you use the wrong data type.
- Use the smallest data type available for the type of data you intend to store. Some databases give you different lengths of fields for integers and strings, for example. In MySQL, you have a bigint, smallint, tinyint, and other types. If a category table will only contain 16 items throughout the life of the site, there is no reason to take up extra storage and sacrifice speed by choosing a datatype that allows 4294967295 entries. A char field will always take up space even if you don't use it. A varchar field takes up only the amount of space you need. Know the data you want to store and the data types available to you.
- Make all table names and field names user friendly. How often have you seen table names like "blg_Itms_Rel_Cat" or field names like "sItm" or "blg_Itms_Rel_Cat_ID". It sounds great in theory to create names that are short, but when you have to start typing code or queries, it takes more time to read and type abbreviations than it does to type out the whole words. Tables should be user-friendly, like blog_items_categories_related, and field names should be as well. To shorten your SQL when writing queries, you should be using table aliases anyway, so saving a few characters doesn't help anyone. My article on Writing Readable SQL should help in this area. Also, while on the subject of naming, choose a method and stick to it. If you type in camel-casing (BlogItemsCategoriesRelated) or use underscores, be consistant. Some languages you'll use are case-sensitive, so if you can't remember that a field was named blog_id or Blog_Id or blog_ID, you have only yourself to blame.
- Plan your user access in advance and stick to the plan. The admin user who creates the tables should not be the same user that is used for the web access. Create at least one dedicated user for web access. This user should not have access to other databases either. In PHP, I see people using the root user in their web applications. Similarly, in CF or ASP I see people using the sa user. It's only a matter of time before security is breached and something is destroyed. There are tools available for every database that help with the process.
- If you have stored procedures at your disposal, write stored procedures to handle the web access. Plan to keep the tables restricted to administrators. This increases the security of your database and prevents people from willy-nilly changing things in the data. If the web developer does not have access to the table, he can't mess it up. If the hacker doesn't have access, he can't destroy your data. In databases like SQL Server, this is very easy to do. It's tempting to write a query in a page because it's easy, but if you have all table access turned off, you'll be less tempted to do it. CMX has many articles dealing with writing stored procedures. Heidi's article on SQL injection attacks shows why you should not leave table access open.
- Use indexes. This is key to having a fast, responsive web site. Indexes are almost always assigned to the primary key by default, but your tables will need other indexes. For example, a customer_id field in a customers table will be a primary key and have an index. However, the customer_id field in the orders table might not -- but it should. It will speed up access to the order when searching on customer, speed up access to the order when joining to the customers table to get customer_name, and also speed up access to orders when ordering by customer. My article on indexing a database table should help decide when and where to put indexes.
- Use many-to-many tables whereever you have data that requires multiple entries. It sometimes sounds easier to add fields like "category1" and "category2" to tie a category_id from a category table to a products table, but that ties your design to always having two categories. What will you do to add a third category? Add a "category3" field? What about 15 categories? Another mistake is to create a field that holds a list of categories, like "15,16,22". This is also a bad idea for many reasons. A better alternative is to create a third table called products_categories that contains only category_id and product_id and either learn how to join the tables using SQL or use your database tools to visually create the joins and the SQL.
- Use foreign key constraints. Typically when you have a database, you have many tables and some will relate to others. Databases have rules for how data is updated and deleted. If some of your data depends on data in other tables, it should have constraints on those tables. For example, if an order_details table relates to an orders table, you don't want to delete a row in the orders table without deleting the corresponding rows in the order_details table. Whereever possible, add constraints and allow the cascade delete of the data. This sometimes makes the web application harder to create if the constraint prevents you from adding/deleting data from a table, but in the long run you are better off learning the rules and applying them to related tables.
Conclusion
This article has attempted to show some of the top things to consider when creating your database for the web. Be sure to check out Joel's article on database design for a more meaty introduction showing some tables and data, Rob's article on database planning for a practical approach to a given problem, or Adrian's various articles on databases to expand on the knowledge and apply it to your web pages. We also have many articles dealing with databases and SQL.