One common place where people replicate data is in situations where you need data at a specific point in time. A perfect example I saw today was this situation:
You can see that the price column is duplicated in both the products and orderDetails table. The rationale for this is that so that when the price of the product changes that it does not affect historical orders.
This is a fairly viable solution but has a few drawbacks. The main one is data duplication. Most of the time the data will be the same (unless product pricing changes frequently). The second one is that if the product name changes it will affect historical data (to overcome that we could add another column to orderDetails called productName but that is leading to more duplication). The final issue is that if a product is to be remove from the catalog then the order detail would have to be removed as well (or some other ugly alternative).
The simple fix for this is to use the effective record approach. With this approach whenever you change a row you expire the original record and create a new record with the new changes in it. In this example we will change the products table to have an effectiveEndDate column on it and remove price from the orderDetails table:
And here is a sample of changing the products table data:
ProductId Name Price EffectiveEndDate
1 Oranges 1.99 null
so lets say we update Oranges to be $2.35 we would first expire the original and then create a new record with the updated information:
ProductId Name Price EffectiveEndDate
1 Oranges 1.99 3/21/2007 12:00AM
2 Oranges 2.35 null
So now people who purchased oranges before 3/21/2007 will show the price of 1.99 while purchases made after will be linked to the $2.35 price.
Things to Consider
1. Anything that queries a table must factor in wether or not to return expired records.
2. When updating a record you now need to expire the old record and create a new record which is more work and should be done in a transactional context incase of failure.
3. To delete a record simply change the record so that it’s EffectiveEndDate is current.
4. Using nulls in the effectiveEndDate column. This can be tricky as it is not a valid date but the best way to describe an indeterminate expiry. Some people use a high date value which makes writing queries easier for this data but is really a magic number which makes me cringe. One technique I have used is to create a function called MaxDateTime() that returns my magic maxdatetime and then have a query like this:
where isnull(EffectiveEndDate, MaxDateTime()) > getDate()
so if the EffectiveEndDate is null convert it to our magic number (that is only in one place due to our use of a function) and ensure that the expiry is greater than today.
You don’t have to use dates either but I like that I can see when a record was taken out of commission. I bit field called IsExpired could be used if you would like.
Another thing that I have implemented before is an audit trail on records that shows which record was changed. I do this by adding a ParentRecord column to the table e.g.:
ProductId Name Price EffectiveEndDate ParentRecord
1 Oranges 1.99 3/21/2007 12:00AM null
2 Oranges 2.35 4/21/2007 12:00AM 1
3 Tangerines 2.35 null 2
Another use for this that I have done is time sensitive pricing. For this I had to effective columns. The first is the start date and the second is the end date. This allows to have a record be active for a period of time:
ProductId Name Price EffectiveStartDate EffectiveEndDate
1 Oranges 1.99 3/21/2007 12:00AM 4/21/2007 12:00AM
2 Oranges 2.35 4/21/2007 12:00AM 5/21/2007 12:00AM
3 Oranges 2.95 5/21/2007 12:00AM null
In this case the price is effective for a month. The select query (if properly written) will automatically return the proper price for the point in time which the query is run.