Data Deduplication and Single Instance Storage

Okay, so this article is for the IT Crowd, or at least for those who thought the British comedy was funny. You see, sales told me that the automatic, transparent Data Deduplication built into our system is so darn dry and boring that they couldn't imagine putting it on a fact sheet anywhere.

Apparently, the feature is so obscure that even Google Chrome's spell check doesn't believe "deduplication" is a word. Maybe that's why Wikipedia has entries for both Data Deduplication and Single Instance Storage.

Whatever you call it, the need for this feature comes from the way people work. In any company, it is very easy to end up with lots of copies of the same file. For example, PMs and Project Assistants just want to attach the files they need to the relevant RFI or Submittal on their project; they certainly do not know or care whether the same file is already stored elsewhere for some other project. These files tend to be fairly static: PDF spec sheets, warranties, and other documentation from manufacturers and suppliers that change only a few times a year. Data Deduplication (or Single Instance Storage) means that some underlying intelligence detects those duplicates and keeps just one copy of the data.

Sadly, a simple file-sharing setup such as a plain Windows file server will store each copy of that file over and over, consuming more storage with every copy. Higher-end SANs and intelligent systems such as the Spitfire Project Management System detect when file contents are exact binary duplicates and store the data only once. You don't have to do anything special. It doesn't matter where you put the various copies, and the file names don't even have to match. It just works.
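For the IT folks who want to see the general idea, here is a toy sketch of content-addressed single instance storage. It is an assumption-laden illustration, not how Spitfire or any particular SAN actually does it: the class name `SingleInstanceStore` and its methods are hypothetical, and SHA-256 is simply one common choice of content hash. Each file's bytes are hashed, the bytes are kept once per unique hash, and every path or name just points at that hash, so location and file name play no part in spotting duplicates.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical toy store: identical bytes are kept once, no matter
    which path or file name each copy is saved under. Illustrative only."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # content hash -> the one stored copy
        self.paths: dict[str, str] = {}    # logical path -> content hash

    def put(self, path: str, data: bytes) -> str:
        # Identify the file by its contents, not by its name or folder.
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if these exact contents are new to us.
        if digest not in self.blobs:
            self.blobs[digest] = data
        self.paths[path] = digest
        return digest

    def get(self, path: str) -> bytes:
        return self.blobs[self.paths[path]]

    def bytes_stored(self) -> int:
        # Physical storage actually consumed.
        return sum(len(b) for b in self.blobs.values())

    def bytes_logical(self) -> int:
        # What a naive file share would consume, storing every copy.
        return sum(len(self.blobs[h]) for h in self.paths.values())


if __name__ == "__main__":
    store = SingleInstanceStore()
    spec = b"pretend these are the bytes of a manufacturer's PDF spec sheet"
    # Same file attached to two projects, different folders and names.
    store.put("ProjectA/RFI-012/spec.pdf", spec)
    store.put("ProjectB/Submittal-07/Widget Spec Sheet.pdf", spec)
    print("logical bytes:", store.bytes_logical())
    print("stored bytes: ", store.bytes_stored())
```

Both paths read back the same bytes, while only one copy occupies storage. A production system might also compare the actual bytes whenever hashes match, to rule out the (astronomically unlikely) case of two different files sharing a hash.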

This little detail isn't one that shows up on a PM's feature requirements list, but I'm sure plenty of people in the IT crowd are thoroughly relieved to know that their software vendor cares about how much storage gets consumed, just as we care about bandwidth, CPU utilization, and every other aspect of total system performance.

I'll have to check with the sales team. Maybe they'll think "Single Instance Storage" is a more worthy and saleable feature.
