Data Deduplication and Single Instance Storage

Okay, so this article is for the IT Crowd, or at least those who thought the British comedy was funny. You see, sales told me that the fact that our system includes automatic and transparent Data Deduplication is so darn dry and boring that they couldn’t imagine including it in a fact sheet anywhere.

Apparently, the feature is so obscure that even Google Chrome’s spell check doesn’t believe “deduplication” is a word. Maybe that’s why Wikipedia has entries for both Data Deduplication and Single Instance Storage.

With either name, the need for this feature comes about because of the way people work. In any company, it is very easy to end up with lots of copies of any one file. For example, PMs or Project Assistants just want to attach the files they need to the relevant RFI or Submittal on their project; they certainly do not know or care if the same information is already stored elsewhere for some other project. These files are often somewhat static: PDF spec sheets, warranties and other documentation from various manufacturers and suppliers that only change a few times a year. Data Deduplication (or Single Instance Storage) is when some underlying intelligence detects the duplicates and deals with them.

Sadly, a simple file sharing scheme such as Windows Server 2008 will store each copy of that file over and over, taking incrementally more and more storage. Higher end SANs and intelligent systems such as the Spitfire Project Management System will detect when file contents are exact binary duplicates and store the data only once. You don’t have to do anything special. It doesn’t matter where you place the various copies. The names don’t even have to be the same. It just works.
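For the IT crowd who like to see the idea rather than read about it, here is a minimal sketch of single instance storage in Python. It is not how Spitfire (or any particular SAN) actually does it; the class name, the in-memory dictionaries, the file paths and the choice of SHA-256 as the content fingerprint are all assumptions made for illustration. The point is only that files are keyed by what they contain, not by what they are called or where they live.

```python
import hashlib


class SingleInstanceStore:
    """Toy in-memory store (hypothetical) that keeps one copy of each distinct content."""

    def __init__(self):
        self._blobs = {}  # SHA-256 hex digest -> file contents, stored once
        self._names = {}  # logical path -> digest of the contents it points to

    def put(self, path, data):
        # Fingerprint the bytes; identical contents always give the same digest.
        digest = hashlib.sha256(data).hexdigest()
        # Keep the bytes only if this content has not been seen before.
        self._blobs.setdefault(digest, data)
        # The path is just a pointer. (Overwriting a path does not clean up
        # the old blob in this sketch; a real system would track references.)
        self._names[path] = digest
        return digest

    def get(self, path):
        return self._blobs[self._names[path]]

    def logical_bytes(self):
        # What the users think they have stored.
        return sum(len(self._blobs[d]) for d in self._names.values())

    def stored_bytes(self):
        # What is actually taking up space.
        return sum(len(b) for b in self._blobs.values())


store = SingleInstanceStore()
spec = b"%PDF-1.4 pump spec sheet"  # stand-in for a manufacturer's PDF

# Same bytes, attached to two projects under two different names.
store.put("ProjectA/RFI-12/pump.pdf", spec)
store.put("ProjectB/Submittal-3/spec-sheet.pdf", spec)

print(store.logical_bytes(), store.stored_bytes())
```

The two attachments count twice toward the logical size but only once toward the stored size. A production system would likely add a byte-for-byte comparison or similar safeguard on a digest match, plus reference counting so a blob can be freed when its last pointer goes away.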

This little detail isn’t one that shows up on the PM’s feature requirement list, but I am sure that there are plenty in the IT crowd who are thoroughly relieved to know that their software vendor cares about how much storage gets consumed, just like we care about bandwidth and CPU utilization and all aspects of total system performance.

I’ll have to check with the sales team — maybe they’ll think “Single Instance Storage” is a more worthy and salable feature.

This entry was posted in Computing, Development, Document Management by Stan York.

About Stan York

Stan York (VP of Development at Spitfire) has been developing software for decades, which means he's old enough to remember when punched cards were the state-of-the-art UI. Stan makes no secret about the fact he often prefers data to people, so it should come as no surprise that he is an expert in databases and SQL Server. If asked, he might admit he is an MCP, because he knows this is important to some people, particularly at Microsoft. The postings on this site are his own and don’t necessarily represent Spitfire's positions, strategies or opinions.
