Many enterprise storage discussions today focus on moving data to public clouds for archival because the cost of entry is modest, especially when immediate capacity is required.  But once you make that commitment to a public cloud, there may be scenarios that require the data to be moved back on-premises in a reverse cloud storage strategy.

This may include a need for IT to control some critical data (such as transactional, inventory or proprietary), meet certain regulations (such as GDPR), comply with specific standards, respond to audits, perform analytics, or simply for financial reasons. 

Though public cloud storage is agile, scalable and flexible, it is not right for every business requirement.  For example, removing data from a public cloud is typically expensive and difficult to budget for, because you often won't know what data needs to be removed, when, or how much.  There may be circumstances when a business that stores data in a public cloud will need to move it back on-premises.

Cloud data removal

It’s easy to fall in love with a public cloud from a cost-of-entry perspective, and new college grads are comfortable coding for the cloud, as evidenced by the tons of content that already resides there.

For some, the cloud is a way to avoid the entanglements of centralized IT processes, since it enables faster data movement than traversing a traditional IT environment.  For most, a public cloud is a repository for colder data that needs to be archived and protected through backups and safeguards included in the pricing model.

The shock with public clouds occurs when data needs to be removed.  In a typical example, the cost to download data, whether it’s a big data analytics workload or a legal document, is many times the cost of keeping it in the cloud, and it is based on a multi-page pricing model that is difficult to navigate and even harder to understand.  The high and unexpected download bill can negatively affect budgets and IT credibility.
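To see why the bill surprises people, a rough back-of-the-envelope comparison helps. The sketch below uses hypothetical per-GB prices, not any provider's actual rate card, to contrast a monthly archival storage charge with a one-time retrieval-plus-egress charge for the same dataset.

```python
# Back-of-the-envelope comparison: monthly archival storage cost vs. a
# one-time bulk download (retrieval + egress) charge for the same data.
# All per-GB prices are illustrative placeholders, not real rate cards.
STORAGE_PRICE_PER_GB_MONTH = 0.004   # assumed archival-tier storage price
RETRIEVAL_PRICE_PER_GB = 0.02        # assumed retrieval fee
EGRESS_PRICE_PER_GB = 0.09           # assumed internet egress price

def monthly_storage_cost(gb: float) -> float:
    return gb * STORAGE_PRICE_PER_GB_MONTH

def one_time_download_cost(gb: float) -> float:
    return gb * (RETRIEVAL_PRICE_PER_GB + EGRESS_PRICE_PER_GB)

if __name__ == "__main__":
    dataset_gb = 500_000  # a 500 TB archive
    keep = monthly_storage_cost(dataset_gb)
    pull = one_time_download_cost(dataset_gb)
    print(f"Keep in cloud: ${keep:,.0f} per month")
    print(f"Pull it out:   ${pull:,.0f} one time ({pull / keep:.0f}x the monthly bill)")
```

With these placeholder numbers, pulling the archive out once costs about 28 times the monthly bill for keeping it, which is the kind of line item that rarely appears in the original budget.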

Putting data first

The ultimate goal with any challenge is figuring out what you want to accomplish: developing a data storage strategy eliminates the “save it now, worry about it later” mentality. That strategy may put all of the data in a public cloud to reduce expensive data center infrastructure deployment costs, but that approach doesn’t provide local access or IT control, and the data is expensive to remove.

If security and data protection are high priorities, putting all of the data on-premises provides local control and access, but is costly to implement and maintain. The optimal strategy for most is a hybrid approach that rethinks how data is captured, preserved, accessed and transformed.

Navigating through the volume, velocity and variety of data in search of critical information raises questions about where the data needs to be.

Today’s CIOs and IT managers are rethinking traditional data center infrastructure models and putting more emphasis on balancing variable workload and application needs with modern, resilient infrastructures whose economics can scale to petabytes with ease.

The data-first strategy is an opportunity to take a fresh look at the problems that need to be solved, with key priorities that define the ‘right’ duration of data retention, where more or less data control is required, cost and performance requirements, scalability requirements, and whether there is a plan to keep data ‘forever.’

Data retention policies are difficult to determine because data of little apparent value today may become priceless five years from now, once big data analysis is performed.  Retention policies should be considered as part of the data strategy, incorporating the needs of big and fast data applications.

Hybrid storage

An inexpensive place to store data isn’t the only reason to use the public cloud.  For example, leading cloud providers have analytical toolsets available to extract value and intelligence from captured data.  If higher-end analytics are required, compute power can easily be rented from a cloud provider rather than maintaining thousands of CPU cores on-premises for an occasional job.

A reasonable way to think about hybrid clouds is to avoid the data migration issue and keep data onsite, but use the compute and tools of the public cloud to run specific jobs. This approach helps organizations address privacy regulations that may prevent personally identifiable data from leaving a country.  As such, the public cloud becomes a flexible capability that supports your data strategy.
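As a concrete illustration of that pattern, the sketch below assumes an on-premises object store that speaks the S3 API and shows a job on rented public cloud compute streaming records directly from it, returning only a small aggregate so the raw data never migrates. The endpoint, credentials, bucket and key names are all hypothetical.

```python
# Minimal sketch, assuming an on-premises object store with an S3-compatible
# API: a job on public cloud compute streams records from the private store
# and returns only an aggregate, so personal data never leaves the site.
# Endpoint, credentials, bucket and key names are hypothetical.
import json
import boto3

onprem = boto3.client(
    "s3",
    endpoint_url="https://objects.internal.example.com",  # private object store
    aws_access_key_id="ONPREM_ACCESS_KEY",
    aws_secret_access_key="ONPREM_SECRET_KEY",
)

def count_eu_records(bucket: str, key: str) -> int:
    """Stream a JSON-lines object and count EU records; only this count,
    not the underlying personal data, leaves the on-premises store."""
    body = onprem.get_object(Bucket=bucket, Key=key)["Body"]
    return sum(1 for line in body.iter_lines() if json.loads(line).get("region") == "EU")

print(count_eu_records("transactions", "2024/06/records.jsonl"))
```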

Many organizations are now moving their crowded NAS file systems to less expensive, more scalable object storage systems, in effect migrating that data to a private cloud.  Because public cloud performance and accessibility depend on network connectivity, perhaps more so than a private cloud’s do, a private cloud can provide an alternative security strategy and higher performance for improved data accessibility.

The public cloud can act as an experimental sandbox environment where users spin up compute capabilities and workload toolsets, perform whatever analysis or processing is required, and release the results back to on-premises object storage.  This minimizes the cost of removing data from a public cloud, and IT retains visibility and control of the data, reducing unexpected surprises.
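A minimal sketch of that "release the results" step follows, again assuming an S3-compatible on-premises store and hypothetical bucket names and payloads: the sandbox job writes only its small output back on-premises and deletes its temporary working copy in the cloud.

```python
# Minimal sketch of the "release the results" step: a sandbox job in the
# public cloud writes its (small) output back to the on-premises object
# store and deletes its temporary scratch copy in the cloud.
# Endpoints, bucket names, keys and the payload are hypothetical.
import boto3

cloud = boto3.client("s3")  # public cloud credentials come from the environment
onprem = boto3.client("s3", endpoint_url="https://objects.internal.example.com")

def publish_results(report_csv: bytes) -> None:
    # Persist the analysis output on-premises, where IT keeps visibility and control.
    onprem.put_object(Bucket="analytics-results",
                      Key="reports/q2-summary.csv",
                      Body=report_csv)
    # Tear down the scratch copy so no residual data lingers in the sandbox.
    cloud.delete_object(Bucket="sandbox-scratch", Key="working/q2-input.parquet")

publish_results(b"region,revenue\nEU,1250000\nUS,2310000\n")
```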

Final thoughts

The ebb and flow of data between public and private clouds is fundamentally sound and demonstrates a healthy hybrid cloud architecture, as long as it is based on a strategy designed for what you want to accomplish.  Being deliberate about where data is placed with respect to security, cost, performance, scalability, lifespan and stage of maturity should be part of the data strategy, along with a disaster recovery and offsite backup plan.

For most organizations, placing all data either on-premises or in a public cloud is probably not the right strategy.  However, it’s perfectly acceptable to reverse hybrid cloud traffic, especially if it fits your data strategy and addresses your business needs.

Erik Ottem is director of product marketing for data center systems at Western Digital