Why PowerFlex for databases?

Why PowerFlex for VMs? Why PowerFlex for containers? Why PowerFlex for anything? That should really be the heading of this particular post.

But yes, why? Obviously there are many factors when it comes to choosing an enterprise-grade storage system: performance, reliability, price, features, etc. But today I want to focus on one thing and one thing alone:

Performance

This is what separates the wheat from the chaff, and it’s where PowerFlex shines. But first of all, why is performance important for databases in the first place?

Performance for databases

Databases, be it Oracle, MS SQL, MySQL, MariaDB, Postgres, Greenplum, MongoDB, and so on – all share one common characteristic: They are important to your business. In many cases they are mission critical to your business and without them, your whole operation would come screaming to a halt. So reliability is critical, and that’s another aspect where PowerFlex’s performance has a direct correlation with its reliability (we’ll touch on that later, even though I promised this post was only about Performance).

So why is Performance important for databases?

The number 1 reason is simply end-user satisfaction. More performance means more orders, more orders means more $$$’s. More performance means higher customer satisfaction in not having to wait around all day for a report or to make an order on your website. More reporting means better business insights. Better business insights = a better business. You get the drift.
In the olden days (my young son calls the current age “newden days” – I can’t argue with him!), it was mandatory that databases be run on a RAID1 mirrored kind of setup – heck I didn’t know why myself at the time, but you soon understood it very clearly when you put it on a RAID5 group and someone started complaining about performance! The good news is that PowerFlex is a scale-out mesh-mirror architecture. This thing was built to run databases.
Licensing costs – this is a very big one. You may have noticed that some DBs are a little bit pricey these days. They are often licensed per core, and it’s very much in the interest of those vendors to increase your core count, not reduce it! (One vendor in particular likes to sell you an engineered database system with a lot of very slow clock speed cores 🤔)

So how does PowerFlex help to address these performance concerns when it comes to databases?

With one simple method that I believe was handed on slate to the founders upon the top of Mount Sinai – “Thou shalt mesh-mirror!”
I believe (okay I made all of this up), there were a few other commandments as well:
- You shall not have any single point of metadata lookups
- Remember to keep the I/O path direct, to keep it fast.
- Honor your databases, applications and end-users.
- Thou shalt not steal capacity by wasting it on caching devices.
- You shall bend, but not break. (Thanks Brian! https://www.youtube.com/watch?v=N2QOAnfi7x4)
That’s enough obscurity for now, so in real terms:

Mesh-mirroring – the secret sauce, the big kahuna, the bee’s knees, the foundation upon which everything lies. This is it folks, this is what makes PowerFlex unique. Imagine this – you have 30 storage nodes, each with 10 NVMe/SSD devices – a total of 300 very fast devices. You then create a single Volume for your database (or a few Volumes, it’s up to you). Those Volumes are then broken down into small chunks of 1MB each (on what we call the medium grain layout), and the chunks are distributed 100% evenly across all of those 300 devices! So your database now has the power of 30 controllers and 300 extremely fast devices, all working in perfect parallelism. Imagine that! Actually don’t imagine it, just do it. It’s awesome.
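To make the "evenly distributed" idea a bit more concrete, here is a minimal Python sketch of the 30-node, 300-device example. The placement rule is entirely hypothetical (PowerFlex's real algorithm is proprietary and not described in this post), and the function names are made up for illustration. It assumes each 1MB chunk has a primary copy and one mirror copy on a different node, and simply counts how many copies each device ends up holding.

```python
from collections import Counter

CHUNK_BYTES = 1024 * 1024   # 1MB chunks (medium grain layout, per the text)
NODES = 30                  # storage nodes in the example
DEVICES_PER_NODE = 10       # NVMe/SSD devices per node
TOTAL_DEVICES = NODES * DEVICES_PER_NODE


def place_chunk(chunk_index):
    """Return (primary_device, mirror_device) for one chunk.

    Hypothetical placement rule for illustration only: primaries go
    round-robin over all devices, and the mirror lands on a different
    node whose offset rotates so that mirrors fan out across the cluster.
    """
    primary = chunk_index % TOTAL_DEVICES
    round_number = chunk_index // TOTAL_DEVICES
    node_offset = 1 + round_number % (NODES - 1)   # 1..29, never 0
    mirror = (primary + node_offset * DEVICES_PER_NODE) % TOTAL_DEVICES
    return primary, mirror


def node_of(device):
    """Map a device id (0..299) to the node that hosts it."""
    return device // DEVICES_PER_NODE


def copies_per_device(volume_bytes):
    """Count how many chunk copies (primary + mirror) each device holds."""
    chunk_count = -(-volume_bytes // CHUNK_BYTES)  # ceiling division
    counts = Counter()
    for chunk in range(chunk_count):
        primary, mirror = place_chunk(chunk)
        counts[primary] += 1
        counts[mirror] += 1
    return counts


if __name__ == "__main__":
    counts = copies_per_device(8700 * CHUNK_BYTES)
    print(len(counts), "devices used;",
          min(counts.values()), "to", max(counts.values()), "copies each")
```

With a volume of 8,700 chunks, this toy rule touches all 300 devices and gives each one the same number of copies, which is the property that lets every device and every node share the I/O load.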

Here’s a result from a 6-node 15G cluster: over 1 million IOPS @ 240usec latency for 100% random 70/30 8K I/O, across any Volume size, 24/7/365. It can go faster than this, we were just trying to keep latency as low as possible for this test.
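For a feel of what those numbers imply, here is a quick back-of-the-envelope calculation in Python. It uses Little's law (I/Os in flight = IOPS x latency) and treats "1 million IOPS" and "8K" as exactly 1,000,000 and 8,192 bytes, which are my assumptions rather than measured values; the test reported "over" 1 million, so read these as rough lower bounds.

```python
def outstanding_ios(iops, latency_us):
    """Little's law: average I/Os in flight = throughput x latency."""
    return iops * latency_us / 1_000_000


def per_node_iops(iops, nodes):
    """Average share of the load per node, assuming an even spread."""
    return iops / nodes


def throughput_bytes_per_second(iops, io_bytes):
    """Aggregate data rate for a fixed I/O size."""
    return iops * io_bytes


if __name__ == "__main__":
    iops = 1_000_000      # assumed: the post says "over 1 million"
    latency_us = 240      # from the post
    nodes = 6             # from the post
    io_bytes = 8 * 1024   # assumed: "8K" taken as 8 KiB

    print("I/Os in flight:", outstanding_ios(iops, latency_us))
    print("IOPS per node: ", round(per_node_iops(iops, nodes)))
    print("GB per second: ", throughput_bytes_per_second(iops, io_bytes) / 1e9)
```

Under those assumptions the cluster is keeping roughly 240 I/Os in flight at any moment, with each of the 6 nodes handling about 167,000 IOPS.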

The more nodes you add, the faster it gets.

We don’t obey the laws of thermodynamics

This is in direct contrast to your typical mid-range dual-controller storage systems out in the market today. They have their place of course, but the power of software defined storage… uh oh, I feel another meme coming on…

Speaking of direct – Direct I/O from the clients to the storage – is another unique PowerFlex characteristic that allows it to scale-out to massive environments without metadata lookups becoming the bottleneck (which is often the case in traditional storage systems).

Every PowerFlex client (SDC) has a metadata map in its own memory thanks to a proprietary algorithm that ‘knows’ exactly where all of its data lies across the entire cluster. This only takes up about 50MB of RAM per client and is incredibly efficient. In a database context, where random I/O is extremely important – the time saving on these metadata lookups pays for itself in terms of end-to-end lower latency, and much higher CPU efficiency.
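As a rough illustration of why a client-side map helps, here is a toy Python sketch. The `ClientMap` class, the node names and the round-trip count are all hypothetical and only stand in for the idea; the real SDC map and its algorithm are proprietary. The point is simply that resolving "which node owns this offset?" becomes a local memory lookup rather than an extra network hop.

```python
CHUNK_BYTES = 1024 * 1024  # 1MB chunks, matching the layout described earlier


class ClientMap:
    """Hypothetical client-side map: chunk index -> storage node name."""

    def __init__(self, chunk_owners):
        self._chunk_owners = list(chunk_owners)

    def locate(self, byte_offset):
        """Resolve a volume offset to its owning node with no network call."""
        return self._chunk_owners[byte_offset // CHUNK_BYTES]


def round_trips_per_read(has_local_map):
    """One trip for the data, plus one more if the location is fetched remotely."""
    return 1 if has_local_map else 2


if __name__ == "__main__":
    # Illustrative four-chunk volume spread over three made-up node names.
    volume_map = ClientMap(["node-a", "node-b", "node-c", "node-a"])
    print(volume_map.locate(0))                      # node-a
    print(volume_map.locate(2 * CHUNK_BYTES + 512))  # node-c
    print(round_trips_per_read(True), "vs", round_trips_per_read(False))
```

In this simplified model a read with a local map costs one network round trip instead of two, and that saving applies to every single random I/O.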

Low Latency – the king 👑 of all storage characteristics. This is what databases crave. I will spare you the meme this time. With a low-latency Ethernet fabric, decent storage media, and an efficient storage layout (mesh-mirroring) – you can now achieve sub-200usec latencies with the latest generation PowerFlex systems. This isn’t any fake random read hit cache nonsense either – this is 100% random I/O – 24/7/365, across any Volume size. It was only a few years ago that sub-1ms was heralded as game changing, and we are now 5x faster than that!

But why does latency matter so much for databases? Why does it matter for storage at all? Think of it this way – every I/O must be completed in a certain time frame, be it 1 second, 10ms, 1ms or 100usec.

And there’s your problem – especially for single-threaded, synchronous I/Os – there is no substitute for low latency. A synchronous thread can’t issue its next I/O until the previous one completes, so latency directly caps how much work it can do. If you have high latency, you’ll have high CPU ‘wait’ states – and if you have high CPU wait states, you’re going to need more CPU cores to perform the same amount of work. More cores = more $$$’s to your favourite DB vendor.
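Here is that latency argument as simple arithmetic in Python. It's a deliberately crude model with assumed numbers: each thread issues one synchronous I/O at a time, the 50,000 ops/sec target is made up, and real databases overlap work in ways this ignores. Still, it shows why 200usec versus 1ms matters.

```python
import math


def sync_ops_per_second(io_latency_us, cpu_time_us=0):
    """Max operations/sec for one thread that waits for each I/O to finish."""
    return 1_000_000 / (io_latency_us + cpu_time_us)


def threads_needed(target_ops, io_latency_us, cpu_time_us=0):
    """Threads required to hit a target rate under this simple model."""
    per_thread = sync_ops_per_second(io_latency_us, cpu_time_us)
    return math.ceil(target_ops / per_thread)


if __name__ == "__main__":
    target = 50_000  # hypothetical workload, in synchronous I/Os per second
    for latency_us in (1000, 200):
        print(f"{latency_us}usec: "
              f"{sync_ops_per_second(latency_us):.0f} ops/sec per thread, "
              f"{threads_needed(target, latency_us)} threads for {target} ops/sec")
```

At 1ms a single synchronous thread tops out at 1,000 ops/sec; at 200usec it reaches 5,000. For the same hypothetical target, that is 50 threads versus 10, and if each of those busy threads is pinned to a licensed core, you can see where the bill goes.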

From an environmental perspective too, it is our moral obligation to find the most power efficient solutions to meet our requirements. Fewer CPU cores = less power and cooling = less real estate = a lower carbon footprint.

So there you have it. We haven’t quite solved world hunger yet, but it is on the roadmap. Do yourself and your business a favour though by giving PowerFlex a shot – seeing is believing and we’re always happy to get involved in any performance POC you might need us to do. That reminds me, PowerFlex is now available on AWS too, for all your shared-block storage needs, with greater efficiency across multiple AZs.

https://aws.amazon.com/marketplace/seller-profile?id=e8364047-1b0d-42cb-abcf-344995d78ab3

As I always like to say about PowerFlex – use your imagination – it’s most likely possible.

Matt Hobbs

Personal Sustainability Engineer, ex-APJ PowerFlex Presales for Dell Technologies – High performance software defined storage. “Enterprise grade storage that can also do HCI”. Australian, married, 2 boys, based out of Singapore since 2008 but now back in the land of Oz since 2022. We now have 5 chooks! (Chickens for the non-Aussies out there).

Tags: AWS Performance PowerFlex Software Defined
