How Quobyte Uses Software to Add Lanes to the Data Highway

Companies constantly need to store data, and depending on their size that data can run into the petabytes. Storage at this scale has traditionally been relegated to data centers (some of which have even been moved to the bottom of the ocean). However, current storage options are limited by an overdependence on hardware, as well as shortcomings in scalability, maintenance, distribution and security. To address these limitations, a company called Quobyte offers a software-based storage infrastructure that’s hardware-agnostic, scalable and user-friendly.

The combined data of just the “Big Four”—Apple, Google, Microsoft and Facebook—is estimated to be 1200 petabytes. A single petabyte is equal to 1024 terabytes, which is roughly equal to a million gigabytes. For comparison, Call of Duty: Modern Warfare is widely considered the largest video game ever made with a file size of “only” 200 gigabytes.
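The unit arithmetic above can be checked with a short Python sketch. It assumes binary multiples (1 PB = 1024 TB and 1 TB = 1024 GB), matching the conversion used in the text; the function and variable names are illustrative only.

```python
# Binary storage units, matching the "1 PB = 1024 TB" convention in the text.
GB_PER_TB = 1024
TB_PER_PB = 1024


def petabytes_to_gigabytes(petabytes: int) -> int:
    """Convert petabytes to gigabytes using 1024-based multiples."""
    return petabytes * TB_PER_PB * GB_PER_TB


gb_per_pb = petabytes_to_gigabytes(1)        # roughly a million gigabytes
big_four_gb = petabytes_to_gigabytes(1200)   # the estimated "Big Four" total
games_per_pb = gb_per_pb // 200              # whole 200 GB games that fit in 1 PB

print(f"1 PB = {gb_per_pb:,} GB")
print(f"1200 PB = {big_four_gb:,} GB")
print(f"200 GB games per PB: {games_per_pb:,}")
```

In other words, a single petabyte could hold a few thousand copies of a 200 GB game.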

Three Basic Services

Quobyte was founded in 2013 by Björn Kolbeck and Felix Hupfeld, who serve as CEO and CTO respectively. Both men previously worked at Google, where they were impressed by how a small team of administrative staff was able to maintain the tech giant’s enormous volume of data. This inspired them to branch out and create a software architecture that, at its core, provides three services: a registry service, a metadata service and a data service.

Quobyte’s registry service runs on a minimum of four servers. It enables the client—the computer hardware or software that is trying to access information on a server—to navigate to the right server in the storage network.

Service registry directs the client to the right services. (Source: Think Microservices.)

Metadata service stores all information pertaining to the file—but not the actual file data itself. This information includes the file’s location, permissions related to accessing the file, Access Control Lists (ACLs), and all other attributes. Like the registry service, Quobyte’s metadata service runs on a minimum of four servers and can linearly scale as required.

Lastly, data service is where the input/output (IO) occurs. As Quobyte explained in a whitepaper: “In practical terms, the number of data service nodes can be in the hundreds or more. Lightly loaded clusters can run all three Quobyte storage services on the same nodes if desired. Customers who require maximum performance will likely run dedicated nodes for metadata and data services.”
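To make the division of labor among the three services concrete, here is a minimal in-memory sketch in Python. It is not Quobyte's API: the class names, method names and server addresses are all hypothetical, and real services would run on separate machines and talk over the network. The sketch only shows the order of operations a client might follow: ask the registry where the services are, ask the metadata service where a file lives, then do the I/O against a data node.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy registry: maps a service name to the addresses that offer it."""
    services: dict = field(default_factory=dict)

    def register(self, service: str, address: str) -> None:
        self.services.setdefault(service, []).append(address)

    def lookup(self, service: str) -> list:
        return list(self.services.get(service, []))


@dataclass
class MetadataService:
    """Toy metadata store: attributes and location, never file contents."""
    entries: dict = field(default_factory=dict)

    def create(self, path: str, owner: str, data_node: str) -> None:
        self.entries[path] = {"owner": owner, "data_node": data_node}

    def locate(self, path: str) -> str:
        return self.entries[path]["data_node"]


@dataclass
class DataService:
    """Toy data node: this is where the actual I/O happens."""
    blobs: dict = field(default_factory=dict)

    def write(self, path: str, payload: bytes) -> None:
        self.blobs[path] = payload

    def read(self, path: str) -> bytes:
        return self.blobs[path]


# Wire up the services (all addresses are hypothetical).
registry = Registry()
metadata = MetadataService()
data_nodes = {"data-1": DataService(), "data-2": DataService()}
registry.register("metadata", "meta-1")
for address in data_nodes:
    registry.register("data", address)

# Write path: pick a data node, store the bytes, record where they live.
target = registry.lookup("data")[0]
data_nodes[target].write("/reports/q1.txt", b"hello")
metadata.create("/reports/q1.txt", owner="alice", data_node=target)

# Read path: metadata tells the client which data node to contact.
node = metadata.locate("/reports/q1.txt")
content = data_nodes[node].read("/reports/q1.txt")
print(node, content)
```

Note that the metadata entry records only the owner and the location; the file's bytes exist solely on the data node.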

Hardware Agnosticism

Quobyte provides storage solutions on bare metal x86 servers as well as public cloud virtual machines (VMs), allowing the customer to build a customized storage solution that meets their specific cost and performance needs. Though Quobyte itself does not sell any hardware, its partners offer a variety of hardware storage options. Customers can choose between solid-state drives (SSDs), hard disk drives (HDDs), or a combination of the two. Most Quobyte customers use a combination, since SSDs have a wider range of applications but HDDs are cheaper and offer more capacity. The customer’s admin staff can choose which storage type to use for each kind of data.

“As an example, you have metadata files that are just a few bytes, and then you have larger image files, video files, or autonomous driving files that are several megabytes,” Kolbeck elaborated in an interview with engineering.com. “The IT admin would decide to put the large files on hard drive and small files on flash, and that might give you the best performance and cost. Or you can implement policies where, for example, everything you read from the sensors goes onto flash first, and then it’s tiered down to hard drive as time passes. So, you have a lot of flexibility there.”
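The two placement policies Kolbeck describes can be sketched in a few lines of Python. This is an illustration only, not Quobyte's policy engine: the 1 MiB size cutoff, the seven-day flash residency and all names below are assumptions chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical thresholds; a real deployment would tune these.
SMALL_FILE_BYTES = 1024 * 1024   # files under 1 MiB count as "small"
FLASH_RESIDENCY_DAYS = 7         # days a file stays on flash before tiering down


@dataclass
class FileInfo:
    name: str
    size_bytes: int
    age_days: int


def place_by_size(info: FileInfo) -> str:
    """Policy 1: small files on flash, large files on hard drive."""
    return "flash" if info.size_bytes < SMALL_FILE_BYTES else "hdd"


def place_by_age(info: FileInfo) -> str:
    """Policy 2: new data lands on flash, then tiers down to hard drive."""
    return "flash" if info.age_days < FLASH_RESIDENCY_DAYS else "hdd"


old_metadata = FileInfo("index.meta", size_bytes=512, age_days=30)
fresh_video = FileInfo("drive.mp4", size_bytes=8 * 1024 * 1024, age_days=1)

for info in (old_metadata, fresh_video):
    print(info.name, place_by_size(info), place_by_age(info))
```

The same file can land on different media depending on the policy: the small, month-old metadata file stays on flash under the size policy but moves to hard drive under the age policy, while the fresh video does the reverse.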

An example of Quobyte’s recommended combination HDD/SSD configuration. (Source: Quobyte.)

Quobyte also offers a variety of VM recommendations depending on customers’ performance requirements and budget. Customers are encouraged to discuss their storage options with Quobyte’s engineers to arrive at an optimal configuration.

A Unified Infrastructure

Quobyte’s unified storage software supports a wide array of file access methods, including Hadoop, S3, POSIX and more. As a distributed parallel file system, it grants clients direct, simultaneous access to a file. In practice, this means that a Windows user could be editing a Microsoft Word file at the same time that a Mac user is reading it, without either of them needing to copy the file or move it to a different system.

As explained in Quobyte’s whitepaper: “Quobyte software allows IT departments to provide a single consolidated storage platform, thus removing data silos imposed due to differing storage interface protocols. Virtually any environment where data needs to be accessible to and transferred between Linux, Windows, or Mac systems, via NFS, SMB or S3, can benefit from Quobyte storage.”

Quobyte’s parallel file system connects multiple servers and clients. As a result, different clients can gain simultaneous access to the same file. (Source: Quobyte.)

Linear Scalability

Quobyte’s infrastructure allows an unlimited number of servers to be added to a storage cluster, ranging from a minimum of four servers to thousands. The more servers are added, the higher the aggregate performance of the cluster. Peak hardware performance can reach as high as 100 GiB/s per server.

“We can aggregate the performance of all the servers that you put in,” said Kolbeck. “So, if you do the minimum of four servers, you suddenly have the aggregated performance of four servers available—and then your users will see a significant difference.”
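The aggregation Kolbeck describes is simple multiplication under ideal conditions. The Python sketch below assumes perfectly linear scaling and a hypothetical per-server throughput of 2 GiB/s; real clusters will fall short of the ideal because of network and workload effects, and the function names are illustrative.

```python
def aggregate_throughput(servers: int, per_server_gib_s: float) -> float:
    """Ideal aggregate throughput, assuming perfectly linear scaling."""
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    return servers * per_server_gib_s


def transfer_seconds(size_gib: float, servers: int, per_server_gib_s: float) -> float:
    """Time to move size_gib of data at the ideal aggregate throughput."""
    return size_gib / aggregate_throughput(servers, per_server_gib_s)


PER_SERVER_GIB_S = 2.0   # hypothetical figure, chosen only for the example

one_server = transfer_seconds(1024, 1, PER_SERVER_GIB_S)
four_servers = transfer_seconds(1024, 4, PER_SERVER_GIB_S)

print(f"1 server:  {one_server:.0f} s to move 1 TiB")
print(f"4 servers: {four_servers:.0f} s to move 1 TiB")
```

Under these assumptions, going from one server to the four-server minimum cuts the time to move the same data to a quarter.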

Kolbeck likens it to adding more lanes to a highway: the more lanes a highway has, the more easily traffic flows and the less likely drivers are to encounter bottlenecks or obstructions. As for how this kind of scalability is made possible, he attributes it to Quobyte’s decentralized storage architecture.

“Instead of having one component that knows everything, our architecture is built in a way that the servers that have the data communicate with each other and they don’t need to coordinate through a central entity,” he explained. “Sometimes they compare it to federal states. You have the federal state, then you have the state and then you have the local government. By making decisions locally, it becomes very scalable. Unlike a system where everything has been decided at the top level, our architecture is somewhat similar. We enable, on the local level, the servers that have your data to coordinate among themselves without having to ask a central entity. And that way our customers can scale the system to hundreds of storage servers.”

Decentralization has the added benefit that each server operates independently while still sharing information with other servers. This makes the whole system more stable, so a client can still access data even if one server is down or an HDD/SSD storage unit is damaged.
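The article does not say how Quobyte keeps data reachable when a server fails. One common approach, assumed here purely for illustration, is to keep copies of a file on more than one server and have the client fall back to another copy. The Python sketch below shows that idea with hypothetical server records and names.

```python
class AllReplicasUnavailable(Exception):
    """Raised when no reachable server holds the requested file."""


def read_with_failover(servers: list, path: str) -> bytes:
    """Return the file from the first reachable server that holds it."""
    for server in servers:
        if not server["up"]:
            continue  # skip servers that are down
        if path in server["files"]:
            return server["files"][path]
    raise AllReplicasUnavailable(path)


# Two hypothetical servers, each holding a copy of the same file.
servers = [
    {"name": "server-a", "up": False, "files": {"/video.mp4": b"frames"}},
    {"name": "server-b", "up": True, "files": {"/video.mp4": b"frames"}},
]

# server-a is down, so the read is served by server-b.
result = read_with_failover(servers, "/video.mp4")
print(result)
```

The client never needs a central coordinator to tell it that a server has failed; it simply moves on to the next server that holds a copy.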

K.I.S.S: Keep It Simple, Stupid

The storage industry is mostly an appliance-based world. Customers must purchase specialized, often expensive, hardware that needs to be integrated into the data center. Installation is also a complex, time-consuming process. Installing Quobyte, on the other hand, does not require any specialized knowledge beyond a basic understanding of Linux-based applications.

“Customers can download and install [Quobyte] like any other software,” stated Kolbeck. “If you know how to run Linux on a few servers, you can download our software and build your own storage system. Once that’s done, you have a fully fault-tolerant reliable storage system that, in the free version of the software, can serve up to 150 terabytes. I would say that takes about 10 minutes.”

In terms of hardware, the minimum requirements include at least four servers or cloud VMs, 32GB of RAM, the equivalent of an eight-core Intel E5-series CPU, and a 10 Gbps Ethernet-based IP network.

Kolbeck maintained that adding servers to a Quobyte cluster is just as straightforward as the installation process, which is a bold claim. All users have to do is set up the server and install Quobyte, and the server is integrated into the cluster in a matter of minutes. Its resources are available as soon as installation is complete, and the boost in bandwidth and performance is noticeable from the outset, even for end users. Removing servers is also fairly straightforward.

“It’s one click in our user interface,” Kolbeck said. “You can move the data to other servers and once it’s empty, you can just unplug the server, send it back to the manufacturer, for example, or use it for something else.”

Additionally, through built-in analytics and a real-time online dashboard, Quobyte offers streamlined maintenance of its storage components. For example, it automatically detects and removes drives that are not working and performs declustered rebuilds, all without human intervention. Its dashboard enables users to monitor the hardware as well as review the overall productivity of the storage team. Features like these allow users to add as many servers to their network as they require, without needing to hire more administrative staff for management and maintenance.

Quobyte’s detailed online dashboard automates numerous maintenance tasks. (Source: Quobyte.)

Quobyte offers three kinds of subscriptions. The Free subscription gives users 150 TB of HDD storage plus either 30 TB of SSD storage or 10 TB of cloud storage. The Cluster subscription builds upon the Free subscription by adding email support options and X.509 certificates. Lastly, the Infrastructure subscription features unlimited storage on HDDs, SSDs and cloud VMs. Other features include 24/7 customer support, end-to-end encryption and access keys.

For more information, visit the Quobyte website.