Storage

To store data there is hardware (magnetic disks). The arrangement on disk with which this data is stores is called the filesystem. The specific format (whether metadata or information about is stored) varies from filesystem to filesystem. What is actually stored can be an object, a file, or a block.


Block Storage

Block storage splits data into files across different locations. There is no metadata stored with it (ownership, description, association etc.) Strong consistency is maintained on change of any data.

In detail:

Object Storage abstracts the entire management layer, so the same command using the connector simply returns “Filesystem is Object Storage.” (By the same token, “no storage management overhead” can be a pro; see below. No storage management overhead for object storage. Unlike HDFS, Cloud Storage requires no routine maintenance such as running checksums on the files, upgrading or rolling back to a previous version of the file system and other administrative tasks. 

If you were to run a command such as hadoop fsck -files -blocks against a directory in HDFS, you would see an output of useful information, ranging from status to racks to corrupted blocks.

Downsides:






File Storage

Data is stored as a single piece of information inside a folder, just like you’d organize pieces of paper inside a manila folder. When you need to access that piece of data, your computer needs to know the path to find it. (Beware—It can be a long, winding path.) Data stored in files is organized and retrieved using a limited amount of metadata that tells the computer exactly where the file itself is kept. It’s like a library card catalog for data files.

Think of a closet full of file cabinets. Every document is arranged in some type of logical hierarchy—by cabinet, by drawer, by folder, then by piece of paper. This is where the term hierarchical storage comes from, and this is file storage. It is the oldest and most widely used data storage system for direct and network-attached storage systems, and it’s one that you’ve probably been using for decades. Any time you access documents saved in files on your personal computer, you use file storage.

The problem is, just like with your filing cabinet, that virtual drawer can only open so far. File-based storage systems must scale out by adding more systems, rather than scale up by adding more capacity.

Network Attached Storage: exposes its storage as a network file system


Object Storage

Object storage is a computer data storage architecture that manages data as objects, as opposed to other storage architectures like file systems which manages data as a file hierarchy, and block storage which manages data as blocks within sectors and tracks. One of the design principles of object storage is to abstract some of the lower layers of storage away from the administrators and applications. Thus, data is exposed and managed as objects instead of files or blocks. Objects contain additional descriptive properties which can be used for better indexing or management. Administrators do not have to perform lower-level storage functions like constructing and managing logical volumes to utilize disk capacity or setting RAID levels to deal with disk failure. Object storage also allows the addressing and identification of individual objects by more than just file name and file path. Object storage adds a unique identifier within a bucket, or across the entire system, to support much larger namespaces and eliminate name collisions.

Includes Metadata stored with the object. Each object typically includes the data itself, a variable amount of metadata, and a globally unique identifier. Object storage can be implemented at multiple levels, including the device level (object-storage device), the system level, and the interface level. In each case, object storage seeks to enable capabilities not addressed by other storage architectures, like interfaces that can be directly programmable by the application, a namespace that can span multiple instances of physical hardware, and data-management functions like data replication and data distribution at object-level granularity.

Alow storage of massive amounts of unstructured data

One of the first object-storage products, Lustre, is used in 70% of the Top 100 supercomputers and ~50% of the Top 500.

Lustre is a type of parallel distributed file system, generally used for large-scale cluster computing. The name Lustre is a portmanteau word derived from Linux and cluster. However, unlike block-based distributed filesystems, such as GPFS and PanFS, where the metadata server controls all of the block allocation, the Lustre metadata server is only involved in pathname and permission checks, and is not involved in any file I/O operations, avoiding I/O scalability bottlenecks on the metadata server. 

Downsides


Virtual Object Storage

In addition to object-storage systems that own the managed files, some systems provide an object abstraction on top of one or more traditional filesystem based solutions. These solutions do not own the underlying raw storage, but instead actively mirror the filesystem changes and replicate them in their own object catalogue, alongside any metadata that can be automatically extracted from the files. Users can then contribute additional metadata through the virtual object storage APIs. A global namespace and replication capabilities both inside and across filesystems are typically supported. Notable examples in this category are Nirvana, and its open-source cousin iRODS.


Key Value Stores

Unfortunately, the border between an object store and a key-value store is blurred, with key-value stores being sometimes loosely referred to as object stores.

A traditional block storage interface uses a series of fixed size blocks which are numbered starting at 0. Data must be that exact fixed size and can be stored in a particular block which is identified by its logical block number (LBN). Later, one can retrieve that block of data by specifying its unique LBN.

With a key-value store, data is identified by a key rather than a LBN. A key might be “cat” or “olive” or “42”. It can be an arbitrary sequence of bytes of arbitrary length. Data (called a value in this parlance) does not need to be a fixed size and also can be an arbitrary sequence of bytes of arbitrary length. One stores data by presenting the key and data (value) to the data store and can later retrieve the data by presenting the key.

This concept is seen in programming languages. Python calls them dictionaries, Perl calls them hashes, Java and C++ call them maps, etc. Several data stores also implement key-value stores such as Memcached, Redis and CouchDB.

Object stores are similar to key-value stores in two respects.

There are, however, a few key differences between key-value stores and object stores.


15 January 2019