To store data there is hardware (magnetic disks). The arrangement on disk with which this data is stores is called the filesystem. The specific format (whether metadata or information about is stored) varies from filesystem to filesystem. What is actually stored can be an object, a file, or a block.
Block storage splits data into files across different locations. There is no metadata stored with it (ownership, description, association etc.) Strong consistency is maintained on change of any data.
In detail:
Files are split into evenly sized blocks of data, each with its own address (Logical Block Number). Unlike file storage – where the data is managed on the file level – data is stored in data blocks. Several blocks (for example in a SAN system) build a file. A block consists of an address and the SAN application gets the block if it makes an SCSI-Request to this address. The storage application decides then where the data blocks are stored inside the system and on what specific disk or storage medium. How the blocks are combined and how they can be accesses are also decided by the storage application.
Object Storage abstracts the entire management layer, so the same command using the connector simply returns “Filesystem is Object Storage.” (By the same token, “no storage management overhead” can be a pro; see below. No storage management overhead for object storage. Unlike HDFS, Cloud Storage requires no routine maintenance such as running checksums on the files, upgrading or rolling back to a previous version of the file system and other administrative tasks.
If you were to run a command such as hadoop fsck -files -blocks
against a directory in HDFS, you would see an output of useful information, ranging from status to racks to corrupted blocks.
Downsides:
Data is stored as a single piece of information inside a folder, just like you’d organize pieces of paper inside a manila folder. When you need to access that piece of data, your computer needs to know the path to find it. (Beware—It can be a long, winding path.) Data stored in files is organized and retrieved using a limited amount of metadata that tells the computer exactly where the file itself is kept. It’s like a library card catalog for data files.
Think of a closet full of file cabinets. Every document is arranged in some type of logical hierarchy—by cabinet, by drawer, by folder, then by piece of paper. This is where the term hierarchical storage comes from, and this is file storage. It is the oldest and most widely used data storage system for direct and network-attached storage systems, and it’s one that you’ve probably been using for decades. Any time you access documents saved in files on your personal computer, you use file storage.
The problem is, just like with your filing cabinet, that virtual drawer can only open so far. File-based storage systems must scale out by adding more systems, rather than scale up by adding more capacity.
Network Attached Storage: exposes its storage as a network file system
When devices are attached to a NAS system, a mountable file system is displayed and users can access their files with proper access rights. Because of that, a NAS system has to manage user privileges, file locking, and other security measures so several users can access files.
As with any server or storage solution, a file system is responsible for positioning the files in the NAS. This works very well for hundreds of thousands or even millions of files, but not for billions. The access to the NAS is handled via NFS and SMB/CIFS protocols.
A SAN stores data at the block level, while NAS accesses data as files. To a client OS, a SAN typically appears as a disk and exists as its own separate network of storage devices, while NAS appears as a file server.
Object storage is a computer data storage architecture that manages data as objects, as opposed to other storage architectures like file systems which manages data as a file hierarchy, and block storage which manages data as blocks within sectors and tracks. One of the design principles of object storage is to abstract some of the lower layers of storage away from the administrators and applications. Thus, data is exposed and managed as objects instead of files or blocks. Objects contain additional descriptive properties which can be used for better indexing or management. Administrators do not have to perform lower-level storage functions like constructing and managing logical volumes to utilize disk capacity or setting RAID levels to deal with disk failure. Object storage also allows the addressing and identification of individual objects by more than just file name and file path. Object storage adds a unique identifier within a bucket, or across the entire system, to support much larger namespaces and eliminate name collisions.
Includes Metadata stored with the object. Each object typically includes the data itself, a variable amount of metadata, and a globally unique identifier. Object storage can be implemented at multiple levels, including the device level (object-storage device), the system level, and the interface level. In each case, object storage seeks to enable capabilities not addressed by other storage architectures, like interfaces that can be directly programmable by the application, a namespace that can span multiple instances of physical hardware, and data-management functions like data replication and data distribution at object-level granularity.
Alow storage of massive amounts of unstructured data
One of the first object-storage products, Lustre, is used in 70% of the Top 100 supercomputers and ~50% of the Top 500.
Lustre is a type of parallel distributed file system, generally used for large-scale cluster computing. The name Lustre is a portmanteau word derived from Linux and cluster. However, unlike block-based distributed filesystems, such as GPFS
and PanFS
, where the metadata server controls all of the block allocation, the Lustre metadata server is only involved in pathname and permission checks, and is not involved in any file I/O operations, avoiding I/O scalability bottlenecks on the metadata server.
Object storage doesn’t split files up into raw blocks of data. Instead, entire clumps of data are stored in an object that contains the data, metadata, and the unique identifier – applications identify the object via this ID. Each object is stored in a flat address space, making them much easier to locate and retrieve the data. The many objects inside an object storage system are stored all over the given storage disks.
All objects are managed via the application itself. This means that No real file system is needed, as the layer is obsolete. When an application sends a storage inquiry to the solution regarding where to store the object, the object is given an address inside the huge storage space and saved there by the application itself.
Uses
Downsides
Doesn’t provide you with the ability to incrementally edit one part of a file (as block storage does). Objects have to be manipulated as a whole unit, requiring the entire object to be accessed, updated, then re-written in their entirety. That can have performance implications. In its pure form object storage can only save one version of a file (object). If a user makes a change another version of the same file is stored as a new object. Due to this, an object storage is a perfect for a backup or archive solution, for example, online video streaming sites.
Important to recognize that object storage was not created as a replacement for NAS file access and sharing; it does not support the locking and sharing mechanisms needed to maintain a single accurately updated version of a file
In addition to object-storage systems that own the managed files, some systems provide an object abstraction on top of one or more traditional filesystem based solutions. These solutions do not own the underlying raw storage, but instead actively mirror the filesystem changes and replicate them in their own object catalogue, alongside any metadata that can be automatically extracted from the files. Users can then contribute additional metadata through the virtual object storage APIs. A global namespace and replication capabilities both inside and across filesystems are typically supported. Notable examples in this category are Nirvana, and its open-source cousin iRODS.
Unfortunately, the border between an object store and a key-value store is blurred, with key-value stores being sometimes loosely referred to as object stores.
A traditional block storage interface uses a series of fixed size blocks which are numbered starting at 0. Data must be that exact fixed size and can be stored in a particular block which is identified by its logical block number (LBN). Later, one can retrieve that block of data by specifying its unique LBN.
With a key-value store, data is identified by a key rather than a LBN. A key might be “cat” or “olive” or “42”. It can be an arbitrary sequence of bytes of arbitrary length. Data (called a value in this parlance) does not need to be a fixed size and also can be an arbitrary sequence of bytes of arbitrary length. One stores data by presenting the key and data (value) to the data store and can later retrieve the data by presenting the key.
This concept is seen in programming languages. Python calls them dictionaries, Perl calls them hashes, Java and C++ call them maps, etc. Several data stores also implement key-value stores such as Memcached, Redis and CouchDB.
Object stores are similar to key-value stores in two respects.
There are, however, a few key differences between key-value stores and object stores.