Storage Area Networks (SANs) are specialized networks dedicated to serving high-performance data storage with built-in security and block-level access. Essentially, a SAN consists of data storage devices combined into RAID groups that reside on their own network and provide storage access to connected clients. These RAID groups are then subdivided into logical units, identified by logical unit numbers (LUNs), each of which gives clients block-level access as if it were a logical drive.
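The relationship between a RAID group and its LUNs can be pictured with a small sketch. This is only an illustrative model, not any vendor's API: the `RaidGroup` and `Lun` classes, their field names, and the capacities are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Lun:
    """Hypothetical logical unit: an ID plus a size in GB."""
    lun_id: int
    size_gb: int


@dataclass
class RaidGroup:
    """Hypothetical RAID group: a pool of usable capacity carved into LUNs."""
    name: str
    usable_gb: int
    luns: list[Lun] = field(default_factory=list)

    @property
    def free_gb(self) -> int:
        # Capacity not yet assigned to any LUN.
        return self.usable_gb - sum(lun.size_gb for lun in self.luns)

    def create_lun(self, size_gb: int) -> Lun:
        """Carve a new LUN out of the group's remaining capacity."""
        if size_gb <= 0:
            raise ValueError("LUN size must be positive")
        if size_gb > self.free_gb:
            raise ValueError(f"only {self.free_gb} GB free in {self.name}")
        lun = Lun(lun_id=len(self.luns), size_gb=size_gb)
        self.luns.append(lun)
        return lun


group = RaidGroup(name="rg0", usable_gb=1000)
group.create_lun(400)
group.create_lun(250)
print(group.free_gb)  # 350
```

Each LUN created this way would be presented to a client as a block device, while the RAID group underneath stays hidden from the client.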
The simplicity of the client interface belies the complexity of SAN configurations, which can combine hundreds to thousands of storage devices across multiple physical locations while allowing admins to divide storage and access permissions granularly. Furthermore, as an enabler of high availability (HA), SAN infrastructure supports mission-critical operations through virtualization that allows multiple clients to access the same data at the same time.
Data fabric is a term associated with SAN, and refers to the network the SAN uses, often composed of a combination of fiber-optic cables, Ethernet, and SCSI. These physical SAN fabrics can be consolidated into a single large fabric using virtual SAN (VSAN).
The most common way to connect to a SAN is through a host bus adapter (HBA), a PCI add-on card connected to a SAN switch or, less commonly, directly to the SAN. This method differs from Network Attached Storage (NAS), which uses TCP/IP networks for sending and receiving storage traffic. The main advantage of SANs is the efficiency gain from HBAs and virtual HBAs, which let clients offload storage processing onto the SAN and I/O functions onto the HBA, reclaiming compute cycles for the client OS and application-specific functions.
SAN solutions are essential for organizations in need of high-performance data storage capacity required to service large numbers of users simultaneously.
Because of this focus on high performance, SANs differ significantly from network attached storage (NAS). NAS is a single storage device, made available over TCP/IP networks as a shared data storage solution and meant to serve a limited number of users on a home office or small business local network. A SAN, by comparison, fills the business need to serve the same data to hundreds or even thousands of users at the same time, for which the technical requirements are far more demanding. Direct Fibre Channel connections to the SAN fabric provide the fastest performance. When clients, the SAN's Fibre Channel fabric, and storage capacity are combined, SAN storage appears to clients as if it were directly attached.
A Storage Area Network (SAN) is designed to aggregate storage capacity, typically by collecting it in a single physical location, so that it can be centrally managed and then made available to servers. Storage arrays are an example of physically centralized storage; software-defined solutions that virtualize storage, however, can aggregate storage from multiple locations.
This pooled storage is then placed on its own network, the storage area network, which carries its own storage traffic and frees up bandwidth on the main LAN. The result is faster performance for enterprise workloads and applications.
SANs are generally composed of three layers: the host layer, the fabric layer, and the storage layer.
- Host Layer — The host layer contains all the servers, databases, and other enterprise workloads. In Fibre Channel SANs, the host layer communicates with the SAN through a host bus adapter (HBA), which handles storage I/O commands and frees up compute cycles on the host.
- Fabric Layer — The data fabric comprises the network cabling, SAN switches, gateways, routers, etc. that connect the storage layer into a cohesive SAN.
- Storage Layer — All the storage devices (HDDs, SSDs, RAID arrays) are pooled at the storage layer. These devices are further organized into logical units (LUNs) so they can be managed and secured. Access can be limited through LUN masking and zoning.
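LUN masking and zoning, mentioned in the storage layer above, can be pictured as two independent checks that must both pass before a host reaches a LUN. The following is a minimal sketch under stated assumptions: the zone names, host names, port names, and LUN IDs are invented for illustration, and real switches and arrays enforce these rules in their own configuration rather than in application code.

```python
# Zoning (enforced in the fabric): which host ports may talk to which
# storage ports. All names here are hypothetical.
ZONES = {
    "zone_db": {"host_db1", "array_port_a"},
    "zone_web": {"host_web1", "array_port_b"},
}

# LUN masking (enforced at the storage layer): which LUN IDs each host
# is allowed to see. Also hypothetical.
LUN_MASKS = {
    "host_db1": {0, 1},
    "host_web1": {2},
}


def in_same_zone(host: str, storage_port: str) -> bool:
    """True if at least one zone contains both the host and the storage port."""
    return any(
        host in members and storage_port in members
        for members in ZONES.values()
    )


def can_access(host: str, storage_port: str, lun_id: int) -> bool:
    """A host reaches a LUN only if zoning AND masking both allow it."""
    return in_same_zone(host, storage_port) and lun_id in LUN_MASKS.get(host, set())


print(can_access("host_db1", "array_port_a", 0))   # True
print(can_access("host_db1", "array_port_b", 0))   # False: not zoned together
print(can_access("host_web1", "array_port_b", 0))  # False: LUN 0 is masked
```

Keeping the two checks separate mirrors the layering described above: zoning restricts paths through the fabric, while masking restricts what the storage exposes to each host.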
Tying these layers together is a series of protocols. The most common is Fibre Channel Protocol (FCP), which allows a SAN to run at its full speed. An iSCSI SAN, by comparison, maps SCSI commands onto TCP/IP and is therefore limited to the speed of the underlying TCP/IP network.
SAN protocols vary in performance, reliability, complexity, and cost. The following are the most common SAN protocols.
- Fibre Channel Protocol (FCP) — Fibre Channel is the most widely adopted protocol, used in up to 80% of SAN deployments. While this protocol requires a professional to implement and specialized equipment to run on, FC supports data rates up to 128 Gbps and uses host bus adapters (HBAs) to offload network processing from CPUs.
- Internet Small Computer System Interface (iSCSI) — The iSCSI protocol is deployed in roughly 15% of SANs. Able to operate on 10-25GbE, it is fast, but not as fast as Fibre Channel. However, iSCSI is based on TCP/IP and Ethernet technologies, so its learning curve is typically gentler and its components are cheaper.
- Fibre Channel over Ethernet (FCoE) — FCoE is similar to iSCSI both in market share, small at about 5%, and in technological approach: it encapsulates Fibre Channel frames in Ethernet frames so that storage traffic can run over Ethernet components, at the sacrifice of speed. FCoE performance is similar to iSCSI.
- Non-Volatile Memory Express over Fabrics (NVMe-oF) — NVMe-oF extends the NVMe protocol across a network fabric. Fibre Channel is one of its supported transports (often called FC-NVMe), alongside RDMA and TCP.
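To put the line rates quoted in the list above in perspective, the arithmetic below estimates how long it would take to move 1 TB at each nominal rate. This is a back-of-the-envelope sketch: it assumes the full line rate is available and ignores protocol overhead, encoding, and congestion, so real transfers would be slower. The function name and the 1 TB payload are illustrative choices.

```python
def ideal_transfer_seconds(size_bytes: float, line_rate_gbps: float) -> float:
    """Best-case transfer time: payload bits divided by nominal line rate."""
    bits = size_bytes * 8
    return bits / (line_rate_gbps * 1e9)


ONE_TB = 1e12  # 1 TB in bytes (decimal)

# Nominal rates taken from the protocol list; actual throughput is lower.
rates_gbps = {
    "Fibre Channel at 128 Gbps": 128,
    "iSCSI on 25GbE": 25,
    "iSCSI on 10GbE": 10,
}

for label, gbps in rates_gbps.items():
    print(f"{label}: {ideal_transfer_seconds(ONE_TB, gbps):.1f} s")
```

Under these idealized assumptions the same payload takes roughly a minute at 128 Gbps and over thirteen minutes at 10 Gbps, which illustrates why the protocol choice is usually framed as a trade-off between speed and cost.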
Storage Area Networks (SAN) ultimately aim to enhance how data storage protects, grows with, and allows access to an organization's valuable data assets. Implementing a SAN provides several advantages over other data storage options.
- Storage Scaling — SAN technology is purpose-built for scaling. When storage demand grows, admins can add new storage, or make better use of existing resources by dividing current storage through virtualization and logical partitioning.
- Improved Application Availability and Performance — SAN fabrics let multiple users access the same data at the same time with minimal latency. This network storage speed improves application availability and performance.
- Consolidated Data Fabric — SAN technology can consolidate multiple storage devices from multiple locations, allowing them to be centrally managed, improving performance and resource utilization.
- Centralized Management — With a consolidated data fabric, centralized management allows admins to monitor storage media, backups, and the overall data fabric.
- Data Protection and Security — SANs are designed to support business continuity and disaster recovery by protecting data from corruption, and securing it against intruders.
SAN switches connect computers directly to the SAN data fabric and make it possible to exchange data at high speeds. Because the data rates are so high, newer SAN switches are built with path redundancy, network diagnostics, and bandwidth auto-sensing to keep network congestion low.
SAN switches are built for either Fibre Channel (FC) or Ethernet. FC switches provide the fastest connection to a SAN, with possible features such as encryption, zoning, load balancing, and data access controls. But as Ethernet has advanced to 10GbE and beyond, many organizations use Ethernet with the iSCSI protocol as a cheaper, albeit slower, alternative.
Storage Area Networks (SAN) and Network Attached Storage (NAS) are similar in concept, but wholly different in practice. The following chart outlines the main differences between the two approaches to network storage.
| SAN | Characteristic | NAS |
| --- | --- | --- |
| Enterprises | Main User | Homes and small businesses |
| Serving data to many users; data archiving | Use Case | Backing up office files |
| Significant investment | Cost | Consumer expense |
| May require an IT admin | Ease | Typically works out of the box |
| Block storage presented to servers as if local | Data Access | Data accessed as if local |
| Fibre Channel at 16-32 Gb/s; FCoE and iSCSI at 10-40 Gb/s or more | Data Transfer Speed | Dependent on the local TCP/IP network and its congestion; typically 1GbE-10GbE |
| Fibre Channel, iSCSI, FCoE | Protocols | SMB/CIFS, NFS, SFTP, and WebDAV |
| Built for rapid scaling | Scalability | High-end solutions scale through clusters and scale-out nodes; low-end solutions do not scale |
| Dedicated SAN fabric | Connection | Direct Ethernet connection |
| Built with fault tolerance and redundancy features | Fault Tolerance | Typically a single point of failure |
DataOps, an umbrella term, refers to an organization's collection of technical practices, workflows, cultural norms, and architectural patterns surrounding how it integrates data assets into its business goals and processes. This means that each organization's data pipelines are likely to be configured differently. In general, however, DataOps efforts aim to enable four capabilities within the company:
- Rapid innovation, experimentation, and testing, to deliver data insights to users and customers.
- Heightened data quality with extremely low error rates.
- Synchronized collaboration across teams of people, environments, and technologies.
- Orchestrated monitoring of data pipelines to ensure clear measurements, and transparent results.