What is a Cluster Filesystem?
This is a short taxonomy of the kinds of distributed filesystems you can find today (February 2004). This was assembled with some help from Garth Gibson and Larry Jones.
Distributed filesystem - the generic term for a client/server or "network" filesystem where the data isn't locally attached to a host. There are lots of different kinds of distributed filesystems, the first ones coming out of research in the 1980s. NFS and CIFS are the most common distributed filesystems today.
Global filesystem - this refers to the namespace: all files have the same name and path name when viewed from all hosts. This makes it easy to share data across machines and users in different parts of the organization. For example, the WWW is a global namespace because a URL works everywhere. Filesystems don't always have that property, however: your share definitions may not match mine, and we may not see the same file servers or the same portions of those file servers.
AFS was an early provider of a global namespace - all files were organized under /afs/cellname/... and you could assemble AFS cells even from different organizations (e.g., different universities) into one shared filesystem. The Panasas filesystem (PanFS) supports a similar structure, if desired.
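As a rough illustration of why mismatched share definitions break a global namespace, here is a minimal Python sketch. It is not how AFS or PanFS resolve names. The `resolve` and `same_everywhere` helpers, the mount tables, and the server names are all made up for the example.

```python
from typing import Dict, List


def resolve(path: str, mounts: Dict[str, str]) -> str:
    """Translate a local path into 'server:remote_path' via the longest matching mount point.

    Assumes mount points are absolute paths with no trailing slash.
    """
    matches = [m for m in mounts if path == m or path.startswith(m + "/")]
    if not matches:
        raise LookupError(f"{path} is not on any mounted share")
    mount_point = max(matches, key=len)
    return mounts[mount_point] + path[len(mount_point):]


def same_everywhere(path: str, hosts: List[Dict[str, str]]) -> bool:
    """True if every host resolves the path to the same server-side file."""
    return len({resolve(path, mounts) for mounts in hosts}) == 1


# Two hosts whose share definitions disagree (hypothetical names).
host_a = {"/proj": "fs1:/export/proj"}
host_b = {"/proj": "fs2:/export/scratch"}

# One mount table handed to every host: a path means the same thing everywhere.
shared = {"/global": "fs1:/export/global"}

print(resolve("/proj/run.log", host_a))  # fs1:/export/proj/run.log
print(resolve("/proj/run.log", host_b))  # fs2:/export/scratch/run.log
print(same_everywhere("/proj/run.log", [host_a, host_b]))   # False
print(same_everywhere("/global/run.log", [shared, shared]))  # True
```

The same path string names two different files on host_a and host_b, which is exactly the property a global namespace rules out.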
SAN filesystem - these provide a way for hosts to share Fibre Channel storage, which is traditionally carved into private chunks bound to different hosts. To provide sharing, a block-level metadata manager controls access to different SAN devices. A SAN filesystem mounts storage natively in only one node, but connects all nodes to that storage and distributes block addresses to other nodes. Scalability is often an issue because blocks are a low-level way to share data, which places a big burden on the metadata managers and requires large network transactions in order to access data.
Examples include SGI CXFS, IBM GPFS, Red Hat Sistina, IBM SanFS, EMC HighRoad and others.
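The block-level sharing described above can be sketched roughly as follows. This is a toy model in Python, not the protocol of any product listed. The `MetadataManager` and `Host` classes, the device names, and the block contents are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

BlockAddr = Tuple[str, int]  # (SAN device name, block number), hypothetical format


class MetadataManager:
    """Owns the file-to-block mapping and hands block addresses to hosts."""

    def __init__(self) -> None:
        self._layout: Dict[str, List[BlockAddr]] = {}

    def create(self, path: str, blocks: List[BlockAddr]) -> None:
        self._layout[path] = list(blocks)

    def lookup(self, path: str) -> List[BlockAddr]:
        # One address per block: large files mean long lists to ship around.
        return list(self._layout[path])


class Host:
    """A node that is wired to the shared devices but does not own the layout."""

    def __init__(self, manager: MetadataManager,
                 devices: Dict[str, Dict[int, bytes]]) -> None:
        self.manager = manager
        self.devices = devices

    def read(self, path: str) -> bytes:
        # Ask the manager where the blocks are, then read them directly.
        addrs = self.manager.lookup(path)
        return b"".join(self.devices[dev][blk] for dev, blk in addrs)


# Shared storage visible to every host (contents are made up).
devices = {"lun0": {0: b"hel", 1: b"lo "}, "lun1": {5: b"SAN"}}

manager = MetadataManager()
manager.create("/data/a", [("lun0", 0), ("lun0", 1), ("lun1", 5)])

print(Host(manager, devices).read("/data/a"))  # b'hello SAN'
```

Every block a host touches needs an entry from the manager, which gives a feel for why block-level sharing puts a burden on the metadata managers as files and node counts grow.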
Symmetric filesystems - A symmetric filesystem is one in which the clients also run the metadata manager code; that is, all nodes understand the disk structures. A concern with these systems is the burden that metadata management places on the client node, serving both itself and other nodes, which may impact the ability of the client to perform its intended compute jobs. Examples include Sistina GFS, GPFS, Compaq CFS, Veritas CFS, and PolyServe Matrix.
Asymmetric filesystems - An asymmetric filesystem is one in which there are one or more dedicated metadata managers that maintain the filesystem and its associated disk structures. Examples include Panasas ActiveScale, IBM SanFS, and Lustre. Traditional client/server filesystems like NFS and CIFS are also asymmetric.
Cluster filesystem - a distributed filesystem that is not a single server with a set of clients, but instead a cluster of servers that all work together to provide high performance service to their clients. To the clients the cluster is transparent - it is just "the filesystem", but the filesystem software deals with distributing requests to elements of the storage cluster.
Examples include the HP (DEC) Tru64 cluster. Spinnaker is a clustered NAS (NFS) service. Panasas ActiveScale is a cluster filesystem.
Parallel filesystem - a filesystem with support for parallel applications, in which all nodes may be accessing the same files at the same time with concurrent reads and writes. Examples of this include Panasas ActiveScale, Lustre, GPFS and Sistina.
Finally, these definitions overlap. A SAN filesystem can be symmetric or asymmetric. Its servers can be clustered or single. And it can support parallel apps or not.
The Panasas Storage Cluster, with its ActiveScale File System, is a clustered (many servers share the work), asymmetric (metadata management does not happen on the clients), parallel (supports concurrent read and write well), object-based (not block-based), distributed (storage is across the network from clients) file system.
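Since the categories overlap, one way to keep them straight is to treat each one as an independent yes/no axis. The Python sketch below does that. The `FsProfile` class and its field names are hypothetical, and the values are filled in only where the text above states them (None means "not stated here").

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FsProfile:
    """One filesystem described along the independent axes of the taxonomy."""
    name: str
    san: Optional[bool] = None        # shares block storage over a SAN
    symmetric: Optional[bool] = None  # True: clients also run metadata manager code
    clustered: Optional[bool] = None  # served by a cluster of cooperating servers
    parallel: Optional[bool] = None   # concurrent read/write to the same files

    def labels(self) -> List[str]:
        """Return only the labels the text actually supports."""
        out = []
        if self.san:
            out.append("SAN")
        if self.symmetric is not None:
            out.append("symmetric" if self.symmetric else "asymmetric")
        if self.clustered:
            out.append("clustered")
        if self.parallel:
            out.append("parallel")
        return out


PROFILES = {
    p.name: p
    for p in [
        FsProfile("Panasas ActiveScale", symmetric=False, clustered=True, parallel=True),
        FsProfile("IBM GPFS", san=True, symmetric=True, parallel=True),
        FsProfile("IBM SanFS", san=True, symmetric=False),
        FsProfile("Lustre", symmetric=False, parallel=True),
        FsProfile("NFS", symmetric=False),
    ]
}

for profile in PROFILES.values():
    print(f"{profile.name}: {', '.join(profile.labels())}")
```

A filesystem can carry several labels at once, and knowing one label (for example "SAN") says nothing about the others.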
Is IBM SanFS the same thing as SANergy or not?
Originally posted by saharan on 2006-10-31 14:29
Is IBM SanFS the same thing as SANergy or not?
Originally posted by nntp on 2006-10-30 18:59
This is a short taxonomy of the kinds of distributed filesystems you can find today (Febr ...
A question about 'a cluster of servers that all work together to provide high performance service to their clients. To the clients the cluster is transparent - it is just "the filesystem", but the filesystem software deals with distributing requests to elements of the storage cluster.'
Among the mature products available today, which ones implement the architecture described in this passage? Lustre, RH-GFS?
Originally posted by baif on 2006-11-2 10:41
A question about 'a cluster of servers that all work together to provide high performance service to their clients. To the clients the cluster is transparent - it is just "the filesystem" ...
Hello, this is not a single category of product; it is a definition used for classification.
[ This post was last edited by nntp on 2006-11-5 12:45 ]
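Originally posted by blue_stone on 2006-11-5 08:43
When access to the stored data is provided cooperatively by multiple servers, I think the definition is fairly coarse-grained, and anything that fits it can be called a clustered filesystem. Within such a coarse framework you can then define finer categories: parallel filesystems, cluster filesystems, and so on.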
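A parallel filesystem means that a single unified filesystem and its data are spread across multiple service providers and storage holders; you can think of it as "RAID" for filesystems. Lustre and PVFS/2 both belong to this type.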
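To make the "RAID" analogy concrete, here is a minimal sketch of round-robin striping in Python. It is not the actual layout used by Lustre or PVFS2. The `locate` and `servers_touched` functions and the stripe size are assumptions made for illustration.

```python
from typing import Set, Tuple


def locate(offset: int, stripe_size: int, n_servers: int) -> Tuple[int, int]:
    """Map a byte offset in a file to (server index, byte offset on that server)."""
    if offset < 0 or stripe_size <= 0 or n_servers <= 0:
        raise ValueError("offset must be >= 0; stripe_size and n_servers must be > 0")
    stripe_no = offset // stripe_size       # which stripe unit of the whole file
    server = stripe_no % n_servers          # units are dealt out round-robin
    local_stripe = stripe_no // n_servers   # units this server holds before this one
    return server, local_stripe * stripe_size + offset % stripe_size


def servers_touched(offset: int, length: int, stripe_size: int, n_servers: int) -> Set[int]:
    """Servers that a read of `length` bytes starting at `offset` has to contact."""
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    return {unit % n_servers for unit in range(first, last + 1)}


# Hypothetical layout: 64-byte stripe units over 4 storage servers.
print(locate(0, 64, 4))                 # (0, 0)
print(locate(200, 64, 4))               # (3, 8)
print(locate(266, 64, 4))               # (0, 74)
print(servers_touched(0, 256, 64, 4))   # {0, 1, 2, 3}
```

A single large read touches every server, so the transfers can proceed in parallel, which is the point of spreading the data out.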
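A cluster filesystem means that the service providers themselves form a cluster in the standard sense, with the usual cluster features such as membership, cluster-wide locks, heartbeats, and so on. The file storage, however, is a single entity, so the servers have to use their cluster relationship to control access to the data at that single storage location. So in my classification, cluster filesystems and parallel filesystems are two completely different things. The relationships among their service providers, their storage holders, and the location of their data are all different.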
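A rough sketch of the membership and cluster-wide locking idea, in Python. Real cluster filesystems use a distributed lock manager and heartbeats over the network, whereas this is a single-process stand-in. The `ClusterLockManager` class, its method names, and the node and resource names are hypothetical.

```python
import threading
from typing import Dict, Iterable, Set


class ClusterLockManager:
    """Tracks cluster members and which member holds the lock on each resource."""

    def __init__(self, members: Iterable[str]) -> None:
        self._members: Set[str] = set(members)
        self._holders: Dict[str, str] = {}  # resource -> node holding its lock
        self._guard = threading.Lock()

    def try_acquire(self, node: str, resource: str) -> bool:
        with self._guard:
            if node not in self._members:
                raise PermissionError(f"{node} is not a cluster member")
            holder = self._holders.get(resource)
            if holder is None or holder == node:
                self._holders[resource] = node
                return True
            return False  # another member is using the shared storage location

    def release(self, node: str, resource: str) -> None:
        with self._guard:
            if self._holders.get(resource) == node:
                del self._holders[resource]

    def expel(self, node: str) -> None:
        """Drop a member (e.g. after missed heartbeats) and free its locks."""
        with self._guard:
            self._members.discard(node)
            for resource in [r for r, h in self._holders.items() if h == node]:
                del self._holders[resource]


mgr = ClusterLockManager({"node-a", "node-b"})
print(mgr.try_acquire("node-a", "block-7"))  # True
print(mgr.try_acquire("node-b", "block-7"))  # False
mgr.expel("node-a")                          # node-a stops answering heartbeats
print(mgr.try_acquire("node-b", "block-7"))  # True
```

The storage itself does not coordinate anything here; the servers protect the single shared copy purely through their cluster relationship, which is the contrast with the striped layout of a parallel filesystem.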
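If you look at PVFS2/Lustre as typical parallel filesystems and at GFS from Sistina (Red Hat) as a cluster filesystem, you should be able to build up your own notion of the classification.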