Tuesday, December 26, 2006

The Hadoop Distributed File System: Architecture and Design

# Introduction
# Assumptions and Goals

* Hardware Failure
* Streaming Data Access
* Large Data Sets
* Simple Coherency Model
* Moving Computation is Cheaper than Moving Data
* Portability across Heterogeneous Hardware and Software Platforms

# Namenode and Datanode
# The File System Namespace
# Data Replication

* Replica Placement: The First Baby Steps
* Replica Selection
* SafeMode

# The Persistence of File System Metadata
# The Communication Protocol
# Robustness

* Data Disk Failure, Heartbeats and Re-Replication
* Cluster Rebalancing
* Data Correctness
* Metadata Disk Failure
* Snapshots

# Data Organization

* Data Blocks
* Staging
* Pipelining

# Accessibility

* DFSShell
* DFSAdmin
* Browser Interface

# Space Reclamation

* File Deletes and Undeletes
* Decrease Replication Factor

# References
