Social media giant Meta’s recent paper highlights four primary factors influencing HDD reliability: age, workload, temperature, and interference from vibrations. We covered the impact of age in the previous part. Here, we continue our analysis of Meta’s study with the remaining three factors.

Workload

HDDs experience varying workloads throughout their operational lifespan, influenced by factors such as data provisioning, encoding schema changes, and application demands. Meta compared the median workload of healthy and unhealthy HDDs and found it to be significantly higher, about 1.5 times greater, for unhealthy HDDs. This points to a clear association between workload and failures.
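The comparison Meta describes boils down to a ratio of two medians. The minimal sketch below shows that calculation, assuming per-drive workload figures are already available for each group in some consistent unit (for example, TB transferred per day). The function name `median_workload_ratio` and the sample numbers are hypothetical and chosen only to illustrate a 1.5x ratio; they are not data from Meta’s study.

```python
from statistics import median


def median_workload_ratio(healthy, unhealthy):
    """Return median(unhealthy) / median(healthy).

    Each argument is a list of per-drive workload figures. The unit
    (e.g. TB transferred per day) is an assumption; it only needs to be
    the same for both groups.
    """
    if not healthy or not unhealthy:
        raise ValueError("both groups need at least one drive")
    healthy_median = median(healthy)
    if healthy_median == 0:
        raise ValueError("healthy median workload is zero")
    return median(unhealthy) / healthy_median


# Hypothetical per-drive workloads, NOT data from Meta's study.
healthy_drives = [2.0, 3.0, 4.0, 5.0, 6.0]
unhealthy_drives = [4.0, 5.0, 6.0, 7.0, 9.0]

ratio = median_workload_ratio(healthy_drives, unhealthy_drives)
print(f"unhealthy/healthy median workload: {ratio:.2f}x")  # prints 1.50x
```

Using the median rather than the mean keeps a handful of extremely busy drives from dominating either group.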

Temperature

Relying on evidence from past research, Meta pointed out that higher temperatures significantly increase HDD Annualized Failure Rates (AFR), underscoring the importance of thermal management. In its own IT infrastructure, Meta combines temperature monitoring with dynamic fan control to maintain optimal operating conditions and mitigate heat-induced failures.
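Meta does not describe how its fan control works, so the following is only a minimal sketch of one common approach: a linear fan curve driven by the hottest drive in an enclosure. The function name `fan_duty_percent` and every threshold (35 °C, 50 °C, 20% and 100% duty) are hypothetical values chosen for illustration, not Meta’s settings.

```python
def fan_duty_percent(drive_temps_c, low_c=35.0, high_c=50.0,
                     min_duty=20.0, max_duty=100.0):
    """Map the hottest drive temperature to a fan duty cycle (percent).

    Below low_c the fans idle at min_duty; at or above high_c they run at
    max_duty; in between, the duty cycle rises linearly. All thresholds
    are illustrative assumptions.
    """
    if not drive_temps_c:
        # No readings available: fail safe by running the fans at full speed.
        return max_duty
    hottest = max(drive_temps_c)
    if hottest <= low_c:
        return min_duty
    if hottest >= high_c:
        return max_duty
    fraction = (hottest - low_c) / (high_c - low_c)
    return min_duty + fraction * (max_duty - min_duty)


# Hypothetical readings from three drives in one enclosure.
print(fan_duty_percent([38.0, 42.5, 40.1]))  # prints 60.0
```

Keying on the hottest drive, rather than the average, prevents a single overheating drive from being masked by cooler neighbors.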

Meta also highlighted the close relationship between drive failures and both workload and temperature, noting that these two factors were among the most important input variables in a machine learning model trained to detect drive failures.

Rotational and Acoustic Vibrations (RV/AV)

Also relying on evidence from past research, Meta noted that mechanical vibrations, generated by rotational and acoustic sources, had the potential to influence HDD reliability.

Meta’s study highlighted several factors that may help predict drive failures, particularly age and workload. These are not the most intuitive indicators; when people think of failing drives, symptoms such as read/write errors or command timeouts usually come to mind first. The findings are interesting and may prove useful for modeling failures in datacenter-grade HDDs, which were the focus of the study. Meta’s results are especially worth paying attention to because they are drawn from the company’s own large fleet of drives.

ULINK has already productized AI-based drive failure prediction in its product for QNAP NAS, which was released in December 2021. ULINK is open to working further with the general public (B2C) and server companies (B2B) on AI-based drive failure prediction to keep ULINK DA Drive Analyzer on the cutting edge.


Photo Credit: Oleksandr Bushko
