Backblaze, which provides cloud storage and data backup services, has published its latest report on hard drive failure statistics by model. In line with the wider industry trend, the company also set out to find out whether artificial intelligence could help reduce the number of failures.


At the end of the second calendar quarter of 2024, Backblaze had 284,876 hard drives in operation. The company excluded from the sample models with up to 100 units in service, as well as models that did not accumulate a total of 10,000 drive days during the quarter. That left 284,386 drives across 29 models in the report. Given how popular AI technologies are across industries today, Backblaze wondered whether they could be used to predict hard drive failures. Doing so would require training a large language model on the company's statistics and testing whether the AI can calculate the probability that a given drive will fail over time. It is not yet clear whether the statistics for one model can be applied to another, because their failure profiles can differ radically.
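The sample selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ModelStats` type, the `eligible_models` function, and the sample figures are hypothetical, and the exact boundary handling (whether a model with exactly 100 drives or exactly 10,000 drive days is kept) is our reading of the article, not something Backblaze's report confirms.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    """Per-model totals for one quarter (hypothetical structure)."""
    model: str
    drive_count: int   # units of this model in service
    drive_days: int    # days in service summed over all units this quarter


# Thresholds as stated in the article; boundary handling is an assumption.
MAX_EXCLUDED_DRIVES = 100   # models with up to this many units are dropped
MIN_DRIVE_DAYS = 10_000     # models below this quarterly total are dropped


def eligible_models(stats: list[ModelStats]) -> list[ModelStats]:
    """Keep only models that pass both sample-size criteria."""
    return [
        s for s in stats
        if s.drive_count > MAX_EXCLUDED_DRIVES and s.drive_days >= MIN_DRIVE_DAYS
    ]


# Made-up figures for illustration only, not taken from the report.
sample = [
    ModelStats("MODEL-A", drive_count=1_200, drive_days=109_000),
    ModelStats("MODEL-B", drive_count=60, drive_days=5_400),      # too few units
    ModelStats("MODEL-C", drive_count=150, drive_days=9_000),     # too few drive days
]

print([s.model for s in eligible_models(sample)])
```

Both filters serve the same purpose: they keep models with too little data from producing misleadingly high or low failure rates.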

Source of this and the following images: backblaze.com

The latest report found that the annualized failure rate (AFR) for the second quarter was 1.71%, lower than the 2.28% recorded in the same period last year but higher than the 1.41% of the first quarter of 2024. Of greatest concern was the 12 TB HGST model (HUH721212ALN604), whose AFR jumped to 7.17% during the reporting period, pushing its lifetime rate from 0.99% to 1.57%. Two Seagate models, the 14 TB ST14000NM000J and the 16 TB ST16000NM002J, did not record a single failure during the quarter, although Backblaze has a relatively small number of these drives in service.
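The article does not spell out how AFR is calculated. A common definition, and the one assumed in this sketch, divides the number of failures by the number of drive-years (drive days divided by 365) and expresses the result as a percentage. The function name and the sample numbers below are hypothetical.

```python
def annualized_failure_rate(failures: int, drive_days: int) -> float:
    """Return AFR in percent: failures per drive-year of operation.

    Assumes AFR = failures / (drive_days / 365) * 100.
    """
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return failures / drive_years * 100


# Made-up figures, not taken from the report:
# 10 failures over 365,000 drive days is 10 failures per 1,000 drive-years.
print(round(annualized_failure_rate(10, 365_000), 2))  # 1.0
```

Normalizing by drive days rather than by drive count makes quarters and models comparable even when drives are added or retired partway through the period. It also explains why a model with few units can swing sharply: a handful of failures over a small number of drive days produces a large rate.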

The oldest model in operation is the 4 TB Seagate ST4000DM000, and the company intends to move data from these drives to newer, higher-capacity drives within the next quarter or two. The longest-serving individual drive was a 4 TB HGST unit (HMS5C4040ALE640), which by the end of the second quarter had been running for 9 years, 11 months and 23 days. The storage system in which this drive is installed is now being migrated.

The goal of collecting and processing these statistics is to build a failure profile of each drive model over time, Backblaze explained, which will help it develop replacement and migration strategies. The company illustrates this with three charts based on failure statistics for models that have accumulated 1 million or more drive days in its fleet. The first chart shows AFRs for 14 models with an average age of 60 months or less, and the second shows AFRs for models with an average age of more than 60 months. This split was chosen because 60 months is a typical warranty period for enterprise-class hard drives.
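The way models are assigned to the first two charts can be expressed as a small rule. The sketch below is a hypothetical illustration of the criteria the article lists (at least 1 million drive days, then a split at an average age of 60 months); the function name, labels, and sample values are assumptions rather than anything from Backblaze's tooling.

```python
from typing import Optional

WARRANTY_MONTHS = 60                 # split point named in the article
MIN_LIFETIME_DRIVE_DAYS = 1_000_000  # minimum lifetime total to be charted


def chart_group(avg_age_months: float, lifetime_drive_days: int) -> Optional[str]:
    """Return which chart a model belongs to, or None if it is not charted."""
    if lifetime_drive_days < MIN_LIFETIME_DRIVE_DAYS:
        return None  # not enough accumulated service time
    if avg_age_months <= WARRANTY_MONTHS:
        return "60 months or less"
    return "more than 60 months"


# Made-up inputs for illustration only.
print(chart_group(48.0, 2_500_000))   # 60 months or less
print(chart_group(75.5, 4_000_000))   # more than 60 months
print(chart_group(30.0, 400_000))     # None
```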

In the first chart, drives that fall into quadrant I are described as performing well, with an AFR below 1.5%. Those in quadrant II are performing acceptably, with an AFR above 1.5%. Models in quadrant IV are relatively new, and their failure profiles are only beginning to take shape. No drives landed in quadrant III. In the second chart, quadrant I again holds the good-quality models, quadrants II and III contain the “discs we need to worry about”, and quadrant IV holds a single model that gives no cause for concern.

A third chart shows how failures develop over time. It plots the failure rate over the entire service life of nine models older than 60 months; for clarity, the plot starts at 24 months. The models fall predominantly into quadrants I and II, with five of the nine ending up in quadrant I as of the second quarter of 2024. Models whose lines are almost vertical (red, brown and purple) show a stable failure rate over time. The models drawn in blue and gray show failure rates that rise as they age. The blue line in particular (Seagate ST8000DM002) is within normal limits, since its AFR remained around 1% for the first 60 months. The three models that have reached quadrant III have similar profiles: their curves bend further and further to the right as the failure rate increases. Finally, the black line is the 4 TB Seagate drive, which is “actively migrating” and being replaced by other models.
