Solid state drive
Before deploying any solid-state drive, particularly in critical environments such as servers, network-attached storage devices, or production workstations, it is essential to carry out thorough preliminary testing. Although manufacturers perform rigorous quality control procedures at the factory, including firmware validation, flash memory testing, and controller calibration, problems can still arise after production. These may include early-life component failures, physical damage during transport, or hardware compatibility issues.
Passing all preliminary tests does not guarantee that the drive will not fail. However, it provides a reasonable degree of confidence that the device is not already defective or likely to degrade immediately. Testing confirms that the drive is in a healthy condition at the time of evaluation, but as with all hardware, it could fail shortly after deployment, several months later, or continue operating reliably for many years.
Step 1: Check SMART Results
The first step is to examine the SMART (Self-Monitoring, Analysis, and Reporting Technology) data reported by the SSD. Although different manufacturers expose different sets of attributes, most modern drives include the key indicators necessary to assess general health, usage, and remaining lifespan. The following parameters are especially important:
Power-On Hours (POH): Indicates the total time the drive has been powered on. A very high number on a new or recertified drive may suggest prior usage or ageing.
Total Bytes Written (TBW) or Host Writes: Shows the amount of data written to the drive over its lifetime. This figure should be compared with the manufacturer's rated endurance.
Media Wear Indicator or Percentage Used: Reflects the amount of wear sustained by the NAND flash memory. A new drive should report minimal wear, or one hundred percent remaining endurance.
Reallocated Block Count: Indicates how many flash blocks have been marked as bad and replaced by spare blocks. An increasing count may signal that the flash is beginning to fail.
Uncorrectable Errors and CRC Errors: Highlight data integrity issues or transmission faults. Repeated or growing numbers can point to physical damage, controller failure, or cable problems.
Temperature History: Consistently high temperatures, often above 60 to 70 degrees Celsius, can degrade flash memory more rapidly and reduce overall endurance.
For solid-state drives, one of the most important indicators of overall health and remaining lifespan is the Total Bytes Written (TBW) metric. This value, which can usually be found in the SMART data under names such as “Host Writes” or “Total NAND Writes,” shows how much data has been written to the drive since it was manufactured. To assess how heavily the SSD has been used, compare the reported TBW figure against the endurance rating specified by the manufacturer in the drive’s technical spec sheet. This endurance rating may be expressed in two ways: either directly as a TBW figure (e.g. 300 terabytes written) or as Drive Writes Per Day (DWPD) across a given warranty period.
If DWPD is used, calculate the total expected TBW by multiplying the DWPD value by the drive’s capacity in gigabytes, the number of days in the warranty period, and then dividing by 1,000 to convert to terabytes. For example, a 1 TB SSD with a 0.3 DWPD rating and a five-year warranty would be expected to sustain:
0.3 × 1,000 GB × 365 days × 5 years ÷ 1,000 = 547.5 TBW
Comparing the actual TBW reported in SMART to this number allows you to estimate the percentage of endurance already consumed:
(Actual TBW / Expected TBW) × 100 = % endurance used
Step 2: Run a Long SMART Test
After reviewing the SMART attributes, the next step is to initiate a long SMART self-test. This test is managed internally by the SSD's firmware and can be launched using tools such as smartctl
or the diagnostic software provided by the manufacturer.
A long SMART test performs a deep internal scan of the drive, checking areas such as the flash controller, onboard memory, and block-level data correction mechanisms. While it does not write across the full storage area, it is capable of detecting background faults that might not yet have surfaced during normal operation.
The test usually completes within thirty minutes, although this depends on the model and capacity of the SSD. Record the results of this test, as they will be useful for comparison after further testing. Any errors that occur during the test should be taken seriously, even if the drive appears to function normally.
Step 3: Surface Test (Write and Read Validation)
Once the long SMART test has completed successfully, a full surface test should be performed to validate the integrity of the entire drive. This test is designed to write data across all accessible blocks of the SSD and then read it back to confirm accuracy. It is useful for identifying weak or marginal NAND cells that may not fail immediately but could become unreliable under sustained use.
Hard Disk Sentinel (Professional Edition) is recommended for this task, as it provides a graphical interface, real-time feedback, and a visual map of slower or failing regions.
When running the surface test, configure it to use a sequential write followed by a read verification cycle. Use a pattern such as 0xAA 0x55 0xAA 0x55
. This binary pattern alternates each bit and is especially effective for detecting stuck bits, stressing internal write circuitry, and revealing failures that simpler patterns such as all-zero or all-one values may not expose.
The test also prevents cases where the drive controller, or a connected host bus adapter, might ignore trivial patterns like zeroes, which could result in incomplete testing. During the process, the software will track the response time for each block and indicate slower or problematic areas using a visual gradient.
Step 4: Run Another Long SMART Test
After completing the surface test, perform a second long SMART self-test. This allows you to assess whether the stress introduced by the previous full write and read cycle caused any changes to the drive's internal state. SSDs often silently reallocate blocks or adjust internal error correction thresholds when exposed to high write activity, and this may now be visible in the SMART data.
Pay close attention to attributes such as:
Reallocated Block Count
Current Pending Block Count
Uncorrectable Error Count
Total Bytes Written
Temperature or Throttling Indicators
Compare the results with the data recorded during the first SMART test. Any increase in error counts or signs of accelerated wear should be considered a warning. Even minor changes may indicate a reduced lifespan or marginal reliability. If no changes are observed, you can be reasonably confident that the SSD is functioning correctly under stress and is ready for deployment.
Last updated