UPDATE: I have since upgraded the motherboard and everything now works, so these problems are moot. For those interested, that rules out the HDD and IPFire as the source of the problem. My guess is that the 3512 card was not working correctly, whether in hardware or software.

OK, so maybe buying a whole new router was a little rash, given that I have a problem that could potentially be solved by other means.

Here’s the situation and the problem:

The Players:

Linux box running IPFire with the following specs:

CPU: Intel Pentium 4 1.6 GHz

RAM: 512MB generic DDR (I think)

PSU: Generic 350W or something of the like

MOBO: Intel 800 Series something-or-other

HDD: Western Digital WD1600 160GB IDE

MEDIA HDD: Western Digital WD20 2TB SATA2

NET: 2x TP-Link 1Gbps Ethernet PCI cards (1 for LAN, 1 for WAN) running CAT6 cabling

SATA Card: Ritmo PCI to SATA1 w/Silicon Image Sil3512 SATALink chip

Notes:

  • The media drive is new; there are no SMART errors and no hardware-related issues as far as I can tell.
  • It is connected via a PCI Silicon Image 3512 SATALink 1.5Gbps SATA1 controller card and powered through a Molex -> SATA power converter.

The Problem:

I can successfully copy files to the media drive via any method; however, I cannot read the resulting files back. A read usually runs for a second or so at full speed, then suddenly drops to <100KB/s and finally dies with an error message. I’ve tried multiple files; all fail.
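To take the network (SCP, rsync, Samba) out of the picture, the failing read can be reproduced locally on the box. Below is a minimal sketch, assuming Python 3 is available on the machine the drive is attached to; the function name `read_test` and the 1 MiB chunk size are my own choices for illustration, not part of any tool mentioned here.

```python
import time


def read_test(path, chunk_size=1024 * 1024):
    """Read `path` sequentially and report throughput.

    Returns (bytes_read, error), where error is None on success or the
    OSError raised by the failing open/read call.
    """
    total = 0
    start = time.monotonic()
    try:
        # buffering=0 gives a raw file object, so each read() maps to one
        # read request of at most chunk_size bytes.
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break  # end of file
                total += len(chunk)
    except OSError as exc:
        return total, exc
    finally:
        elapsed = time.monotonic() - start
        rate = total / elapsed / 1024 if elapsed > 0 else 0.0
        print(f"{path}: {total} bytes in {elapsed:.2f}s ({rate:.0f} KB/s)")
    return total, None
```

Calling `read_test()` on a file under /mnt/media should show how many bytes came back before the error. One caveat: a file that was only just written may be served from the page cache rather than the disk, so a file larger than the machine’s RAM (or a test after a reboot) is more likely to hit the controller.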

I have finally found some useful information about the error I’m getting: ata2.00: error: { ICRC UNC IDNF ABRT }

(see below for the full kernel messages; note that the log is listed newest first)

According to https://ata.wiki.kernel.org/index.php/Libata_error_messages

  • ICRC = Interface CRC error during Ultra DMA transfer (which is likely caused by power issues or bad driver instructions)
  • UNC = Uncorrectable error (bad sectors)
  • IDNF = Requested address was not found
  • ABRT = Command aborted

It looks to me like the first error causes the others. Then again, who knows; I could just be that unlucky.
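The four flags come from individual bits of the ATA error register, which libata prints as the second field of the "res" line in the kernel log. The sketch below decodes that field. The bit positions are as I recall them from the kernel’s include/linux/ata.h, so treat them as an assumption; the helper names `parse_res` and `decode_error` are made up for this example.

```python
import re

# Error register bits, as I recall them from include/linux/ata.h (assumption).
ERROR_BITS = [
    (0x80, "ICRC"),  # interface CRC error
    (0x40, "UNC"),   # uncorrectable media error
    (0x10, "IDNF"),  # ID (address) not found
    (0x04, "ABRT"),  # command aborted
]

RES_RE = re.compile(r"\bres ([0-9a-f]{2})/([0-9a-f]{2}):")


def parse_res(line):
    """Return (status, error) register values from a libata 'res' line."""
    m = RES_RE.search(line)
    if m is None:
        raise ValueError("no libata result taskfile found")
    return int(m.group(1), 16), int(m.group(2), 16)


def decode_error(err):
    """List the names of the error flags set in the error register value."""
    return [name for mask, name in ERROR_BITS if err & mask]


line = "res ff/ff:ff:ff:ff:ff/ff:ff:ff:ff:ff/ff Emask 0x2 (HSM violation)"
status, error = parse_res(line)
print(decode_error(error))  # ['ICRC', 'UNC', 'IDNF', 'ABRT']
```

In this log every result register reads 0xff, so all four flags are set at once simply because every bit is set. That pattern may mean the registers could not be read back properly at all, in which case the individual flags say little about what actually went wrong on the drive.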

 

18:34:18 Kernel.Info	kernel: ata2: EH complete
18:34:18 Kernel.Info	kernel: ata2.00: configured for UDMA/33
18:34:18 Kernel.Info	kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
18:34:18 Kernel.Info	kernel: ata2: hard resetting link
18:34:18 Kernel.Error	kernel: ata2.00: error: { ICRC UNC IDNF ABRT }
18:34:18 Kernel.Error	kernel: ata2.00: status: { Busy }
18:34:18 Kernel.Error	kernel:          res ff/ff:ff:ff:ff:ff/ff:ff:ff:ff:ff/ff Emask 0x2 (HSM violation)
18:34:18 Kernel.Error	kernel: ata2.00: cmd 25/00:00:1f:03:b6/00:01:3e:00:00/e0 tag 0 dma 131072 in
18:34:18 Kernel.Error	kernel: ata2.00: failed command: READ DMA EXT
18:34:18 Kernel.Error	kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
18:34:18 Kernel.Debug	kernel: ata2: drained 32768 bytes to clear DRQ.
18:34:17 Kernel.Info	kernel: ata2: EH complete
18:34:17 Kernel.Info	kernel: ata2.00: configured for UDMA/33
18:34:17 Kernel.Info	kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
18:34:17 Kernel.Info	kernel: ata2: hard resetting link
18:34:17 Kernel.Error	kernel: ata2.00: error: { ICRC UNC IDNF ABRT }
18:34:17 Kernel.Error	kernel: ata2.00: status: { Busy }
18:34:17 Kernel.Error	kernel:          res ff/ff:ff:ff:ff:ff/ff:ff:ff:ff:ff/ff Emask 0x2 (HSM violation)
18:34:17 Kernel.Error	kernel: ata2.00: cmd 25/00:00:1f:d0:b5/00:01:3e:00:00/e0 tag 0 dma 131072 in
18:34:17 Kernel.Error	kernel: ata2.00: failed command: READ DMA EXT
18:34:17 Kernel.Error	kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
18:34:17 Kernel.Debug	kernel: ata2: drained 32768 bytes to clear DRQ.
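The "cmd" lines in the log above pack the failed command’s taskfile registers into hex fields. As I understand the libata format, the layout is command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device. The sketch below (the helper name `parse_cmd` is hypothetical, and a 512-byte logical sector is assumed) recovers the starting LBA and transfer length from such a line.

```python
import re

FIVE = r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
CMD_RE = re.compile(r"cmd ([0-9a-f]{2})/" + FIVE + "/" + FIVE + r"/([0-9a-f]{2})")


def parse_cmd(line, sector_size=512):
    """Decode command, 48-bit LBA and sector count from a libata 'cmd' line."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no libata cmd taskfile found")
    command = int(m.group(1), 16)
    # Low-order register bytes, then the "high order byte" (hob) set.
    _feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in m.group(2).split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in m.group(3).split(":")
    )
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    # In 48-bit commands a sector count of 0 means 65536 sectors.
    sectors = ((hob_nsect << 8) | nsect) or 65536
    return {
        "command": command,
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * sector_size,
    }


info = parse_cmd("ata2.00: cmd 25/00:00:1f:03:b6/00:01:3e:00:00/e0 tag 0 dma 131072 in")
print(hex(info["command"]), hex(info["lba"]), info["sectors"], info["bytes"])
```

For the 18:34:18 entry this gives command 0x25 (READ DMA EXT, matching the "failed command" line), 256 sectors and 131072 bytes, which agrees with the "dma 131072 in" field. The two failures are at different LBAs (0x3eb5d01f, then 0x3eb6031f), so the reads are not failing on one fixed sector.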

Oh and here’s my mtab entry:

/dev/sda1 /mnt/media xfs rw,noatime,allocsize=512m,logbufs=8 0 0

What I’ve tried:

  • Rebooting the machine (lol)
  • Copying via SCP (over SSH), rsync and Samba
  • Reseating the SATA controller card
  • Using a different PCI slot
  • Swapping the SATA cable for another one
  • Reseating the Molex -> SATA power connector
  • Using the other port on the controller
  • Setting the OPT1 jumper on the HDD to force SATA1 compatibility (worked without it anyway)
  • Booting with the acpi=off and noapic options in GRUB
  • Using a non-SMP kernel
  • Reformatting
  • Recopying all media files (1.4TB... took bloody ages too)
  • Formatting as EXT4
  • Formatting as XFS (and this is the current file system)
  • Updating the controller card’s BIOS chip (didn’t need it)

Causes?

To be honest, I’m at a loss.

The only possibilities that remain (that I can think of atm) are:

  1. the modified IPFire kernel doesn’t support what I’m trying to do.
  2. the 3512 controller isn’t properly processing DMA commands
  3. lack of power from the PSU (though I can’t imagine why that would only prevent reads)
  4. old age or general incompatibility of the hardware
  5. motherboard failure
  6. HDD failure (highly unlikely: I have had no issues reformatting, and SMART doesn’t report any problems)

 

Fixes?

Cheap options:

  • Check media drive with SpinRite on another machine, ensuring all sectors are clean and not ‘bad’
  • New controller card
  • If that doesn’t help, a new power supply

Expensive options:

  • New machine using old hardware (just need CPU/power supply; the rest I already have): $76
  • New machine (excluding the case) with upgraded hardware, a gutsier processor, integrated LAN and SATA2 ports: $148 (cheapest AMD) to $184 (best Intel)