Drivers Marvell 88SE9128
After no response on the OpenSUSE forum, I am posing this query to Unix Stackexchange in the hope of wider reach.

I have a Dell T20 running as a home server on 64-bit OpenSUSE, with one 500GB drive for the OS and four 3TB hard drives in RAID6 for storage. As the Dell only has four onboard SATA ports, it had to be expanded, in my case with a StarTech (PEXSAT32) 2x SATA3 card using a Marvell 88SE9128 chipset. (Weirdly, OpenSUSE reports it as an 88SE9123.)

The issue I face is that the card behaves erratically, dropping a drive from the array. (It has added the drive correctly in the past; this has been an on-and-off issue for a good year.) Tech support suggested testing it outside of the array. Zero-filling the drive fails at a different point each time, anywhere from 1.5GB to 10+GB, and the drive gets dropped. This is true for the original 3TB drive as well as a 2TB drive.
(Both drives are fine.) The zero fill starts at around 54MB/s and then slows down, often dying at 10-20MB/s, but I have seen it as low as 500KB/s.

Another suggestion was to try another computer. My desktop unfortunately only runs Windows, but testing the 2TB and 3TB drives with an 8/16GB CrystalDiskMark sequential run gave no failures and resulted in around 190MB/s read and 150MB/s write, as expected for those hard drives (and matching the speed the array gets during rebuilds on the internal Intel SATA ports). Unfortunately tech support could not offer any further suggestions; however, the Windows test suggests the card itself is fine. Swapping PCI slots on the Dell made no difference either.
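For anyone wanting to repeat the zero-fill test described above, here is a minimal sketch using dd. The function name zero_fill and its arguments are made up for illustration, and it assumes a Linux dd that accepts bs=1M and conv=fsync. It is not necessarily the exact command used in the original tests.

```shell
#!/bin/sh
# zero_fill TARGET COUNT_MIB: write COUNT_MIB MiB of zeros to TARGET.
# zero_fill is a hypothetical helper name, not a standard tool.
# WARNING: this destroys whatever TARGET currently holds.
zero_fill() {
    target=$1
    count_mib=$2
    # conv=fsync makes dd flush to the device before it reports success,
    # so a dropped drive shows up as an error instead of a cached write.
    dd if=/dev/zero of="$target" bs=1M count="$count_mib" conv=fsync
}

# Usage (placeholder device name; double-check it first):
#   zero_fill /dev/sdX 20480
```

dd prints the amount written and the average speed on stderr when it finishes or fails, which is enough to see roughly how far the fill got before the drive was dropped.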
I have tested swapping the SATA cable, and the same cable worked fine on my desktop, so I doubt it is the problem. It is my understanding that the card should just work, as the controller is supported by Linux; unfortunately it does not. Any ideas, any suggestions? How can this be investigated further? (This is where a forum would be better, I guess.) I am aware that many consider Marvell chips to be evil, but unfortunately I cannot afford a RAID card costing several hundred pounds (nor do I really need one, as I am using software RAID).
Another SATA card that I have, with a 3123 SiI chip, is not recognised by either the Dell or my desktop (could be a compatibility issue or a dead card).

Updates following the suggestions from chanik (25th Nov 2015): I used the recommended command, echo 1 >/sys/block/sde/device/queue_depth, to set the queue depth to one, and verified with cat /sys/block/sde/device/queue_depth that it was set to 1 (whether the setting was respected is another question). In either case, using dd to zero the drive, or actually a partition on the drive, fails.

Following some further comments I reran the test. Just in case something funny happens with the controller under dd, I created a fresh GPT table on the drive with a brand new ext4 partition spanning the entire drive and then copied a large directory to it. (It failed in both cases, but weirdly enough lived for 48GB with NCQ and 180GB without NCQ this time.)

For troubleshooting I collected the output in /var/log/messages after the error occurred, and for the run with the queue depth set to 1 I also dumped the dmesg output to a logfile after the error occurred. (Text hosted on Pastebin.)
• Default NCQ:
• NCQ set to 1:
• dmesg output:
If I read the dmesg log correctly, this may suggest that the queue depth of 1 was not respected.

dmesg after reboot and after manually setting the queue depth to 1; it looks like it is truly not respected.
• After reboot:
• Setting NCQ:

Edit 2 - 25th Nov 2015: Decided to use libata.force=noncq via kernel parameters. Still failed.
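To check the queue depth on every disk at once rather than one cat per device, something like the following sketch could be used. The function name report_queue_depth and the optional sysfs-root argument are made up for illustration. It assumes, as the suggestion above does, that a queue depth of 1 means NCQ is effectively disabled for that disk.

```shell
#!/bin/sh
# report_queue_depth [SYSFS_ROOT]: print the queue depth of every sd* disk.
# report_queue_depth and the SYSFS_ROOT override are hypothetical; the
# default root is the real /sys.
report_queue_depth() {
    root="${1:-/sys}"
    for f in "$root"/block/sd*/device/queue_depth; do
        [ -r "$f" ] || continue          # skip if the glob matched nothing
        dev=${f#"$root"/block/}          # strip the leading path
        dev=${dev%%/*}                   # keep only the device name, e.g. sde
        depth=$(cat "$f")
        if [ "$depth" -le 1 ]; then
            echo "$dev: queue_depth=$depth (NCQ effectively off)"
        else
            echo "$dev: queue_depth=$depth (NCQ on)"
        fi
    done
}

# Usage: report_queue_depth
```

This only reports what sysfs says. As noted above, whether the controller honours the setting is a separate question that the dmesg output has to answer.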
Dmesg output:

Edit 26th Nov 2015: Following suggestions to try an older kernel as well as Windows, I tested a fresh install and an updated install of OpenSUSE 13.2, as well as a fresh install of Windows 7 (32-bit; the Linux installs are 64-bit) with the Marvell driver. Under Linux, copies and writes failed, be it to the drive itself or to the NTFS partition created under Windows; under Windows the drive benchmarked fine.
• OpenSUSE 13.2 fresh:
• OpenSUSE 13.2 updated:
• OpenSUSE 13.2 updated, no NCQ:
• Copy to Windows NTFS partition under Leap 42.1, kernel 4.1:
• Windows 7 CrystalDiskMark benchmarks:

I also cannot understand why the chip is seen as a 9123, both in Windows and in Linux. The chip clearly says 9128.

Edit 2 - 26th Nov 2015: Ran Ubuntu from a USB drive with kernel 3.13 (with NCQ). Failed again.
• dmesg:
• lshw:

Edit 27th Nov 2015: Ubuntu LiveUSB with kernel 3.13 again, this time setting ncq via the command line. Failed again.
• before setting ncq:
• after setting ncq:
• after failed zeroing of drive:

Update 2nd Dec 2015: Little add-on notes. I got a new card with an ASMedia chip, which works well. Before I switched in the new card I did some more tests with an older hard drive I had lying around, which I may add to my server for non-RAID duty. The old drive is a SATA2, 2.5" hard drive with an advertised 120GB. Well, no change. I also searched around a bit and disabled the write cache. Still failed. As usual I collected dmesg output, once while dd was running prior to failure, in case someone is interested, and otherwise after failure.
As usual, files hosted by Pastebin.
• Just running dd with defaults:
• dd while it was running:
• dd with noncq set manually via command line:
• dd with noncq set manually and no write cache via hdparm:

And if I switch the BIOS to legacy mode on my Dell T20, I can enter the card's menu too. The only option it actively gives you is to create a RAID 0/1 array in there or leave it as it is.

Edit May 2017: This problem was never truly resolved, and the kernel mailing list didn't provide any real answers either. As a result, the SATA card was changed to a different model and everything seemed to work.
Fast forward: throughout 2016, two drives failed, and apparently both are truly dead. All drives in my home server were swapped out, and during the process I discovered that at least one of the SATA cables was faulty, in this case the OS drive cable. I also replaced the SATA cable for the card and things seem to work. This makes me wonder whether all the problems were due to a faulty cable combined with poorer error recovery under Linux than under Windows.
I won't know, as I don't really have a good way of testing whether this would have resolved the issue. However, as a maybe: if irrational behaviour occurs, try buying new cables; that may solve the issue.

I have an Asus P7P55D-E EVO board with an onboard 88SE9123 controller and experienced erratic behaviour on the HDDs attached to this controller. A simple workaround is to disable NCQ on the 88SE9123 ports by adding something like the following line to the /etc/default/grub file.
GRUB_CMDLINE_LINUX='libata.force=7.00:noncq,8.00:noncq'

(7.00 and 8.00 are the libata port IDs of the 88SE9123 ports on my board; yours may differ.) By adding this and executing the following commands I could get the kernel parameters in the GRUB configuration modified on Ubuntu Linux.

$ sudo update-grub
$ sudo grub-install /dev/sda

I have no experience with OpenSUSE, so you will have to figure out how to change the boot-time kernel parameters in your distro. As an immediate cure, you can disable NCQ for specific HDDs as follows.

$ sudo -i
# echo 1 >/sys/block/sde/device/queue_depth
# echo 1 >/sys/block/sdf/device/queue_depth

These commands take effect immediately but do not persist across reboots, so you will eventually have to change the boot parameters.

I'm still searching for the real solution to this problem, but no success yet. I hope this workaround works for you.
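After rebooting, one way to confirm that the boot parameter actually reached the kernel is to look at /proc/cmdline. The sketch below does that. The function name noncq_params and its optional file argument are made up for illustration, and it only checks that the parameter was passed, not that the controller honours it.

```shell
#!/bin/sh
# noncq_params [CMDLINE_FILE]: print any libata.force=... option that
# contains "noncq". Returns non-zero if there is none.
# noncq_params and the file override are hypothetical; the default is
# the real /proc/cmdline.
noncq_params() {
    cmdline_file="${1:-/proc/cmdline}"
    # Put each kernel parameter on its own line, then pick the libata one.
    tr ' ' '\n' < "$cmdline_file" | grep '^libata\.force=.*noncq'
}

# Usage:
#   noncq_params || echo "no libata.force noncq option on the kernel command line"
```

If the function prints nothing, the GRUB change most likely was not written to the configuration that was actually booted.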
I'm running Linux on two PCs with Asus P7P55D-E EVO boards, Ubuntu 14.04 and CentOS 7 respectively. In both PCs, two HDDs are attached to the 88SE9123 ports and used for OS booting in a btrfs RAID1 setup. Both PCs showed the same 88SE9123 issue and the same workaround worked on both. I tried various things, such as [1] updating the HDD firmware, [2] updating the mainboard BIOS in the expectation that the onboard 88SE9123 BIOS would also be updated, [3] forcing the SATA link to 3Gbps instead of 6Gbps, and [4] disabling VT-d in the mainboard BIOS settings, all with no luck.
The only cure that has worked so far is to disable NCQ. It is very stable without NCQ. – Nov 25 '15 at 0:12.