LSI Logic SAS1064ET RAID Controller OS Crash
by bats on Jul.16, 2011, under IBM, Linux, LSI, RAID
This is an issue we’ve experienced on multiple blades with the same model of RAID controller card. If drive 0 ever dies, the server/OS crashes and then refuses to boot, claiming the filesystem is corrupted. The only fix is to physically remove drive 0 and let the server boot off _only_ drive 1. IBM support said this was acceptable behaviour. What’s worse, when IBM sends a replacement for the failed drive 0 and it’s inserted, the controller doesn’t automatically assign it to the RAID group and start the rebuild while the server is running. You have to take another outage, restart the OS, boot into the LSI BIOS screen and configure it manually. And good luck figuring out how to add a new drive to an existing RAID group/array. I’ve never had so much trouble with any RAID controller on Dell or HP/Compaq. We’re starting to experiment with Boot from SAN, which is great but just works around the issue. Another workaround would be to use Linux software RAID. Has anyone else had these issues with LSI RAID controllers on IBM HS22 blades? A Google search didn’t turn up any similar results.
Linux to Windows Remote Desktop Issues
by bats on Jun.29, 2011, under Linux, Remote Access, Windows
Many users whose main PC runs Linux have to RDP into Windows machines on occasion. All my PCs run Linux, and I’m constantly connected to a Windows box at work, mainly for the Office suite and the occasional mandatory app that won’t run on Linux. I’ve run into a couple of issues and figured out ways to work around them.
The first and most aggravating is that the clipboard between Linux and Windows quits working at random times. I haven’t determined what causes it, but I’ve found that killing and restarting rdpclip.exe in Windows restores the functionality. It happens so often that I’ve created a two-line .bat script and placed it on my start menu and desktop for convenience. This is all you need.
taskkill /IM rdpclip.exe /F
start /MIN rdpclip.exe
I’m not sure if versions of Windows before Windows 7 include taskkill. If not, Sysinternals offers a tool called PsKill that will do the same thing. Just save these two commands in a .bat file and click it when the clipboard quits working. It sure beats the hell out of my previous method, which was logging off completely, and back in.
The second issue is regarding connecting to Windows 7 or Windows Server 2008 machines. Remote to Local sound doesn’t work without adding some extra arguments to the command. This might be fixed in newer versions of rdesktop than my distro provides, so only implement this if it’s currently broken for you. This is how to call it from the command line.
rdesktop -r sound:local -r clipboard -r disk:root=/ servername
The mapping of your local disk to the Windows server resolves the Remote to Local sound issue. There’s another benefit: It also creates a new drive letter on your Windows box that’s mapped to your Linux filesystem. It’s a convenient way to copy files to and from both boxes without having to use another application or protocol. You may want to change the command to “disk:root=/home/user” to make your home dir the highest level directory you can access via Windows. Some of you use a GUI front-end to connect to remote hosts, like gnome-rdp or tsclient. I use gnome-rdp and it doesn’t have an option to map the disk, so I created a wrapper script to do it instead. This is not the optimal solution because it will create problems when you update rdesktop. This is what I’ve done as a workaround.
mv /usr/bin/rdesktop /usr/bin/rdesktop.bin
Create a file called /usr/bin/rdesktop.wrapper and add this to the file.
#!/bin/bash
## Wrapper to fix Win7/2008 sound issues
/usr/bin/rdesktop.bin -r sound:local -r clipboard -r disk:root=/ "$@"
## EOF
Make the file executable.
chmod +x /usr/bin/rdesktop.wrapper
Then create a symbolic link to the wrapper.
ln -s /usr/bin/rdesktop.wrapper /usr/bin/rdesktop
That should be all you need. Whenever your GUI calls rdesktop behind the scenes, it will actually call our wrapper script, which prepends the sound, clipboard and disk arguments to the command when calling the real binary. The "$@" at the end of the line appends all the arguments that the GUI supplies, each one kept intact even if it contains spaces, so that any specific settings or options you’ve set will still work. Keep in mind that when you update rdesktop, the symbolic link we created will be deleted and overwritten with the new rdesktop binary. If the new version doesn’t fix the sound problem itself, simply rename rdesktop to rdesktop.bin and recreate the symbolic link.
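Putting the wrapper back after an rdesktop update can be scripted. This is only a rough sketch of the rename-and-relink step described above: the function name restore_rdesktop_wrapper is made up for illustration, and the directory is passed as a parameter (normally /usr/bin) so you can try it in a scratch directory first.

```bash
#!/bin/bash
# Re-apply the wrapper after a package update has replaced the symlink.
# restore_rdesktop_wrapper is a hypothetical helper name, not part of rdesktop.
# Usage: restore_rdesktop_wrapper <bindir>    (normally /usr/bin, run as root)
restore_rdesktop_wrapper() {
    local bindir=$1
    # If rdesktop is still a symlink, the update did not touch it.
    if [ -L "$bindir/rdesktop" ]; then
        echo "wrapper symlink still in place, nothing to do"
        return 0
    fi
    # The wrapper script itself must still exist and be executable.
    if [ ! -x "$bindir/rdesktop.wrapper" ]; then
        echo "missing $bindir/rdesktop.wrapper" >&2
        return 1
    fi
    # The update dropped a fresh binary at rdesktop: move it aside and relink.
    mv -f "$bindir/rdesktop" "$bindir/rdesktop.bin" || return 1
    ln -s "$bindir/rdesktop.wrapper" "$bindir/rdesktop"
}
```

Run it as restore_rdesktop_wrapper /usr/bin after the package manager finishes. It does not check whether the new rdesktop still needs the workaround, so test sound with the plain binary first.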
Enable PCOIP on Linux
by bats on Jun.03, 2011, under Linux, Live Video, PCOIP, VMware
Adding PCOIP support to Linux is fairly easy. Download the appropriate VMware View Open Client package for your distro from http://code.google.com/p/vmware-view-open-client/downloads/list and install it. Then download HP’s thin client file from ftp://ftp.hp.com/pub/softpaq/sp50501-51000/sp50874.exe and install (extract) it on a Windows machine, or use wine. After it’s extracted, the file we need is C:\Program Files\Hewlett-Packard\HP ThinPro\3.2\Add-On\View45\vmware-view-client_4.5.0-293049-1_i386.deb. Transfer that file to your Linux box. Rename it to vmware-view-client_4.5.0-293049-1_i386.ar, then extract it using “ar x vmware-view-client_4.5.0-293049-1_i386.ar”. Among the files this creates is data.tar.gz. Decompress and extract that file, then copy the following files from it into the original VMware client’s installation directory structure.
cp -a ./usr/lib/libpcoip_client.so /usr/lib/
cp -a ./usr/lib/libpcoip_crypto.so /usr/lib/
cp -a ./usr/lib/vmware/plugins/libUsbVMwareView-4.6.so /usr/lib/vmware/plugins/
cp -a ./usr/lib/vmware/plugins/libUsbVMwareView-4.4.so /usr/lib/vmware/plugins/
cp -a ./usr/lib/pcoip /usr/lib/
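The unpacking that comes before those copy commands could be wrapped up roughly like this. It is a sketch under a few assumptions: extract_deb_payload is a name made up here, ar (from binutils) and GNU readlink are installed, and the package uses data.tar.gz as described above. Since ar reads the archive regardless of its extension, the sketch skips the rename.

```bash
#!/bin/bash
# Unpack the payload of a .deb into a scratch directory without dpkg.
# extract_deb_payload is a hypothetical helper name.
# Usage: extract_deb_payload <path-to.deb> <destination-dir>
extract_deb_payload() {
    local deb dest
    # Resolve to an absolute path because we change directory below.
    deb=$(readlink -f "$1") || return 1
    dest=$2
    mkdir -p "$dest" || return 1
    (
        cd "$dest" || exit 1
        # A .deb is an ar archive; ar does not care about the file extension.
        ar x "$deb" || exit 1
        # data.tar.gz holds the files that would be installed under /.
        tar -xzf data.tar.gz
    )
}
```

For example, extract_deb_payload vmware-view-client_4.5.0-293049-1_i386.deb /tmp/view45 leaves a ./usr tree under /tmp/view45, and the cp commands can then be run from that directory.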
Afterwards, I had some library issues that were easily solved by symlinks. If you get errors on startup, just run these commands (assuming you already have the 32-bit versions of the libraries installed).
mkdir -p /usr/bin/libdir/lib/libcrypto.so.0.9.8
ln -s /usr/lib/libcrypto.so.0.9.8 /usr/bin/libdir/lib/libcrypto.so.0.9.8/libcrypto.so.0.9.8
mkdir -p /usr/bin/libdir/lib/libssl.so.0.9.8/
ln -s /usr/lib/libssl.so.10 /usr/bin/libdir/lib/libssl.so.0.9.8/libssl.so.0.9.8
I know the directories are odd, but I determined by watching the system calls while starting up that it was only checking that location. Don’t worry about adding it to ld.so.conf. That should be all you need to do. Good luck!
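If you want to repeat the system call check on your own box, something along these lines should do it. Treat it as a sketch: missing_libs is a helper name made up here, vmware-view is assumed to be the client binary on your install, and the filter only looks for failed opens of paths containing ".so".

```bash
#!/bin/bash
# List library paths a program tried to open but could not find.
# Reads an strace log on stdin and prints the unique missing .so paths.
# missing_libs is a hypothetical helper name.
missing_libs() {
    # Keep failed lookups, pull out the quoted path, strip the quotes.
    grep 'ENOENT' | grep -o '"[^"]*\.so[^"]*"' | tr -d '"' | sort -u
}

# Example (vmware-view is an assumed binary name; adjust to your install):
#   strace -f -e trace=file -o /tmp/view.trace vmware-view
#   missing_libs < /tmp/view.trace
```

The -e trace=file option limits strace to system calls that take a file name, which keeps the log small enough to read by eye as well.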
** Update ** I placed copies of HP’s files on my webserver in case HP pulls them in the future. You can download the files, even the extracted .deb file, from here.
http://www.offenders.org/sp50874.exe
http://www.offenders.org/vmware-view-client_4.5.0-293049-1_i386.deb
RHEL 6, part 1 – anaconda/kickstart
by bats on Nov.11, 2010, under Linux, RedHat
Just some observations here with the newly released RHEL 6 related to automated installations using kickstart. There were major updates related to anaconda that prevent you from using your standard kickstart configuration files. In particular, these commands were removed altogether and will cause an installation to fail.
* key
* langsupport
* mouse
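A quick way to spot these in an existing kickstart file is a simple line match, sketched below. The function name find_removed_ks_commands is made up here, and the check does not parse sections, so it only catches the three commands when they start a line.

```bash
#!/bin/bash
# Print every line of a kickstart file that starts with a command
# removed in RHEL 6 (key, langsupport, mouse), with its line number.
# Exit status is 0 if any were found, 1 if the file is clean.
# find_removed_ks_commands is a hypothetical helper name.
find_removed_ks_commands() {
    grep -nE '^[[:space:]]*(key|langsupport|mouse)([[:space:]]|$)' "$1"
}

# Example:
#   find_removed_ks_commands /path/to/old-ks.cfg
```

The trailing whitespace-or-end-of-line part of the pattern is what stops "keyboard" from being flagged as "key".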
The package names used inside the %packages section have changed as well. All but one of my existing package names have been removed or changed. There is a bug in the RHEL 6 migration documentation, which says you can specify “--defaultPackages” inside the %packages block to “get the exact same set of packages via Kickstart that you would in a default GUI install”, but it doesn’t work. In fact it causes the installation to fail with an error indicating invalid syntax. I will be performing a complete installation shortly, which will generate a complete kickstart file that I can use to determine the new commands and package names. Update to come soon!
Qlogic CNAs
by bats on Nov.10, 2010, under Converged Network Adapter, FCoE, Linux, RedHat
So apparently the QLogic QLE8142 CNAs only work with IBM-certified FCoE cables. Our standard CNA was the QLE8152, and until now we used Cisco-certified cables with great success. But when these new CNAs arrived, the vfc showed ‘link down’ while ethernet showed ‘link up’. Swapping to IBM’s certified cables remedied the situation. Since there is obviously no accepted standard for this technology yet, I guess vendors are free to dictate (and charge for) whatever they feel necessary.
64-bit Linux Flash Player Update
by bats on Oct.04, 2010, under Exploits, Linux
After dropping support for 64-bit Linux this past summer, Adobe recently released an updated Flash plugin for Linux browsers which patches the security hole that Firefox warns you about every time you update your browser. Up until now, there was no update available for the vulnerability. Get the new Flash Player “Square” here.
Linux Kernel x86 64-bit 0day Exploit
by bats on Sep.20, 2010, under Exploits, Linux
This exploit is real and it’s been in circulation for 2 years now. RHEL should have a patched kernel available early this week. Details are here.
https://access.redhat.com/kb/docs/DOC-40265
The exploit is here, but it has a backdoor which can’t be cleared without a reboot. I’d advise not running this code unless you’re prepared to reboot afterwards to clear out the in-memory backdoor.
http://seclists.org/fulldisclosure/2010/Sep/att-268/ABftw_c.bin
**Update**
Red Hat has patched their kernel to protect against this exploit. The new kernel was released on 2010/09/21. The updated RHEL5 kernel is 2.6.18-194.11.4.el5.
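To see whether a RHEL5 box is already on that kernel or a later one, a comparison along these lines could be used. It is a minimal sketch: kernel_at_least is a name made up here, it compares only the numeric fields of the release string (a suffix such as "el5" counts as 0), and the result is only meaningful within the RHEL5 kernel series.

```bash
#!/bin/bash
# Return 0 if kernel release $1 is at or above release $2, else 1.
# kernel_at_least is a hypothetical helper name.
kernel_at_least() {
    local IFS='.-'
    local -a have want
    local i h w
    # Split "2.6.18-194.11.4.el5" into fields on dots and dashes.
    read -r -a have <<< "$1"
    read -r -a want <<< "$2"
    for i in 0 1 2 3 4 5; do
        h=${have[i]:-0}
        w=${want[i]:-0}
        # Non-numeric fields (the "el5" style suffix) are treated as 0.
        [[ $h =~ ^[0-9]+$ ]] || h=0
        [[ $w =~ ^[0-9]+$ ]] || w=0
        if (( 10#$h > 10#$w )); then return 0; fi
        if (( 10#$h < 10#$w )); then return 1; fi
    done
    return 0
}

fixed="2.6.18-194.11.4.el5"
if kernel_at_least "$(uname -r)" "$fixed"; then
    echo "running kernel is at or above $fixed"
else
    echo "running kernel is older than $fixed"
fi
```

Keep in mind this checks the running kernel only. A patched kernel that is installed but not yet booted does not protect you.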
Kernel bug affecting Linux and QLGE (and FCoE)
by bats on Aug.25, 2010, under Converged Network Adapter, FCoE, Linux, RedHat
There is a kernel bug that crashes all RHEL kernels after 2.6.18-164.el5 when QLogic’s CNA modules are in use and a malformed frame arrives on the network. The Bugzilla entry is at https://bugzilla.redhat.com/show_bug.cgi?id=600350. It sucks being guinea pigs for new technology, but at least they are working to get it resolved. Booting with the qlge module option qlge_irq_type=1 is a workaround for now.
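One way to make that option stick is a line in the modprobe configuration, sketched below. The helper name set_qlge_irq_type is made up here, and the file path is an assumption for RHEL5 (/etc/modprobe.conf); newer releases use files under /etc/modprobe.d/ instead. The qlge driver documents the values as 0 = MSI-X, 1 = MSI, 2 = Legacy.

```bash
#!/bin/bash
# Add "options qlge qlge_irq_type=N" to a modprobe config file,
# unless a qlge_irq_type setting is already present there.
# set_qlge_irq_type is a hypothetical helper name.
# Usage: set_qlge_irq_type <config-file> <irq-type>
set_qlge_irq_type() {
    local conf=$1 irq_type=$2
    if grep -Eq '^options[[:space:]]+qlge[[:space:]].*qlge_irq_type=' "$conf" 2>/dev/null; then
        echo "qlge_irq_type already set in $conf"
    else
        echo "options qlge qlge_irq_type=$irq_type" >> "$conf"
    fi
}

# Example (as root, path assumed for RHEL5):
#   set_qlge_irq_type /etc/modprobe.conf 1
```

The setting only takes effect the next time the qlge module is loaded, so plan for a reboot. If the driver is loaded from the initrd on your setup, the initrd may need rebuilding as well.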
Update 20110216:
Frustratingly, Red Hat still hasn’t fixed this problem. They are testing a patch that still results in a kernel panic, only in a different place in the kernel. I guess this patch helps some customers, but not me.
The previous panic occurred at 8021q:vlan_gro_frags. The new panic occurs in multiple places, depending on which kernel they ask me to test. What’s worse, their workaround seemed to be an acceptable temporary solution, but the single Oracle RAC cluster we built on a post-2.6.18-164 kernel panics fairly often, so we had to go back to the known-good kernel. We’re currently working to determine whether it’s related. This would prevent us from upgrading to RHEL 6 until it’s fixed. I’m wondering if anyone else is seeing this same issue. Complete original panic below.
Unable to handle kernel NULL pointer dereference at 00000000000003c8 RIP:
[
PGD 0
Oops: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/local_cpus
CPU 2
Modules linked in: bonding ipv6 xfrm_nalgo crypto_api dm_multipath scsi_dh
video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac
parport_pc lp parport sg cdc_ether i2c_i801 usbnet i2c_core pcspkr qlge bnx2
8021q dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero
dm_mirror dm_log dm_mod qla2xxx scsi_transport_fc shpchp mptsas mptscsih
mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-164.15.1.el5 #1
RIP: 0010:[
:8021q:vlan_gro_frags+0x2d/0x215
RSP: 0018:ffff810116cfbe40 EFLAGS: 00010286
RAX: 0000000000000001 RBX: ffff81066c5b4ec0 RCX: 000000000000018f
RDX: ffff810c7ddd3800 RSI: 0000000000000000 RDI: ffff81066c5b4ec0
RBP: ffff81067d17a060 R08: ffff810009000c78 R09: ffff81067ad3dd00
R10: 0000000000000064 R11: 00000000000000c8 R12: 0000000000000064
R13: ffff810c7eec0000 R14: ffff81067d178500 R15: ffffffff803ec2a0
FS: 0000000000000000(0000) GS:ffff810116c991c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000003c8 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff810116cf6000, task ffff8106800760c0)
Stack: ffffffff803ec2a0 ffff81067d17a060 ffff810c7f9a0580 ffff810009017a80
ffff81067ba41c00 ffffffff88317ac3 0000000000000246 ffff810116cfbf0c
ffff81067da4d800 ffff81067d179e78 0000000100000040 00000000000005e2
Call Trace:
[
[
[
[
[
[
[
[
[
[
Code: 48 8b 86 c8 03 00 00 48 85 c0 0f 84 9e 01 00 00 48 83 78 18
RIP [
RSP
Update 20110716
Finally a fix! I was beginning to think we were the only ones having this problem, since I officially submitted a case to RedHat 16 months ago with this issue. A big thanks to Jay Vosburgh at IBM, who helped track down the kernel crash related to qlge, bonding and vlans and submitted a patch to fix it. He also wrote a PoC exploit, which I’m sure helped move this bug along a little faster than its previous progress. I had tried to generate a PoC with packet generators, but since I never could see the offending packet (other than Receive error, flags2 = 0x1b), I was just shooting blindly and never found it. Herbert Xu made a slight change to Jay’s patch, and hopefully it will be released to the public via an Errata soon so we can upgrade our new datacenter’s hundreds of kernels past 2.6.18-164, re-enable MSI-X, and migrate to RHEL6 and beyond!