Bugs I've run into recently,
2016(-08) edition

I've been dealing with so many bugs recently that I've decided to write them all down.

Linux

I still have to add this to all of my Linux hosts and VM guests to prevent long hangs, which sometimes even take down the host machine:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
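
These sysfs writes do not persist across reboots, so they have to be re-applied at every boot. Below is a minimal sketch that only writes when the active value is not already never. The helper names thp_active and thp_disable are made up for this example, and the THP_DIR override exists only so the functions can be tried against a scratch directory. The kernel lists every option in these files and puts brackets around the active one, e.g. "always madvise [never]".

```shell
#!/bin/sh
# Sketch: disable transparent hugepages only where they are not already off.
# thp_active and thp_disable are hypothetical helper names.

THP_DIR=${THP_DIR:-/sys/kernel/mm/transparent_hugepage}

# Print the bracketed (active) word from a line like "always madvise [never]".
thp_active() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Write "never" to enabled and defrag under THP_DIR (needs root on a real system).
thp_disable() {
    for f in enabled defrag; do
        cur=$(thp_active "$(cat "$THP_DIR/$f")")
        if [ "$cur" != "never" ]; then
            echo never > "$THP_DIR/$f"
        fi
    done
}

# Usage (as root, e.g. from a boot-time script):
#   thp_disable
```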

Linux still doesn't include the BFQ scheduler in mainline, which means that I can choose between the deadline and cfq schedulers, both of which are complete garbage on rotating drives. deadline causes sky-high read latencies, while cfq often reduces throughput to 2-3MB/s. I've sometimes set up servers to dynamically switch IO schedulers based on whether they're trying to push data off the server. [update: I now use sbuild to build ubuntu-xenial kernels with the BFQ patches applied and BFQ enabled by default.]
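
The dynamic switching can be sketched roughly as follows. This is not the exact setup described above. It assumes that "pushing data off the server" can be detected by a running rsync process, and that deadline is the better choice during transfers (throughput) and cfq otherwise (read latency). Both of those are guesses for illustration, as are the function names and the SYSFS_BLOCK override.

```shell
#!/bin/sh
# Sketch: pick an IO scheduler depending on whether a bulk transfer is running.
# Assumptions (not from the text above): a transfer means an rsync process
# exists; deadline is used during transfers and cfq otherwise.

SYSFS_BLOCK=${SYSFS_BLOCK:-/sys/block}

# Print the scheduler to use; $1 is "yes" when a transfer is in progress.
pick_scheduler() {
    if [ "$1" = "yes" ]; then
        echo deadline
    else
        echo cfq
    fi
}

# Select scheduler $2 for block device $1 (e.g. sda); needs root on a real system.
set_scheduler() {
    echo "$2" > "$SYSFS_BLOCK/$1/queue/scheduler"
}

# Print "yes" if an rsync process is running, "no" otherwise.
transfer_running() {
    if pgrep -x rsync > /dev/null 2>&1; then echo yes; else echo no; fi
}

# Usage (as root, e.g. from cron once a minute):
#   set_scheduler sda "$(pick_scheduler "$(transfer_running)")"
```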

zfsonlinux (and ZFS) are missing a critical feature that would enable you to defragment your dataset, so performance degrades over time, more severely if your zpool is nearly full.

XFS is a good filesystem with metadata checksums and online defragmentation (using xfs_fsr), but it can't checksum your data. Depending on your disks, SSDs, motherboards, and IO utilization pattern, you may eventually be exposed to data corruption. I use a program that stores file checksums in xattrs and have noticed data corruption caused by aging motherboards and 3TB WD Green drives, but not on a newer system. Checksums in xattrs can be useful for archived media but are useless for detecting corruption in real-time (e.g. as data is read by a database server).
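
To illustrate the xattr approach (this is not the program I use, just a minimal sketch): store a SHA-256 of each file in an extended attribute and compare it later. The attribute name user.sha256 and the function names are arbitrary choices for this example. It assumes sha256sum from coreutils and setfattr/getfattr from the attr package, on a filesystem with user xattrs enabled.

```shell
#!/bin/sh
# Sketch: keep a SHA-256 of a file in an extended attribute and re-check it.
# The attribute name user.sha256 is an arbitrary choice for this example.
# Requires sha256sum (coreutils) and setfattr/getfattr (attr package).

# Print the hex SHA-256 of the file $1.
file_sha256() {
    sha256sum < "$1" | cut -d ' ' -f 1
}

# Record the current checksum of $1 in its xattrs.
checksum_store() {
    setfattr -n user.sha256 -v "$(file_sha256 "$1")" "$1"
}

# Return 0 if $1 still matches its stored checksum, 1 if it differs,
# 2 if no checksum could be read.
checksum_verify() {
    stored=$(getfattr --only-values -n user.sha256 "$1" 2>/dev/null) || return 2
    [ "$stored" = "$(file_sha256 "$1")" ]
}

# Usage:
#   checksum_store movie.mkv
#   checksum_verify movie.mkv || echo "movie.mkv: mismatch or no checksum" >&2
```

A mismatch only means the content changed since the checksum was stored. Telling corruption apart from a legitimate edit needs extra bookkeeping, such as also storing the mtime.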

The kernel I'm using today allows anyone to hijack TCP traffic without even being a man-in-the-middle. [fixed]

Server stuff

After you SSH into an Ubuntu 16.04 server a few thousand times (e.g. a loop that does rsync), systemd-logind will break and delay all of your logins by 30 seconds. [fixed]

If your SSH connection hangs or disconnects with tmux 2.1 or 2.2 running, the tmux window will probably hang forever. tmux 2.2 is the latest tmux release that ships with Ubuntu 16.04 and Alpine Linux and everything else. You can fix it by downgrading tmux. Upgrading tmux to 2.3-master will only introduce more bugs, like tmux thinking you're attached to windows that you're not attached to.

LXC no longer works for me. It works fine in a clean install of Ubuntu 16.04, but not on my upgraded Ubuntu 15.10 -> 16.04 machine. After spending half a day bringing my LXC configuration to a pristine state, it was still broken. This was not the first or second opaque LXC failure mode I've had to deal with. Possibly related to this bug involving concurrent starts of containers. I gave up on LXC and switched to VirtualBox, even though I would have preferred to use containers.

VirtualBox locks up one of my VMs about once a week, probably due to a bug in networking or shared folders.

Docker and rkt require root to do anything useful, introducing more attack surface to your server.

The software I use for crawling websites sometimes segfaults under high CPU load, probably due to a memory corruption bug in a Python C extension that is almost impossible to track down.

If you use Cassandra and carefully follow the documented SST upgrade procedure to upgrade from Cassandra 2.2.6 to 3.0, about one in every ~10,000 rows gets duplicated with all of the other columns nulled out in the duplicate row.

Virtualization

VirtualBox Guest Additions can't be installed in Alpine Linux.

VirtualBox Guest Additions don't exist for OS X guests, so video resolution options are limited and mouse behavior is bad.

VMware can run OS X guests, but there's no 3D acceleration, so many applications are broken.

X, x2go, and Qt

There's no real security with X; any running application can sniff your root password as you type it into a terminal. xterm has a "secure keyboard" mode that doesn't work.

vsync doesn't work anywhere with xserver-xorg-video-intel, fullscreen or not. If you enable TearFree, you get ~2-3 frames of extra latency and really strange corruption when playing video, at least on an Intel 4790K doing 4K output. Vsync/tear-free operation works fine with the proprietary NVIDIA drivers, so don't get rid of that NVIDIA card if you use Linux.

xserver-xorg-video-modesetting exhibits triangular tearing everywhere on an Intel 4790K, except fullscreen windows. xserver-xorg-video-modesetting also kills Sublime Text render performance. So I use xserver-xorg-video-intel in my primary X session, and occasionally xserver-xorg-video-modesetting in a secondary X session just for watching movies.

Anki frequently segfaults due to a Qt4 bug that won't be fixed. There is a port to Qt5 in progress, but it currently has strange rendering delays.

By default, Qt applications are unbearably slow in an x2go session, taking 4+ seconds to render one frame. This can be fixed for Qt4 applications with export QT_GRAPHICSSYSTEM=native, but this option is gone in Qt5. Qt5 seems to mandate whole-window updates.

When connected to an x2go server that is under load or severe bandwidth pressure, x2go frequently pops up and automatically closes a dialog window telling me that the connection is unresponsive, asking whether I want to disconnect. The answer is always no.

If you don't have an X compositor running, Firefox will always exhibit tearing, even with proprietary NVIDIA drivers.

Chrome

Chrome displays the IME popup in the wrong location when your DPI is != 96, so have fun entering Japanese or Chinese text on a hidpi display. It pops up in the right location in Firefox 47+. [fixed in Chrome 58]

Chrome 52 broke 1:1 touchpad scrolling, scrolling in arrow-key-sized steps instead. It's still broken in Chrome 53, but back to the correct behavior in Chrome 54. [fixed]

The first time you set up syncing in Chrome, you aren't prompted for a sync password, so all of your Chrome bookmarks and passwords land unencrypted on Google's servers. [probably fixed]

For standalone images (loaded in a tab without a document), Chrome 52 and 53 no longer display any part of the image before it's done loading. [fixed]

Chrome doesn't center standalone images, which is a pretty obvious feature that Firefox has had since Firefox 11, released March 2012. [fixed in Chrome 56]

Chrome can't do MRU Ctrl+Tab switching. Firefox does this with browser.ctrlTab.previews -> true. Quick Tabs lets you do this in Chrome, but only with another keyboard shortcut.

Terminal emulators

If you run a command in ROXTerm that instantly pops up or switches focus to another window, ROXTerm will not update the terminal, making it look like the command never exited. Then it will start responding again after a 1-minute (?) timeout.

Konsole implemented output notification indicators, but made the indicator so subtle that you won't notice it.

xfce4-terminal implemented output notification indicators, but resets them after a maximum of 30 seconds. They're reset to a subtle shade of red that you probably won't notice.

Other applications

PulseAudio has a module-equalizer-sink for applying equalization to all programs, but it introduces ~400ms of audio latency and audio glitches.

Thunderbird deletes locally cached mail if login to the mail server fails. Never mind that half the reason to use Thunderbird is to read old mail in your Thunderbird profile. As a workaround, I set up a script that copied in a backup of a Thunderbird profile every time before starting Thunderbird.
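
The workaround looked roughly like the sketch below. Both directory paths and the function name restore_profile are placeholders for this example, not the real ones. Note that it throws away anything Thunderbird changed since the backup was taken, so the backup has to be refreshed separately after a session that went well.

```shell
#!/bin/sh
# Sketch: restore a known-good profile copy before launching Thunderbird.
# Both directory paths below are hypothetical; adjust them to your setup.

PROFILE=${PROFILE:-$HOME/.thunderbird}
BACKUP=${BACKUP:-$HOME/backups/thunderbird}

# Replace the directory $2 with a fresh copy of the directory $1.
# Refuses to touch $2 if the backup is missing.
restore_profile() {
    [ -n "$2" ] || return 1
    [ -d "$1" ] || { echo "no backup at $1" >&2; return 1; }
    rm -rf "$2"
    cp -a "$1" "$2"
}

# Usage:
#   restore_profile "$BACKUP" "$PROFILE" && exec thunderbird
```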

Double-clicking to fullscreen an mpv window often results in the fullscreen window landing in the middle of the screen, requiring a few more attempts to fullscreen it. The slightest 1px drag event after clicking seems to cause this.

xfce4-panel's taskbar buttons are tiny on a hidpi display, and there's no option that will fix it.

xfce4-panel turns the whiskermenu icon into a black square every time you drag a taskbar button.

Wine

wine-staging can run Windows applications with GTK3 theming, but enabling this option locks up foobar2000 after about 10-30 minutes. [probably fixed]

foobar2000 in wine-staging is completely unresponsive when scanning a music folder, even though foobar2000 uses a separate thread for this and is responsive on Windows. [possibly fixed]

CJK doesn't work in Wine by default, and neither does font fallback, so you can't use different fonts for Latin letters and CJK. Instead, you have to use one font that includes everything like Arial Unicode MS or Noto Sans CJK {JP, KR, SC, TC}.

Sending audio from foobar2000 in Wine to a module-equalizer-sink results in horrendous distortion, as if the audio samples are arriving in the wrong format.

Steam on Linux

Steam doesn't scale up its tiny fonts on a hidpi display.

Without asking, Steam automatically uploads crash reports with all of your environment variables and other system information.

On Linux, the Steam updater goes into an endless update loop where it's convinced it's not updated, even though it really is (it just needs to be ctrl-c'ed and started again). Maybe only when extracted to a directory instead of installed from the .deb. [probably fixed]

Windows

If you install or update Windows 8.1, it will sometimes wipe the UEFI boot entries you use for Linux, whether or not your Linux drives are plugged in. The UEFI boot entries are stored in your motherboard's NVRAM, not on your disks. You might then need to boot from a USB drive and restore your Linux UEFI boot entries with something like:

efibootmgr -c --disk /dev/sdX --part 1 -l 'EFI\grub\grubx64.efi'
efibootmgr -a -b 0000

If you've wondered why Windows and USB sticks can boot without needing an entry in the Boot* variables on your motherboard, it's because they keep a copy of their bootloader in the fallback boot location /EFI/BOOT/BOOTX64.EFI. To avoid needing to fix your Boot* variables with efibootmgr after Windows deletes them, you can run grub-install with the --force-extra-removable argument:

#!/bin/sh

# Runs grub-install with the correct arguments for UEFI systems that do not
# have more than one operating system per EFI system partition.
#
# Note: grub-efi-amd64 must be installed.

set -e

# On a dual-boot system, Windows may remove our Linux boot entry from the
# NVRAM Boot* variables after a Windows install or a Windows Update.
# Use --force-extra-removable to ensure we have a copy of grubx64.efi at
# the fallback boot location /EFI/BOOT/BOOTX64.EFI used by almost all UEFI
# motherboards.  See 3.5.1.1 Removable Media Boot Behavior
# http://www.uefi.org/sites/default/files/resources/UEFI%20Spec%202_6.pdf
grub-install \
    --target=x86_64-efi --efi-directory=/boot/efi \
    --force-extra-removable --bootloader-id=Linux --recheck
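
After running the script, it can be worth confirming that the fallback copy actually landed on the EFI system partition. A minimal sketch follows; has_fallback_loader is a made-up helper name, and /boot/efi is simply the --efi-directory used above. The match is case-insensitive because FAT does not distinguish case, and it relies on the -maxdepth and -ipath extensions found in GNU find.

```shell
#!/bin/sh
# Sketch: report whether an EFI system partition has a fallback bootloader.
# has_fallback_loader is a hypothetical helper name.

# Succeeds if the directory $1 contains EFI/BOOT/BOOTX64.EFI, matched
# case-insensitively.
has_fallback_loader() {
    found=$(cd "$1" 2>/dev/null &&
        find . -maxdepth 3 -type f -ipath './efi/boot/bootx64.efi')
    [ -n "$found" ]
}

# Usage:
#   has_fallback_loader /boot/efi || echo "no fallback loader found" >&2
```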

ClearType is almost wholly unsuitable for hidpi displays, rendering overly-thin stroke widths that discontinuously jump in width based on the font size. If you try to fix it by injecting MacType into all of your running programs, you completely break Chrome's text rendering. And compared to freetype, it still looks kind of awful.

NTFS performance is bad, especially with large directory trees, compared to XFS or ext4 on Linux.

Windows intentionally RSTs all of your TCP connections when you unplug your Ethernet cable or reset your network device, even though the connections would have survived if Windows just didn't do that!

Windows won't let you delete an open file unless every application opened it with the FILE_SHARE_DELETE share mode. And most applications don't, so typically when you want to delete or rename an in-use file, you have to waste time closing programs. When you can't figure out which program is using it, you have to use Process Explorer's "Find Handle or DLL" feature to track down which process has it open.

Windows 8.1's non-TPM BitLocker boot screen often doubles up keystrokes (!!), so you never enter your password correctly unless you Backspace the doubled-up keystrokes as needed. The password is obscured by default, but you can press Insert to see the characters, barely, because the text is tiny and the BitLocker screen is at 640x480 resolution.

Windows 8.1's BitLocker boot screen gives you around 20 seconds to type in your password before it powers off the system, whether or not you are still busy typing. Sometimes, for unknown reasons, you have just a few seconds to enter the password, at least until you power off the system and keep it off for a short while. You can turn off this timeout with bcdedit /set {bootmgr} bootshutdowndisabled 1

Windows 8.1's BitLocker boot screen gives you even fewer seconds to type in your password after it powers off due to a timeout. Eventually, your only option is to enter your BitLocker backup key instead of your password. Selecting the option to start entering the backup key sometimes disables the timeout.

In some cases (e.g. when you turn DEP on or off system-wide), Windows has to temporarily disable BitLocker, which, if you are using BitLocker without a TPM, probably involves writing an unprotected encryption header to your drive.

Screenshot showing: In order to change the system-wide dep setting, BitLocker needs to be suspended. If applied, you are advised to reboot at your earliest convenience. Are you sure you want to make this change? [Yes] [No]

Windows handles font rendering in the kernel, leading to very serious security bugs.

Windows 10

Windows 10 automatically installs drivers for all of your hardware, even if they're for things you don't really need. This happens automatically when you install Windows 10, before you have a chance to configure anything. I had some ASUS junk get installed to Program Files with no uninstaller.

Windows 10 removed your control over which updates get installed. You get all of them. Enjoy. (You can still use a separate program to "hide" updates before Windows Update learns about them.)

Windows 10 introduced serious audio stuttering problems on some hardware, particularly when there is network activity. There is some kind of bug in ndis.sys or Intel's Ethernet drivers that results in very high DPC latencies. Someone says bcdedit /set disabledynamictick yes and running TCP Optimizer with the Optimal profile might help.

LatencyMon showing ndis.sys causing a 'highest reported DPC routine execution time' of 280 milliseconds

Windows 10 requires you to use two different control panels to configure your system.

Windows 10 Telemetry can't be disabled on Home or Pro. Even on Enterprise, the lowest level you can set is "Security", and it's unclear whether that stops anything from getting sent to Microsoft (probably not).

Windows 10 has too many other problems to list here.

Chrome OS

You can't do anything to lower the color temperature on a Chromebook except install Linux, which may or may not be possible with your hardware.

DisplayPort monitors

If you use a ViewSonic VX2475SMHL-4K, plugged in with DisplayPort to an NVIDIA card, on Windows 8.1 or Ubuntu 16.04, turning the monitor off and back on will result in the monitor coming back at 30 Hz instead of 60 Hz. On Windows, you can fix it by rebooting! On Linux, by restarting X.

If you use a ViewSonic VX2475SMHL-4K on Linux, plugged in with DisplayPort, turning the monitor off and back on will result in you missing a monitor forever, until you run xrandr --auto. The solution is to never turn off the monitor and rely on DPMS sleep instead.
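
If you do end up turning the monitor off, the xrandr --auto step could be scripted. This is a sketch under assumptions: inactive_outputs is a made-up helper, and it relies on the usual xrandr --query layout, where an active output's line carries a geometry such as 3840x2160+0+0 and a connected but unused output's line does not. I have not checked whether the output is reported as connected again on this particular monitor.

```shell
#!/bin/sh
# Sketch: list outputs that xrandr reports as connected but not in use.
# inactive_outputs is a hypothetical helper; it reads `xrandr --query`
# output on stdin so it can be tried without an X server.

inactive_outputs() {
    awk '$2 == "connected" && $0 !~ /[0-9]+x[0-9]+\+[0-9]+\+[0-9]+/ { print $1 }'
}

# Usage (inside an X session):
#   [ -n "$(xrandr --query | inactive_outputs)" ] && xrandr --auto
```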