Localization and You: UTF–8 on NetBSD

NetBSD is a great little operating system, but it’s a much smaller project than Linux. This means there isn’t as much call for better internationalization support, as most of the users and developers are perfectly comfortable with ASCII or the ISO–8859–1 western European locale. This can cause some problems when using software that expects UTF–8, the most common encoding of Unicode and the one true text encoding for the future. Here’s how to fix it. These instructions assume you’re using a Bourne-compatible shell like ksh, bash, or zsh. If you’re using (t)csh you’re on your own.

Environment Variables

Most of the time, you can “fake” proper UTF–8 support by exporting three environment variables and leaving it up to your local terminal emulator to handle the rest. Add the following three lines to your ~/.profile:

export LANG="en_US.UTF-8"
export LC_CTYPE="en_US.UTF-8"
export LC_ALL="en_US.UTF-8"

Save, kill any screen or tmux sessions or other background processes, and log out. When you log in again, you should have a proper UTF–8 terminal as far as most programs are concerned.
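
To confirm the exports took effect after logging back in, a quick check along these lines may help. It is only a sketch: the helper name is_utf8_locale is made up for illustration, and it merely pattern-matches the variable values rather than asking the system whether the locale is actually installed.

```shell
# Hypothetical helper: succeed if a locale name ends in a UTF-8 codeset.
is_utf8_locale() {
    case "$1" in
        *.UTF-8|*.utf-8|*.UTF8|*.utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# Report the state of the three variables set in ~/.profile.
for var in LANG LC_CTYPE LC_ALL; do
    eval "value=\${$var:-}"
    if is_utf8_locale "$value"; then
        echo "$var=$value (UTF-8)"
    else
        echo "$var=${value:-<unset>} (not UTF-8)"
    fi
done
```

If any line reports “not UTF-8”, the shell you logged in with probably did not source ~/.profile.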

The web browser lynx doesn't like it when LC_CTYPE (as reported by locale(1)) has a UTF-8 value. With LANG=en_US.UTF-8 in the environment and LC_CTYPE unset, LC_CTYPE defaults to the same value, and lynx will display garbage in place of the expected UTF-8 characters. With LC_CTYPE set to another valid value in the environment, e.g., env LC_CTYPE=C lynx, that value is used instead, and lynx will behave. Note that LC_ALL, when set and non-empty, overrides LC_CTYPE, so if you exported all three variables you need to clear it as well: env LC_ALL= LC_CTYPE=C lynx. (Setting LC_CTYPE to something invalid will also work, for in that case LC_CTYPE falls back to “C”.) This quirk is peculiar to the interaction between lynx and NetBSD 9.1; it does not appear reproducible on other systems.
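
If you use lynx regularly, the workaround can be wrapped up so you don't have to type it every time. The following is a sketch, not a tested SDF recipe: the function name run_with_c_ctype is invented here, and it empties LC_ALL on the assumption that you exported it in ~/.profile (a non-empty LC_ALL takes precedence over LC_CTYPE, so overriding LC_CTYPE alone would have no effect).

```shell
# Hypothetical wrapper: run a command with LC_CTYPE forced to "C".
# LC_ALL is emptied too, because a non-empty LC_ALL overrides LC_CTYPE.
run_with_c_ctype() {
    env LC_ALL= LC_CTYPE=C "$@"
}

# In interactive shells, make plain "lynx" go through the wrapper.
alias lynx='run_with_c_ctype lynx'
```

Running run_with_c_ctype locale shows the settings lynx will see.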

Perl will print the following warning when invoked:

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LC_ALL = "en_US.UTF-8",
        LC_CTYPE = "en_US.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

Feel free to ignore this warning. Perl falls back to the “C” locale, as the last line says, and carries on. As long as you’ve got those environment variables set, you should be fine.
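
If the warning gets tiresome, Perl itself offers a way to silence it: the PERL_BADLANG environment variable, described in the perllocale manual page. This is an optional extra rather than part of the setup above, and it only hides the message; it does not change the fallback to the “C” locale.

```shell
# Optional: suppress Perl's "Setting locale failed" warning at startup.
# Perl still falls back to the "C" locale; only the message is hidden.
export PERL_BADLANG=0
```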

Python 3 treats source files as UTF–8 text by default, so please make sure these variables are set before working on Python 3 code.

Rxvt-Unicode

Rxvt-Unicode, urxvt, rxvt-unicode-256color. By whatever name you call it, it’s a very popular terminal among Linux and *BSD “power users.” Unfortunately, using urxvt adds an extra degree of difficulty to connecting to SDF: the NetBSD machines have no terminal database entry that matches its $TERM value! I’m sure some of you have tried logging in to SDF from urxvt, only to have scary warnings printed to stderr and have everything treated like a dumb paper teletype. Don’t worry, there’s a very simple fix for that as well. Open up ~/.profile again and add these lines:

# "=" rather than "==": it is the portable test operator, and zsh chokes on an unquoted "==".
if [ "$TERM" = "rxvt-unicode" ] || [ "$TERM" = "rxvt-unicode-256color" ]; then
   export TERM="rxvt"
fi

In simple terms, this tricks NetBSD into thinking your terminal is rxvt, the original program urxvt is based on. However, the same volume of home directories is mounted by the OpenBSD machine beastie, which does have an entry for rxvt-unicode in its terminfo database! So if you log in to both systems on a regular basis, and on both systems you use a shell that sources .profile, your OpenBSD experience might be needlessly downgraded. In that case, check that the machine you're logged into is actually NetBSD before exporting the changed value of $TERM.

# Only downgrade $TERM on NetBSD; OpenBSD knows rxvt-unicode already.
if [ "$TERM" = "rxvt-unicode" ] || [ "$TERM" = "rxvt-unicode-256color" ]; then
      [ "$(uname)" = "NetBSD" ] && export TERM="rxvt"
fi

(The latter code should work in any case, but it's pointless to run 'uname' on every login if you know you'll only ever be sourcing .profile from a NetBSD machine.)
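
Another way to handle the shared home directory is to ask the local terminal database directly instead of checking the operating system name. The sketch below assumes infocmp(1) is installed on every machine that sources your .profile; the function name term_or_fallback is made up for this example. If infocmp is missing or doesn't know the terminal, the fallback is used.

```shell
# Hypothetical helper: print the given terminal name if this host's
# terminal database knows it, otherwise print the fallback (default "rxvt").
term_or_fallback() {
    wanted=$1
    fallback=${2:-rxvt}
    if infocmp "$wanted" >/dev/null 2>&1; then
        printf '%s\n' "$wanted"
    else
        printf '%s\n' "$fallback"
    fi
}

# Only touch $TERM when logging in from urxvt.
case "${TERM:-}" in
    rxvt-unicode|rxvt-unicode-256color)
        TERM=$(term_or_fallback "$TERM")
        export TERM
        ;;
esac
```

This keeps the full urxvt entry on any host that has one and only downgrades where it is missing, at the cost of running infocmp once per login.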

If you have a MetaArpa account, don’t worry - the MetaArray is running Debian, which understands urxvt just fine.

Escape Characters

NetBSD’s terminal has what are called “escape characters.” These are characters in the “high ASCII” (decimal 128–255) range that manipulate the shell session when read from stdin or written to stdout. As you might imagine, this interferes with programs that write large amounts of arbitrary bytes to standard output, like the “kermit -s” or “sz” file transfer programs. For sx/sy/sz (X/Y/ZMODEM protocols) your best bet is to just not use them with SDF for now. If you’re on a TCP/IP connection (which most of you are) it’s easier to stick with scp/sftp for secure transfers, and http or ftp for insecure ones.

If you really need “in-line” file transfer, there is a way to make “kermit -s” work around NetBSD’s escape characters: add the “-8” and “-0” flags (the digits eight and zero). If I wanted to transfer the SQLite database “winning-lottery-numbers.sqlite” from SDF to my home machine, I would do it like this:

tidux@sdf:~$ kermit -s -8 -0 winning-lottery-numbers.sqlite

Then my local kermit program would receive the transfer and I could continue working on SDF as usual. If you do this often, it may be wise to add an alias in your shell configuration files, like so:

alias send='kermit -s -8 -0'
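
If you would rather be told about a mistyped file name before kermit starts, a shell function can do the same job as the alias with a small sanity check in front. This is just a sketch with the same flags as above; use it instead of the alias rather than alongside it, since defining both under the name send will clash in an interactive shell.

```shell
# Send one or more files with kermit, using the same flags as above.
# Refuses to start if no file is given or a file is unreadable.
send() {
    if [ "$#" -eq 0 ]; then
        echo "usage: send file..." >&2
        return 2
    fi
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "send: cannot read $f" >&2
            return 1
        fi
    done
    kermit -s -8 -0 "$@"
}
```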

I hope this guide has been helpful to you. Happy UTF–8 hacking!



localization_and_you_-_utf_8_on_netbsd.txt · Last modified: 2022/02/08 23:59 by jquah