Why you’re wrong if you think spaces are better than tabs

OK, time to finally throw my hat into the political ring.

But I want to state for the record as a disclaimer that I do so not based on opinion, but merely based on the facts:

Tabs are better than spaces.

Now, quiet, you python people. You’re just wrong. I know that your style guide, PEP 8, says you should use 4 spaces per indentation level.

But it’s wrong.

Now, you’ve all heard the old unix greybeard argument about how your files will be 2% smaller if you switch to tabs instead of spaces, because you’ll use 1 tab character rather than 4 space characters. While this argument is correct, it has nothing to do with my argument (it’s just another benefit of using tabs, as far as I’m concerned). But a small saving in file size isn’t a reason to change how you do things.

The reason you should change how you do things and start using tabs instead of spaces is simple: it’s the correct answer. But that’s not actually my primary reason. My primary reason is that it’s better.

Now, this might sound arrogant or whatever, but allow me to explain what’s actually going on under the hood, and how you can configure your editor correctly and we can all live in peace and harmony and never worry about this whole indentation thing ever again.

A lengthy treatise about the history of text (and how it’s indented)

A long long time ago – even before Nirvana – there were mechanical typewriters. That’s where the tab key comes from, since our computer systems were originally used with teletypes, which were based on typewriters.

But typewriters didn’t just have a tab key – they also had tab stops – a bar along the back of the typewriter with several movable latches which allowed you to set the tabs at any position you like. The behaviour of the tab key and tab stops in a WYSIWYG word processor emulates this pretty faithfully (though it is a superset of typewriter functionality, e.g. typewriters had a limited number of tab stops and afaik could only do left-aligned tab stops).

When we started using teletypes and terminals, we were originally using fixed-width (i.e. the screen was typically 80 or sometimes 40 characters wide, and used a monospaced font) text-only monochrome displays. And back in the 1960s (the first version was published in 1963) the ASCII standard was developed as a descendant of the Baudot code used on telegraphs.

This standard defines a bunch of characters, and a bunch of control characters. If you’re familiar with ASCII or unicode at all you’ll recognise some of them. Some common examples:
character 32* – space
character 10 – linefeed
character 13 – carriage return
and character 65 – an uppercase “A”.

(* I tend to think in decimal, so these are decimal values. All ASCII values here are decimal, for consistency.)

If you’ve ever played with colours in your terminal prompt, you might also recognise escape as character 27.

There are a bunch of these available, and you can see the full list with a simple ‘man ascii’ (assuming you have the relevant packages installed; on Debian, apt-get install manpages should do it).

In this table, we see my beloved tab sitting at position 9. And you’ll also see one that you probably haven’t used before – character 11 – “vertical tab”.

All of these things are there for a reason, even though we almost never use some of them (like vertical tab) today.

There are a few intricacies of the ascii table which aren’t mentioned or immediately obvious from reading the man page I pointed you to. They’re a little more obvious if you look at a 4-column ascii table with the hex and binary values (<-- I'd encourage you to open that in a new window so you can look at it while reading this lengthy tome).

With this layout, it becomes more obvious that the first 32 ascii characters are in a special class that you probably already know about - these are the control characters. There is one other control character which is outside of this range and a special case - 127 / DEL.

Less obvious is that this pattern of categorising the ascii table into sets of 32 applies to all four columns. The ASCII table was intended to be broken up this way, into four broad categories of characters: control characters, symbols and numbers, uppercase, and lowercase.

Note another correspondence when we break the ascii table up in this way: the lower five bits (i.e. the last 5 binary digits) are the same for a letter's uppercase and lowercase forms. We can think of the upper two bits* as a "mode selector" that picks one of the columns in this table, and the lower five bits as selecting one of the rows, giving a particular character.

(* We're only talking about 7-bit "pure" ascii today, so there are 7 bits in total: 2 for the column and 5 for the row. If you store a character in an 8-bit byte, the extra most significant bit is always 0 for our purposes today.)

This idea is modelled on an earlier code (baudot? something else? the history is long), which was in turn modelled on typewriters and how their shift key worked. On a mechanical typewriter, the shift key worked by physically shifting the printing mechanism or head (versions differed). Each "letter-stamper-thingy" on the typewriter had two characters, uppercase and lowercase, and the position of the shift mechanism selected between them, giving each normal key two functions. (The names in turn come from a printing press operator's two cases of letters: uppercase tended to be used less often, so the operator would place that case in the upper position, further away from his working area.) Similarly, the number keys had symbols as their "uppercase character".

This design characteristic makes it pretty easy electronically to implement this "shift" mechanism for most of the keys on your keyboard without any special logic to handle upper/lowercase: each key has an encoded value, and depending on the state of the shift key we set or clear a single bit. For letters, that's bit 5 (counting from 0 at the least significant end), worth 32, or hex 0x20. It's a little more complex than this these days, e.g. capslock.

And that's part of why teletypes were already fairly common by the time computers were invented: they're a lot simpler, because the character codes are designed to be easy to handle electronically.

But it doesn't stop at the keyboard; it's also easier to interpret on the decoding end: if you're looking at a letter and bit 5 is set, you want a lowercase glyph. This is a very easy test that can be done with a few logic gates, and in one or two instructions on most (all?) computer processors.
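To make the column/row idea concrete, here's a small Python sketch. The helper names are mine, not from any library, and the columns are numbered from 0 here, so the control-character column is column 0. It splits a 7-bit code into its column (top two bits) and row (low five bits), and flips a letter's case by toggling bit 5:

```python
def column_and_row(ch: str) -> tuple[int, int]:
    """Split a 7-bit ASCII character into (column, row) of the 4x32 table."""
    code = ord(ch)
    if code > 127:
        raise ValueError("only 7-bit ASCII here")
    # Top two bits pick the column (0 = control, 1 = symbols/numbers,
    # 2 = uppercase, 3 = lowercase); the low five bits pick the row.
    return code >> 5, code & 0x1F


def toggle_case(ch: str) -> str:
    """Flip bit 5 (value 32, hex 0x20), the 'shift' bit for ASCII letters."""
    if not (ch.isascii() and ch.isalpha()):
        return ch  # only letters come in upper/lower pairs
    return chr(ord(ch) ^ 0x20)


print(column_and_row("A"), column_and_row("a"))  # (2, 1) (3, 1)
print(toggle_case("a"), toggle_case("Q"))        # A q
```

Note that "A" and "a" land on the same row (1) in different columns, which is the whole trick.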

So this meant that when computers came around, and we wanted a system to have them represent text and interact with keyboards, adopting this table made a lot of sense due to the slow speed of those early machines - efficiency was everything. And so ASCII was born - people took clever ideas of their predecessors and expanded on them.

You'll also notice that in this layout, the symbol characters between the uppercase and lowercase letters and at values >=123 make more sense. If you’ve ever looked at an ascii chart and wondered why e.g. the symbols or letters weren’t all in one contiguous region, this is why!

(Today, we’re not technically using ASCII anymore – these days, all modern operating systems use unicode. But unicode takes this compatibility thing that ascii did even further: UTF-8, the most common unicode encoding, is byte-compatible with 7-bit ascii, so a pure ascii file (and most english text from other similar encodings, e.g. iso-8859-1, as long as it sticks to the ascii subset) is also a valid, identical UTF-8 file.)
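A quick way to check that compatibility claim for yourself in Python: pure ASCII text encodes to exactly the same bytes in ASCII and UTF-8, while anything outside ASCII needs a multi-byte sequence. The sample strings are arbitrary:

```python
text = "Tabs are better than spaces.\n"

# Pure ASCII text encodes to exactly the same bytes in ASCII and UTF-8.
assert text.encode("ascii") == text.encode("utf-8")

# Characters outside ASCII take a multi-byte UTF-8 sequence, and every byte
# in it has the high bit set, so it can never be mistaken for ASCII.
encoded = "é".encode("utf-8")
print(encoded)                                  # b'\xc3\xa9'
print(all(byte >= 0x80 for byte in encoded))    # True

# The same character in ISO-8859-1 is one byte that is NOT valid UTF-8,
# which is why only the ASCII subset of such files is byte-identical.
print("é".encode("iso-8859-1"))                 # b'\xe9'
```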

So far we’ve only covered columns 2-4, but a simple glance at our ascii table shows that column 1 is special. And you already know why: none of these are printable characters – except, debatably, tab.

You probably know about nonprintable characters – unicode means that most computers have lots and lots of them today. But you might not know the distinction between a printable / nonprintable character and a control character. And that’s what this column actually is – these are the control characters, not the nonprintable characters.

There is one other control character – DEL – which doesn’t live in this column. As far as I know, its position at the end of the table comes from punched paper tape: DEL is all ones (127 is binary 1111111), so you could "delete" a mistyped character by punching out every hole over it. It's also relatively easy to test for, electronically (a 7-input AND gate on your 7 bits) and in code (a single comparison). Putting it at the end of the table like that makes it a relatively simple exception that you need to accommodate.

They’re control characters because this encoding was invented to provide all the functionality of all the various teletype machines out there, providing “one encoding to rule them all”, which should be able to work with any teletype, providing interoperability.

Teletype machines needed to have a way to signal to each other that this should be the end of the line, for example, and so you have a linefeed character. Today you might think of a linefeed as “just another character”, but the term “control character” isn’t just a pretty name – in its original intent, “linefeed” is not a character but an in-stream instruction for the receiving device, which means “move the physical roller which controls the vertical position of the physical paper in the actual real world one line down”. Presumably on some teletypes it also meant “…and return the physical IRL print head to the first column”, and on some it didn’t. In order to support all the features of all the teletype machines out there, a bunch of control characters were needed.

No, I have no idea what half of them do, either.

I do know about a couple that you may not have heard of. For instance, there’s the one that I call “EOF” – end of file, but which the ascii table lists as “End Of Transmission”, at position 4. Unix implements this as its “End Of File” character – this is what your terminal sends down the line when you press CTRL-D. It’s why you can press CTRL-D to log out of your terminal. It’s also why you can do

$ cat - > /tmp/foo (enter)
foo(enter)
bar(enter)
(ctrl-d)
$ cat /tmp/foo
foo
bar

to create a file which includes linefeeds from the unix prompt, using cat to read from stdin and then using ctrl-d to send the end-of-file character to tell the system that you’re done inputting data.

A more commonly known one, thanks to a decision by microsoft to be contrarian, is the difference between a linefeed (“move 1 line down”) and a carriage return (“return the carriage (or cursor) back to column 1”). Technically microsoft’s preference of doing both a carriage return and linefeed is perhaps more historically accurate, since in almost all cases you would want to do both of these things when the enter/return key is pressed, whereas unix says that a linefeed implies a carriage return, and interprets carriage return as “*only* do a carriage return, not a linefeed”. That means that on unix CR allows you to “echo over” the same line again, so you can draw bar charts in bash using echo -ne "\r$barchart" in a loop (the -n stops echo from adding its own linefeed).
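The same carriage-return trick in Python, as a small sketch: `draw_bars` is a made-up name, and it writes to a stream you pass in (stdout by default) so it's easy to test. It never writes a linefeed until the very end, which is the whole point:

```python
import sys
import time
from typing import Iterable, TextIO


def draw_bars(values: Iterable[int], out: TextIO = sys.stdout, delay: float = 0.0) -> None:
    """Redraw a text bar chart in place using carriage return (CR, character 13)."""
    width = 0
    for value in values:
        bar = "#" * value
        # \r goes back to column 1 without a linefeed; pad with spaces so a
        # shorter bar erases the tail of a longer one drawn before it.
        out.write("\r" + bar.ljust(width))
        out.flush()
        width = max(width, len(bar))
        time.sleep(delay)
    out.write("\n")  # finally move on to a fresh line
```

Something like `draw_bars(range(0, 41, 2), delay=0.05)` should animate a growing bar on a single line of your terminal.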

I member a time when *nix used LF, Windows used CR + LF, and macs used CR just to be totally goddamn annoying. Apple adopted LF along with unix with the advent of Mac OS X, so that’s not a thing anymore unless you’re into retrocomputing.

You may have seen the good old ^H^H^H^H^H^H joke, where a person is deleting their code. This is because the backspace character/key at position 8 was traditionally mapped to CTRL-H, which could render on some terminals visibly as ^H rather than a backspace depending on a ton of hardware variations and compatibility settings on the terminal you were sitting at and the terminal you were talking to.

CTRL-L clears the screen on *nix because it’s mapped to the form feed character at position 12. Likewise CTRL-C is mapped to character 3 (end of text; I’ve always called it ‘interrupt’). The dreaded CTRL-S and CTRL-Q, which freeze and unfreeze output on your terminal, are mapped to control characters too: CTRL-S sends character 19 (DC3, also known as XOFF) and CTRL-Q sends character 17 (DC1, also known as XON).

There’s also a fun one which doesn’t appear to be mapped on my modern linux machine – CTRL-G, to ring the terminal bell.

These control key sequences exist because when people started using different terminals to talk to unix systems, they quickly found that not all terminals were the same. E.g. not all of them had a ‘backspace’ or a ‘clear screen’ key, but all of them had some kind of “control” or “modifier” key, so the control sequences were added for people who didn’t have the corresponding key. To this day, I have a ‘compatibility’ tab in my terminal which allows me to tell the terminal to send a CTRL-H key sequence for backspace, amongst other things.
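All of those Ctrl mappings follow one rule that falls straight out of the four-column layout: holding Ctrl keeps only the low five bits (the row) of the key's code and throws away the column bits, which drops you into the control-character column. A minimal Python sketch, with `ctrl()` and `is_control()` as illustrative helper names; it covers letters and a few symbols like `[` (Ctrl-[ is escape, 27), and real terminals have extra special cases it ignores:

```python
def ctrl(key: str) -> int:
    """Code a terminal typically sends for Ctrl+<key>: keep only the low 5 bits."""
    return ord(key.upper()) & 0x1F


def is_control(code: int) -> bool:
    """ASCII control characters: the whole first column (0-31) plus DEL (127)."""
    return code < 32 or code == 127


for key in "HLCDGSQ[":
    print(f"Ctrl-{key} -> {ctrl(key)}")
```

That prints 8 (backspace), 12 (form feed), 3, 4 (EOF), 7 (bell), 19, 17 and 27 (escape), matching the characters mentioned above.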

A short aside:

As I’ve demonstrated above, one of the pitfalls that we find ourselves running into on modern unix systems is that by the time you get to a terminal emulator in your gpu-accelerated, composited GUI, you’re running many layers of abstraction and compatibility deep: Your terminal is emulating and backwards-compatible with VT100 dumb-terminal hardware from the late 1970s, patched to be able to support unicode, which is itself a backwards-compatible extension on top of the backwards-compatible extension of a previous code that is ascii, going all the way back to Baudot and the telegraph in the late 1800s. So, no, it’s not as straightforward as you’d expect to write code to say “move the cursor to position x,y” on a unix console.

This causes a bunch of problems and limitations on modern desktop unix systems, perhaps more often than it helps the average user. If you read The UNIX-HATERS Handbook, you’ll find an entire chapter on how /dev/tty and the terminal emulator are the worst thing in the entire universe. This is generally acknowledged as one of unix’s “foibles”.
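For the curious, the bottom layer of that stack is still plain escape sequences. Assuming a VT100/ANSI-compatible (ECMA-48) terminal emulator, which nearly every modern one is, “move the cursor to row y, column x” is the CUP sequence ESC [ y ; x H, counted from 1. The helper names below are made up for illustration; real programs usually go through curses/terminfo instead, precisely because terminals don't all agree:

```python
ESC = "\x1b"  # character 27, the escape you may know from prompt colours


def move_to(row: int, col: int) -> str:
    """CUP (cursor position): ESC [ row ; col H, both counted from 1."""
    return f"{ESC}[{row};{col}H"


def clear_screen() -> str:
    """ED 2 (erase the whole display), then move the cursor to the top left."""
    return f"{ESC}[2J" + move_to(1, 1)
```

On such a terminal, `print(clear_screen() + move_to(5, 10) + "hello")` should clear the screen and print hello at row 5, column 10.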

So why hasn’t anyone done anything about all that legacy stuff?

Because one of the joys and beauties of unix is its deeply ingrained principle of backwards compatibility and portability, which became part of the unix philosophy over the course of decades. That means I can still (relatively) easily connect my modern terminal emulator up to an antique teletype and have it be compatible to a pretty decent extent.

This is an important quality of unix. It’s important to keep these open, compatible standards around for the purpose of the preservation of information. If we had moved from ascii to an incompatible standard, we would have had to convert every single document ever written in ascii into that new standard, or potentially lose the information as the old and incompatible ascii standard became more and more rare and unknown.

And if you search youtube, you can find people hooking modern systems up to antique teletypes. For my money that makes it all worth it.

But finally, let’s talk about tab.

Note that space is up at position 32, in column 2 with the printable characters. I’ve seen space categorised as a nonprintable character, but this is the wrong way of thinking about it. A better way is to think of space as a fully black glyph on an oldschool fixed-width text terminal (regardless of whether or not it was actually implemented this way). You want a space character to erase any pre-existing character at that position on the screen, for example. And you want that “move on to the next screen column with each keypress, so that the user can type left-to-right” functionality that you get from making it a fully-black glyph.

For example, in bash:

echo -e "12345 \r     67890"

doesn’t give you the output:

1234567890

it gives you:

     67890

- the spaces erase the previously-printed characters.

Space is a printable character.

Tab is a control character.

I was tempted to write “which means ‘print 4 spaces’ on my system”, but I thought I’d do another bash example/test/demonstration, and I surprised even myself. On my system, it’s not “print 4 spaces” at all:

$ echo -e "1234567890\r\tABCDEF"
1234ABCDEF

I had expected this to echo

    ABCDEF

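But it turns out that the implementation of tab on my system is a bit more complicated than that. Instead it means “move the cursor to the next tab stop”, without printing anything over the characters it skips. With tab stops every 4 columns, that lands on column 5, so ABCDEF overwrites 567890. If I did: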

$ tabs -8
$ echo -e "1234567890\r\tABCDEF"

I’d get:

12345678ABCDEF

And if I do:

$ echo -e "\tsomething"
	something

That’s not 4 spaces that it’s printed at the start of the line – try selecting that text – it’s a single tab character, and its width is whatever your tab width is set to (since it’s being displayed on your machine right now).

I think this demonstrates pretty clearly that space is printable and tab is control :)
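To pin those semantics down, here's a toy model of a single terminal line in Python. It's a simplification written for illustration, not how any real terminal emulator is implemented: printable characters overwrite the cell under the cursor, CR returns to the first column, and TAB jumps to the next tab stop without writing anything. It reproduces the three bash outputs above:

```python
def render(output: str, tab_width: int = 8) -> str:
    """Simulate what one terminal line shows after printing `output`."""
    cells: list[str] = []
    col = 0
    for ch in output:
        if ch == "\r":
            col = 0                                    # carriage return
        elif ch == "\t":
            col = (col // tab_width + 1) * tab_width   # next tab stop, write nothing
        else:
            while len(cells) <= col:                   # grow the line with blank cells
                cells.append(" ")
            cells[col] = ch                            # printable: overwrite this cell
            col += 1
    return "".join(cells)


print(render("12345 \r     67890"))                    # "     67890"
print(render("1234567890\r\tABCDEF", tab_width=4))     # "1234ABCDEF"
print(render("1234567890\r\tABCDEF", tab_width=8))     # "12345678ABCDEF"
```

Python's own `str.expandtabs()` uses the same next-tab-stop rule, e.g. `"ab\tc".expandtabs(4)` gives `"ab  c"`.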

When fixed-width, monochrome teletypes and terminals were the norm (and for a long time they were the best way for humans to talk to computers – they beat the shit out of punchcards), and the ascii standard was adopted for use on a screen – with generally more capability than a teletype (a screen can easily delete characters / clear itself, and can emulate an infinite roll of paper by scrolling lines), indentation came up. This caused an issue at the time because they didn’t have WYSIWYG word processors with an infinite number of center-aligned tabs that could do everything your typewriter could do. Instead, they had this atomic system – there was no physical way on these devices to have a ‘half-character-width’ tab, like you could on a typewriter. They also didn’t have a lot of memory or processor power for implementing fancy rules around kiiiiinda-trivial stuff like tabs. So the compromise that was reached was making a tab equal to a certain number of spaces.

But how many spaces? Some said 4, I think some said 8, and some said 2. This is what the ‘tab width’ setting of your text editor means. I’m sure others did more complex things with tab, like “indent to the same column as the next word from the line above”.

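I’m not sure where the convention of “a tab equals 4 spaces” came from, but it’s certainly the indentation width that became dominant in editors at some point. Maybe it’s standardised somewhere, maybe it’s just a popular convention. (Terminals, on the other hand, traditionally default to a tab stop every 8 columns, which is what tabs -8 sets.)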

The point is, the way tabs were handled used to differ between different terminal hardware and/or settings. This is why tab settings are so seemingly complicated in plaintext editors today: for the same reason that ASCII has so many control characters, terminal emulators wanted to be able to emulate multiple types of terminal, so the tab settings had to be a superset of all of them.

The practical upshot of all this means that by correctly using your IDE’s “Tab width” setting, if you use tabs for indentation, you don’t need to have this argument about whether a tab should be 2 or 4 or 8 or 32 spaces: You simply set the tab width to your preference and tell your IDE to use tabs for indentation, and you’re set, and can see it indented however you like, and so can everybody else. We can all just use tabs correctly, and live in peace and tolerate each other’s preferences for indenting.

(The correct IDE settings are: Tab width: whatever you prefer; Use tabs for indentation, never spaces; aggressively and automatically convert groups of spaces *at the start of the line* into tabs. Auto-indent. If your editor can’t do these things, you should use a better one. SciTE and Geany are good).
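To see the "everyone picks their own width" idea in action: the file's bytes never change, only the rendering does. Here `str.expandtabs()` stands in for each reader's editor setting, and the tiny function is just sample content:

```python
# One tab per indent level, stored once in the file.
source = "def f(x):\n\tif x:\n\t\treturn 1\n\treturn 0\n"

for width in (2, 4, 8):
    print(f"--- tab width {width} ---")
    print(source.expandtabs(width), end="")
```

The same four tab characters come out as 2-, 4- or 8-column indents, and nobody had to touch the file to get their preference.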

And there are valid preferences, too – I personally use a 4-character indent on a desktop or laptop machine where characters are small and screen real estate is cheap, but if you’re coding on a small form-factor device with a small screen that can’t easily display long lines at a readable size (like my OpenPandora), an indent of 2 characters is much more workable.

Another still valid though less-relevant-today reason to have a preference about tab width is something I only touched on very briefly earlier – some of these fixed-width displays were 40 columns, and some were 80 columns. The most common 40 column displays you would see were on the 8-bit microcomputers of the 80s, which tended to be built to hook up to TVs via an RF modulator, typically leading to insufficient resolution to do 80 columns and be readable. On a 40 column device there’s a good argument for a smaller indent for the same reason as I have on my openpandora – screen real estate.

So, to start summing this all up and getting back to my original point: although I’ve spent a million words describing why it’s more technically and semantically correct, my #1 argument for tabs is not even based on any principle of technical or semantic correctness, or respecting the past, or anything like that.

I argue for tabs over spaces for indentation based on features: Done correctly, it removes the whole “How wide should an indent be?” question and allows users to decide based on their preference while still working together and having consistent code.

But I do also argue for it on nerdy “technical correctness” and “compliance with well-reasoned specifications” grounds, too: In python, tab is even more explicitly semantically correct – in python we use indentation to signal a block of code to the interpreter. That’s the job of a control character, not of a printable character. That’s exactly what control characters are designed for. Those smart guys back in the 1960s or 1910s or whenever it was knew what they were doing when they put space in there with all the other printable characters.

However, note that when I say that you should be using tabs for indentation, I do not mean they should also be used for formatting – that does cause issues, as many advocates of spaces have pointed out in the past. I think this may be the most common pitfall that people run into, and the one which makes them prefer spaces. But understanding these tab settings is not hard, and there’s a benefit for all users, and it’s the correct option, and also it saves you some space, because one tab character is one quarter the size of 4 space characters!*

(* this old argument for tabs is actually not really true anymore a lot of the time: if you’re transferring this as plaintext over http, you’re probably using a modern web browser which supports gzip (or brotli) compression, and it’s quite likely you’re talking to a server that also supports it, so there’s a very good chance that you’re getting those 4 space characters compressed, even if you’re not minifying your javascript, and in that case those 4 spaces will take up perhaps 10 or 11 bits of data vs the 8 bits a tab would use)
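If you'd rather measure this on your own files than take my word for it, Python's gzip module makes it a quick experiment. The 1,000-line file here is made up, and the exact byte counts depend on the content and compression level, so no numbers are claimed:

```python
import gzip

# A made-up 1,000-line file, identical except for the indent character.
lines = [f"value_{i} = compute({i})" for i in range(1000)]
with_spaces = "".join("    " + line + "\n" for line in lines).encode()
with_tabs = "".join("\t" + line + "\n" for line in lines).encode()

# Uncompressed, each 4-space indent costs 3 bytes more than a tab.
raw_saving = len(with_spaces) - len(with_tabs)

# Compressed, the repeated "\n    value_" runs should mostly become back-references.
gz_saving = len(gzip.compress(with_spaces)) - len(gzip.compress(with_tabs))

print(f"uncompressed: spaces cost {raw_saving} extra bytes")
print(f"gzipped:      spaces cost {gz_saving} extra bytes")
```

Swap in one of your own source files to see how much (or how little) the indent character matters once compression is involved.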

So, for example:

#!/usr/bin/env python3

def something():
	# this line is indented. You should use a single tab character to indent it.
	#    but if I want to indent this line inside the comment, this is formatting, 
	#    and I shouldn't use tab for that.
	#
	#<-- tab
	#    <-- spaces      
	#
	# so, for example, to make an ascii-art table outlining the characters on this line:
	#    ----
	#
	# it would be:
	#  pos | character
	# -----------------
	#   1  | tab
	#   2  | hash
	#   3  | space
	#   4  | space
	#   5  | space
	#   6  | space
	#   7  | hyphen
	#   8  | hyphen
	#   9  | hyphen
	#   10 | hyphen        # note consistent column widths here, 10 is longer than 9, 
	#                      #   don't use tabs here between the hash and pipe characters

	run_code()

In the code world I've found that this formatting rule boils down to a pretty simple generalisation: left of the comment signifier (the hash character in python), that's indentation, right of it is formatting.

(yes, there are always weird edge cases, like heredocs, where formatting and indentation simply cannot be done well and unambiguously, but I've found this system to work pretty well. In these cases you should do what seems best and cleanest)
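The rule of thumb above is simple enough to check mechanically. Here's a small Python sketch (`indentation_problems` is a made-up name, not an existing linter): it looks only at each line's leading whitespace and flags any line whose indentation contains a space. Everything after the first non-blank character counts as formatting and is ignored, so the aligned comment table in the example is fine. It will also flag the weird edge cases, such as continuation lines aligned with spaces, so treat its output as hints:

```python
def indentation_problems(source: str) -> list[int]:
    """Return 1-based numbers of lines whose indentation isn't tabs-only."""
    bad = []
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        if not stripped:
            continue  # blank or whitespace-only lines carry no indentation
        indent = line[: len(line) - len(stripped)]
        if " " in indent:  # spaces left of the code/comment marker
            bad.append(number)
    return bad


sample = "def f():\n\t# ok\n\t#    formatting is fine\n    return 1\n"
print(indentation_problems(sample))  # [4]
```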

And now hopefully you know why tabs are correct and spaces are wrong. Please feel free to disagree and argue that the PEP says so, but just know in advance that if you do that you will be wrong.

More seriously, I would welcome discussion over some of the edge cases and pitfalls that people can run into with regard to this stuff. I find that a lot of the issues that people complain about with tabs also occur with spaces. It'd be cool to put together an exhaustive resource on the subject to document what is totally the empirically correct way to do it.

If you made it through these many thousands of rambling words over something that many would consider trivial, thanks for reading :)

Command Of The Day

The other day I learned about a new command that I wish I’d known about years ago: mountpoint

I’ve done all kinds of things grepping /proc/mounts (or the output from ‘mount’) in the past to try to determine whether a directory is a mountpoint or not, and there was a simple command for it all along.


$ mount

/dev/sdb2 on / type ext4 (rw,relatime,discard,errors=remount-ro)
/dev/sda2 on /home type ext4 (rw,relatime)
/dev/sdd1 on /media/external type ext4 (rw)

$ mountpoint /home
/home is a mountpoint

$ mountpoint /home/antisol
/home/antisol is not a mountpoint

$ umount /media/external

$ mountpoint /media/external || echo "Dammit"
/media/external is not a mountpoint
Dammit

# A Better Example:
$ mountpoint -q /some/dir || { echo -e "\n** Setting up bind mount, sudo password may be required **\n"; sudo mount --bind /src/dir/ /some/dir; }
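If you're in Python rather than bash, the standard library's `os.path.ismount()` answers the same question for ordinary mounts like the ones in the mount listing above. One caveat: it works by comparing the device and inode of the path with its parent directory, so it can't reliably spot a bind mount of a directory on the same filesystem, which is exactly the case in the "better example"; for that, shelling out to mountpoint is the safer bet.

```python
import os

# Paths taken from the listing above; the results depend on your machine.
for path in ("/", "/home", "/home/antisol", "/media/external"):
    state = "is" if os.path.ismount(path) else "is not"
    print(f"{path} {state} a mountpoint")
```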

News from another century

A long, long time ago – 1999/06/03 – I was brave enough to try to get Max Reason’s XBasic running on Linux (Red Hat 5.1, to be precise), and I succeeded! I remember thinking it was cool to see my name on someone else’s website when he thanked me. I didn’t even think about it at the time, but this is probably the first time I was able to contribute something back to a free software project.

It seems Max’s site has gone down recently, but here’s the wayback machine link.

(I’d just like to award Max’s parents the “best name evar” award – I think Max Reason even beats out Max Power, particularly for an engineer)

Recursively fixing indentation for a project

An interesting thing happened recently. My team had a discussion about various coding standards in order to come up with company guidelines. We all did a survey indicating our preferences on various questions.

One of the questions which came up was spaces vs tabs.

Now, having done a bunch of work with python in the last decade or so, it has seemed to me that spaces are preferred in the python community by the vast majority of people – projects with correct indentation seem to be few and far between, so I expected this question to be a slam-dunk for spaces.

But it wasn’t. It was split right down the middle. And in the end – tabs won out! :O

Maybe there’s still a fighting chance for doing indentation the right way in the python community?

If you, like me, have been stuck in a codebase with incorrect indentation, I’ve put together the incantation necessary to fix the situation:

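find . -name '*.py' -exec bash -c 'for f; do echo "$f" && unexpand --first-only -t 4 "$f" > "$f-tabs" && cat "$f-tabs" > "$f" && rm "$f-tabs"; done' _ {} +

(The --first-only matters: -t on its own implies -a, which would also convert runs of spaces inside strings and comment formatting. Writing the result back through cat, rather than mv, keeps each file's original permissions, and passing filenames as arguments instead of pasting {} into the bash -c string keeps odd filenames from breaking the command.)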

Notes:
* you may want to include more file extensions by doing e.g.: find . \( -name \*.py -or -name \*.txt \) -exec blablabla (the parentheses matter: without them, -exec only applies to the last -name)
* You may want to change the -t 4 to another value if your project doesn’t use 4 spaces for its indentation width
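If you'd rather not depend on GNU unexpand, here's a rough Python equivalent of the incantation above. It's a sketch under the same assumption of 4-space indents, and `retab_line`/`retab_file` are made-up names: only the leading whitespace of each line is converted, so string contents and comment formatting are left alone:

```python
from pathlib import Path


def retab_line(line: str, width: int = 4) -> str:
    """Convert only the leading whitespace of one line to tabs.

    Like `unexpand --first-only`: spaces after the first non-blank character
    are untouched, and leftover spaces that don't make up a whole indent
    level are kept as spaces.
    """
    stripped = line.lstrip(" \t")
    if not stripped.strip():
        return line  # blank or whitespace-only line: leave it alone
    indent = line[: len(line) - len(stripped)].expandtabs(width)
    tabs, extra = divmod(len(indent), width)
    return "\t" * tabs + " " * extra + stripped


def retab_file(path: Path, width: int = 4) -> None:
    """Rewrite a file in place; writing into the existing file keeps its permissions."""
    text = path.read_text()
    new = "".join(retab_line(ln, width) for ln in text.splitlines(keepends=True))
    if new != text:
        path.write_text(new)
```

Usage, roughly equivalent to the find command: `for p in Path(".").rglob("*.py"): retab_file(p)`.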

Nice work, youtube!

Aaaaaaaaaaaaaaand this is what happens when you have people who don’t understand the technology work on one layer of abstraction, using inefficient frameworks to build things that could be built better with just a little skill and hard work, with no incentive or curiosity to care about any of the thousand other layers of abstraction:

youtube.com is literally more than 50% invalid HTML.

Nice work!

I’m guessing their unit tests don’t include running the output through an HTML validator.

Tips and tricks for registering python plug-ins with gimp – Number 4 will SHOCK you!

This is a write-up of some of the quirks and behaviours I’ve discovered writing Python plugins for Gimp, along with some quick reference material.

I did a bit of searching and didn’t find a good write-up or documentation of the register() method you need to use to register your Python plug-in with Gimp. I figured some stuff out and thought I’d write it down.

Normally, you don’t need much of a reference for gimp’s python library, because it has such wonderful built-in documentation: In Gimp, choose Filters -> Python-Fu -> Console. The Python console will open. Press the “Browse” button and you have a searchable library of gimp functions. If you select a function and press the Apply button, gimp will give you the python incantation on the console command-line, ready to be copy-pasted into your plug-in. This is super helpful and alleviates the need for a (seemingly-nonexistent? I can’t find it online) API reference, but it does have one drawback: It doesn’t give you any examples or tell you a whole lot about the available options. In the case of the register method used to register plug-ins with Gimp, I couldn’t find it in the browser at all.

So, here’s what I’ve learned about registering python plug-ins with Gimp:

  1. There is some documentation if you look around
    It’s not particularly easy, but you can find some documentation out there. Mostly, it’s tutorials on how to write gimp plugins with python. A web search for ‘gimp python plugin’ will give you a bunch.
    I pieced this info together by looking at multiple “how to write a gimp plugin with python” tutorials and examining the difference between their calls to register(), and by trying things out.

    • In the gimp python console, you can use:
      import gimpfu 
      help(gimpfu.register)

      to get a very basic description of the register method. This gives you back something super useful:

      register(proc_name, blurb, help, author, copyright, date, label, imagetypes, params, results, function, menu=None, domain=None, on_query=None, on_run=None)
          This is called to register a new plug-in.
      
    • A couple of places have a list of available options for the register method. The best documentation I’ve been able to find is now a 404, but is still available thanks to the Internet Archive here. It includes things like a helpful list of available parameter types, and lots of useful little notes on behaviour. GIMP’s developer wiki also has a “Hacking Plugins” page, which doesn’t mention python but which has a few useful links.
    • Here is a table of parameters for the register method, shamelessly copied from this tutorial:
      Parameter  | Example                        | Description
      -----------+--------------------------------+------------
      proc_name  | "your_plugin_name"             | The name of the command that you can call from the command line or from scripting
      blurb      | "Some Text"                    | Information about the plug-in that displays in the procedure browser
      help       | "Some Text"                    | Help for the plug-in
      author     | "Some Person"                  | The plug-in’s author
      copyright  | "Some Person"                  | The copyright holder for the plug-in (usually the same as the author)
      date       | "2097"                         | The copyright date
      label      | "<Image>/Image/_Do A Thing..." | The label that the plug-in uses in the menu. Put an underscore before a letter to set the accelerator key. Use <Image>/ for a plug-in which operates on an open image, or <Toolbox>/ for a plug-in which opens or creates an image.
      imagetypes | "RGB*, GRAY*" (see below)      | The types of images the plug-in is made to handle.
      params     | [] (see below)                 | The parameters for the plug-in’s method
      results    | []                             | The results of the plug-in’s method
      function   | myplugin                       | The method gimp should call to run your plugin. Not a string.
  2. Making sense of register()’s parameters
    I found myself having trouble with the imagetypes and label parameters. The first few plugins I wrote simply batched up a few gimp operations into one thing, working on an image that I had open. Then I found myself wanting to write plugins that would perform batch operations, or generate a new image. These worked just fine, but there was one snag: the menu items for my plugins were disabled if I didn’t have an image open. I decided to investigate.

    I discovered that imagetypes and label work together to control when your menu item is available, and whether your method needs to accept parameters for the currently open image and drawable.

    imagetypes takes a string argument telling gimp what types of images your plugin operates on. The acceptable arguments I’ve found so far are:

    • "RGB*" – if your plugin works on an image and requires colour.
    • "RGB*, GRAY*" – if your plugin also works on grayscale images.
    • "*" seems to be an easier synonym for the above.
    • None – This one is important, and it’s the one I couldn’t find documented anywhere; I found it by experimentation. You need to specify None (that is, Python’s None, of type NoneType, not the string 'None') to have your plugin enabled when you have no image open in gimp, i.e. if you’re doing a batch operation on a directory of images, or generating a new image.
    • Maybe "GRAY*" – I haven’t tried this. Does it make sense? RGB has all the grays, too.

    label takes a string argument telling gimp where in the menu your plug-in should go. This has a couple of behaviours and implications that I had to figure out.

    • If your plugin will modify an open image, you should prefix your label with “<Image>/”. So your label might be “<Image>/Filters/Artistic/My _Plugin…”. Importantly, this is what determines whether your method will be passed timg and tdrawable parameters with the currently open image and drawable. So if your label does start with “<Image>/”, your method definition should look like this:
      def myplugin(timg, tdrawable, myfirstparam, myotherparams...):

      If your plugin will open or create image(s) itself (e.g. a batch operation or a plugin which creates a new image), you should prefix your label with “<Toolbox>/”. So your label might be “<Toolbox>/File/_Batch/_My Batch Operation…”.
      If you use “<Toolbox>/”, your method definition should not have the timg and tdrawable parameters:

      def myplugin(myfirstparam, myotherparams...):
    • Note the underscores in my examples. These specify the accelerator key gimp will use in the menu. You should set accelerators; they make your stuff easier to use.
    • You can easily create submenus or even new menus “on-the-fly” just by specifying them with a slash. They can also have accelerators. So that label might actually be “<Image>/Filters/My _Menu/My _Plugin” or “<Image>/My _Menu/My _Plugin” to create a “My Menu” menu if you want to.
  3. Here’s the list of data types you can use for plug-in parameters. Gimp will show nice, helpful selectors for them all. Use them! One which I will note is PF_LAYER, which is useful if you want the user to select a specific layer to operate on or work with.
    • PF_INT8
    • PF_INT16
    • PF_INT32
    • PF_INT
    • PF_FLOAT
    • PF_STRING
    • PF_VALUE
    • PF_COLOR
    • PF_COLOUR
    • PF_REGION
    • PF_IMAGE
    • PF_LAYER
    • PF_CHANNEL
    • PF_DRAWABLE
    • PF_TOGGLE
    • PF_BOOL
    • PF_RADIO
    • PF_SLIDER
    • PF_SPINNER
    • PF_ADJUSTMENT
    • PF_FONT
    • PF_FILE
    • PF_BRUSH
    • PF_PATTERN
    • PF_GRADIENT
    • PF_PALETTE

     

  4. Prepare to be shocked: This tip isn’t about registering plugins at all! Gasp. But since we’re talking about batch operations, it’s useful to note that you can easily have your plugin show and update a progress bar by using a couple of calls in your loop. There’s also another good practice that you should be aware of if you’re writing a plug-in that’s going to take a while to run: knowing when to update the display.
    • Use gimp.progress_init("Some Text…") to set up a progress bar. Do this at the start of your method, duh.
    • Use gimp.progress_update(floatval) in your loop to set progress on the progress bar. floatval should be a float between 0 and 1. You can also call gimp.progress_init("Your message") again in your loop to update the text.
    • By default, gimp won’t update its display while your plug-in is running unless you tell it to. So you may want to call gimp.displays_flush() periodically so that the user sees what is going on.
    • But be wary of calling these too often: updating the display is expensive and may slow you down! Use something like: if count % 5 == 0: gimp.displays_flush()
    • While we’re talking about long-running plugins, it’s not advisable to operate on images on a pixel-by-pixel basis, i.e. looping through each pixel in the image, getting an RGB value, doing an operation, and changing a pixel. This is verrrry sloooooow. I assume there’s a faster way, probably retrieving the image as a multidimensional array, working with that, and then writing it back. But I haven’t managed to do that yet. I’ll update this if I do. Mail me if you figure it out!
  5. There Are Still Mysteries!
    Shocking as it is, I’m not omniscient, so I don’t have it all figured out. I haven’t had need of all the available options, and I’ve discussed some unknowns already.
    For instance, I don’t know what gimp would do with your return value if you gave it results; that might make for an interesting experiment. I also don’t know what value you’d use for imagetypes to work on indexed images. I don’t think this presents much of a problem, as it’s easy to switch from indexed to RGB mode and back. I would expect that you probably only really want indexed when you’re about to export, unless you’re doing pixel art, in which case I’d recommend checking out something like Aseprite.
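    The progress-and-flush advice from tip 4 boils down to a small loop pattern. Here’s a minimal sketch of it as plain Python rather than real gimp-python code: run_batch is a hypothetical helper, and the two GIMP calls are passed in as arguments (in a plug-in you’d hand it gimp.progress_update and gimp.displays_flush) so the loop can be tried outside GIMP. It avoids annotations and f-strings, so it should also run under the Python 2 interpreter that GIMP 2.x’s Python-Fu uses.

```python
def run_batch(items, process, progress_update, flush, flush_every=5):
    """Run process() on each item, reporting progress and flushing the display.

    progress_update and flush are stand-ins for gimp.progress_update and
    gimp.displays_flush, passed in so this loop can run outside GIMP.
    """
    total = len(items)
    for count, item in enumerate(items, 1):
        process(item)
        # progress_update expects a float between 0 and 1
        progress_update(count / float(total))
        # redrawing is expensive, so only flush every `flush_every` items
        if count % flush_every == 0:
            flush()
    # make sure the final state is visible if the last item didn't trigger a flush
    if total % flush_every != 0:
        flush()
```

    In a plug-in this might be called as run_batch(files, process_one_file, gimp.progress_update, gimp.displays_flush) after a gimp.progress_init("Working…"), where process_one_file is your own (hypothetical) per-item function.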

So, there’s my wisdom on that subject. I mostly just wanted to document what I’d learned about writing plugins to generate a new image vs working with an open image, but I find myself searching for gimp-python docs every now and then, so I figured this would be a good thing to write and come back to. I expect I’ll come back and edit it as I learn more. Hopefully somebody else might find it useful too! :)

Bye bye github

Microsoft announces the ruination of github.

Because apparently destroying skype, linkedin, hotmail, etc etc etc wasn’t enough.

I can’t fathom the rationale behind this. Apparently there’s an accounting thing where having lots of users means you’re worth lots of money. So, $7.5 billion.

BUT surely there’s nobody out there who doesn’t think that MS buying github will immediately lead to an exodus of most of its users? As far as I’m concerned it’s a given: MS buys github, github users leave en masse. I know it’s what I’ll be doing.

So basically MS is paying $7.5 billion for a website which will no longer have any users. Good luck with that.

I’d find it funny if it wasn’t so tragic. I liked github. Just like I liked skype.

PHP: pretty print JSON as coloured HTML

Today I wanted a way to pretty-print a JSON string with colour highlighting. I went looking and found a bunch of ‘pretty print’ functions, but none with colour, so I implemented my own.

Usage:

  1. Include the relevant CSS for formatting your prettified JSON. There’s example CSS in the code below. You can do:
    echo("<style>".Convert::jsonPrettyHtmlCSS()."</style>");
  2. Call Convert::json2PrettyHTML(), e.g:
    echo(Convert::json2PrettyHTML('["a",{"foo":"bar","baz":42}]'));

    And the code:

    
    <?php
    
    class Convert {
    	
    	/**
    	 * Helper for Convert::prettyJSON()
    	 * Returns a HTML <span> with a class matching the data type (integer,string,double,etc)
    	 * 	Add css to colour the values according to type.
    	 * 
    	 * autodetects numeric strings and treats them as numbers 
    	 * 
    	 * runs htmlentities() and wordwrap() on values (wraps at 100 chars)
    	 * 
    	 * @param mixed $val	value to beautify
    	 * @param int $indents	number of indents
    	 * @param bool $isKey	true if this is a key name
    	 * @return HTML
    	 * @see Convert::prettyJSON()
    	 * @see Convert::json2PrettyHTML() 
    	 * 
    	 */
    	private static function jsonColor($val,$indents=1,$isKey=false) {
    		$type = gettype($val);
    		
    		if (($type == "string") && is_numeric($val)) {
    			//try to convert it to a number
    			$val = floatval($val);
    			
    			if (intval($val) == $val)	//convert from float to int if it's a whole number: 
    				$val = intval($val);
    			
    			$type = gettype($val);
    		}
    		
    		//arrays and objects never get here: prettyJSON() recurses into them itself
    		switch($type) {
    			case 'string':
    				$val = '"' . $val . '"';
    				break;
    			case 'boolean':
    				//htmlentities() would turn true into "1" and false into ""
    				$val = $val ? 'true' : 'false';
    				break;
    			case 'NULL':
    				$val = 'null';
    				break;
    		}
    		$val = wordwrap(htmlentities((string)$val),100,"<br />",true);
    		
    		if ($isKey) $type = $type . " key";
    		
    		return "<span class='$type'>$val</span>";
    	}
    	
    	/**
    	 * Helper for Convert::json2PrettyHtml()
    	 * convert a value (i.e from json_decode) into a pretty colourised string
    	 * @param array|string|number $json		value to prettify
    	 * @param number $indents				indentation level (used for recursion)
    	 * @return string
    	 * @see Convert::json2PrettyHTML()
    	 */
    	private static function prettyJSON($json,$indents = 1) {
    		$ret = "";
    		$indent=str_repeat("<span class='indent'> </span>",$indents);
    		if (is_array($json) || is_object($json) ) {
    			foreach ($json as $k => $v) {
    				//keys are escaped by jsonColor(), so escaping them here as well would double-escape them
    				if (is_array($v) || is_object($v)) {
    					$v = self::prettyJSON($v,$indents+1);
    					$ret .= ($ret ? ",<br />\n" : "") . $indent .
    						self::jsonColor($k,$indents,true) . ":\t<br />$v";
    				} else {
    					$ret .= ($ret ? ",<br />\n" : "") . $indent .
    						self::jsonColor($k,$indents,true) . ":\t" . self::jsonColor($v,$indents);
    				}
    			}
    			if (is_object($json)) {
    				$openbrace = "{";
    				$closebrace = "}";
    			} else {
    				$openbrace = "[";
    				$closebrace = "]";
    			}
    			$outdent=str_repeat("<span class='indent'> </span>",$indents-1);
    			$ret = "$outdent$openbrace<br />\n$ret<br />\n$outdent$closebrace";
    		} else
    			$ret = self::jsonColor($json,$indents);
    		
    		return $ret;
    		
    	}
    	
    	/**
    	 * Return the CSS for json2PrettyHTML()'s output
    	 * @param bool $return	currently ignored: the CSS is always returned as a string
    	 * @return string
    	 * @see Convert::json2PrettyHTML()
    	 */
    	public static function jsonPrettyHtmlCSS($return = true) {
    		return 'span.json .integer, span.json .double {
    				color: #700;
    				font-family: monospace;
    			}
    			
    			span.json .string {
    				color: #070;
    				font-family: monospace;
    			}
    			
    			
    			span.json .key.string {
    				color: #007;
    			}
    			
    			span.json .key.integer, span.json .key.double {
    				color: #707;
    			}
    			
    			
    			span.json .indent {
    				padding-left: 40px;
    			}';
    	}
    	
    	/**
    	 * Converts a JSON string to pretty, readable HTML output which can be 
    	 * 	colourised/customised via CSS
    	 * 
    	 * Also does other nice things, like word wrapping at 100 chars, running 
    	 * 	values through htmlentities(), and treating numeric strings as numbers
    	 * 
    	 * Include CSS to style the output (set colours, indent width, etc)
    	 * Notes: 
    	 * 		- everything will be wrapped in a span.json (i.e <span> with 'json' 
    	 * 			as the class, css: span.json)
    	 * 		- keys will be spans with the'key' class  ( e.g span.key )
    	 * 		- values and keys will be spans and will have the datatype as the 
    	 * 			class ( span.integer, span.key.integer)
    	 * 		- there will be empty spans with the 'indent' class in the 
    	 * 			appropriate places. There may be more than one consecutively. 
    	 * 
    	 * Example CSS is returned by the jsonPrettyHtmlCSS() function
    			
    	 * @param string $json	the json to beautify
    	 * @return HTML
    	 * @see Convert::jsonPrettyHtmlCSS()
    	 */
    	public static function json2PrettyHTML($json) {
    		return "<span class='json'>" . self::prettyJSON(json_decode($json)) . "</span>";
    	}
    }
    
    
    
    

    I hope someone finds this useful! :)

kgrep

Ladies and gentlemen, presenting: kgrep – kill-grep

This is a bash function which allows you to type in a search term and kill matching processes. You will be prompted to kill each matching process for your searchterm.

You can also optionally provide a specific signal to use for the kill commands (default: 15)

Usage: kgrep [<signal>] searchterm

Signal may be -2, -9, or -HUP (this could be generalised but I CBF).

The search term is anything grep recognises.

kgrep() {
    #grep for processes and prompt whether they should be killed
    if [ -z "$*" ]; then
        echo "Usage: kgrep [-signal] searchterm"
        echo -e "\nSearches for processes matching searchterm and prompts to kill them."
        echo -e "signal may be:\n\t-2\n\t-9\n\t-HUP\nto send a different signal (default: TERM)"
        return 0
    fi
    local SIG="-15"
    #yes, this could be more sophisticated
    if [ "$1" == "-9" ] ||
        [ "$1" == "-2" ] ||
        [ "$1" == "-HUP" ]; then
        SIG="$1"
        shift
    fi
    #capture the search term before changing IFS ("$*" joins words with IFS)
    local term="$*"
    #unset the field separator and clear these traps if ^C etc. is pressed
    #(SIGKILL can't be trapped, so it isn't listed)
    trap "unset IFS; trap - INT QUIT TERM; return 0" INT QUIT TERM

    local l pid a
    IFS=$'\n'
    for l in $(ps aux | grep -- "$term" | grep -v grep); do
        echo "$l"
        pid=$(echo "$l" | awk '{print $2}')
        read -p "Kill $pid (n)? " a
        #anchored, so only y/yes (any case) counts as a yes
        if [[ "$a" =~ ^[Yy]([Ee][Ss])?$ ]]; then
            echo kill $SIG "$pid"
            kill $SIG "$pid"
        fi
    done
    unset IFS
    trap - INT QUIT TERM
}
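The filtering half of kgrep (list processes, keep the lines that match, pull the PID out of the second column, then ask for a yes/no) is easy to check in isolation. Here’s a Python sketch of that logic working on a captured ps aux listing instead of running ps, and it doesn’t kill anything. matching_pids and confirmed are hypothetical names, Python’s re syntax differs a little from grep’s basic regular expressions, and unlike the shell version it skips the ps header line.

```python
import re


def matching_pids(ps_output, pattern):
    """Return (pid, line) pairs for `ps aux` lines matching pattern.

    Mirrors `ps aux | grep pattern | grep -v grep` followed by awk '{print $2}'.
    """
    regex = re.compile(pattern)
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the "USER PID ..." header
        if "grep" in line or not regex.search(line):
            continue
        pid = int(line.split()[1])  # PID is the second column
        matches.append((pid, line))
    return matches


def confirmed(answer):
    """True only for y or yes (any case); fullmatch anchors the whole answer."""
    return re.fullmatch(r"[Yy]([Ee][Ss])?", answer.strip()) is not None
```

Anchoring the answer matters: an unanchored search for [Yy] would also accept something like "maybe".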

Click To Print

Here’s a nifty little piece of javascript I whipped up the other day in response to a client request.

With this code (and jQuery) on a web page, any element with the “clicktoprint” class becomes clickable. When it’s clicked, only that element is printed, scaled to the full width of the page.

I was asked to do this so that a client could have a voucher on their website which you could click on and have it printed. Their previous solution (using a ‘print’ media query in the site css) meant that the rest of the page could never be printed. This code injects a new piece of css for the duration of the special print and removes it afterwards, allowing the rest of the page to be printed by the regular means.

<script type="text/javascript">
jQuery(document).ready(function() {
	jQuery('.clicktoprint').click(function() {
		jQuery(this).parents().each(function(idx,i) {
			jQuery(i).addClass('clicktoprint-parent');
		});
		jQuery('head').append('<style id="clicktoprint-style">@media print { * { display: none; } .clicktoprint, .clicktoprint-parent {display: block !important; width: 100% !important;} }</style>');
		window.print();
		jQuery('style#clicktoprint-style').remove();
		return false;
	});
});
</script>

Dear the entire world

You don’t need to quote database column and table names in queries unless they contain special characters like spaces, or clash with a reserved word.

This applies to every database engine and every dialect of SQL I’ve ever used – quoting ordinary column names is always optional.

So why the fuck do you insist on writing this in your php codez?

$query="SELECT \"some_ordinary_column\" from \"some_table\" where \"some_table\".\"some_column\" = \"some_value\"";

Are you a masochist who loves escaping things or what?

How much more readable is this:
$query='SELECT some_ordinary_column from some_table where some_table.some_column = "some_value"';

The funny thing is that the type of people who write this garbage are the same type of people who tell you that using an if statement without braces is “bad style”. lol.
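To see the point in action, here’s a small sketch using Python’s built-in sqlite3 module (a swap from the PHP above; SQLite is just one of the engines the claim covers). Ordinary names go unquoted, and the value is passed as a bound parameter, so there’s nothing to quote or escape at all. Only a reserved word and a name with a space in it get double quotes. The table and column names are placeholders borrowed from the example query; query_example is a hypothetical name.

```python
import sqlite3


def query_example():
    conn = sqlite3.connect(":memory:")
    # ordinary identifiers: no quoting needed anywhere
    conn.execute("CREATE TABLE some_table (some_column TEXT, some_ordinary_column INTEGER)")
    conn.executemany(
        "INSERT INTO some_table VALUES (?, ?)",
        [("some_value", 1), ("other_value", 2)],
    )
    rows = conn.execute(
        "SELECT some_ordinary_column FROM some_table"
        " WHERE some_table.some_column = ?",
        ("some_value",),  # the value is bound, not quoted into the SQL text
    ).fetchall()
    # quoting is only needed for a reserved word or a name containing a space
    conn.execute('CREATE TABLE "order" ("unit price" REAL)')
    conn.close()
    return rows
```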

Logging is necessary

Unless you’re me, you’re less awesome than you think you are.

(I’m more awesome than I think I am. This is not a paradox)

Therefore, when you write a mission-critical piece of code, you need a logging system.

Your logging system needs to have different types of log message: error and debug at the bare minimum.

Your code needs to log every action it takes.

This might be expensive or difficult. Tough shit. If it’s important, it needs to be logged – you must be able to go back over a particular execution and determine what happened. This is not optional.

This is a good rule even for not-important code. It makes debugging SO much easier. There are approximately 100 billion logging systems available, use a library if you must. Or you could write your own in 10 minutes.
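Here’s roughly what that looks like with Python’s standard logging module: two levels (DEBUG for every action, ERROR for failures), with each step of a hypothetical transfer() function logged along with its inputs, so a single run can be reconstructed afterwards. The function, the logger name and the format string are illustrative assumptions; the log goes to whatever stream you pass in (a file, in real life), and demo() uses an in-memory buffer.

```python
import io
import logging


def make_logger(stream, level=logging.DEBUG):
    """Return a logger that writes 'LEVEL message' lines to stream."""
    logger = logging.getLogger("mission_critical")
    logger.setLevel(level)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.handlers = [handler]  # replace any handler from an earlier call
    logger.propagate = False     # don't also send records to the root logger
    return logger


def transfer(logger, amount, balance):
    """Hypothetical 'important' operation: every decision gets logged."""
    logger.debug("transfer requested: amount=%s balance=%s", amount, balance)
    if amount > balance:
        logger.error("transfer refused: amount=%s exceeds balance=%s", amount, balance)
        return balance
    balance -= amount
    logger.debug("transfer done: new balance=%s", balance)
    return balance


def demo():
    """Run one successful and one refused transfer; return the log text."""
    buf = io.StringIO()  # stands in for a log file
    log = make_logger(buf)
    transfer(log, 50, 100)
    transfer(log, 500, 50)
    return buf.getvalue()
```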

Let’s discuss! Give me an example of a situation where logging is undesirable for important code, and I’ll tell you why you’re wrong… ;)

Converting red-blue anaglyph to stereoscopic images

(EDIT: Updated to add black border between images – makes it easier to see the 3d, and makes the 3d image better defined)

I hate those red-blue anaglyphs. The red and blue fucks with my head – my brain refuses to interpret it properly, and the object does this weird “flashing” between red and blue.

Plus, I’m too cheap to buy (and too reckless to keep) a pair of those red-blue 3D glasses.

So, I installed Imagemagick and wrote myself a bash function:

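stereo_convert () {
    in="$1"
    out="$2"
    if [ -z "$in" ] || [ -z "$out" ]; then
        echo -e "\nYou need to supply input and output files!\n"
        return 42
    fi
    # left view: a 10px black border spliced onto its right edge, red channel only (-gamma 1,0,0), desaturated
    # right view: green and blue (cyan) channels only (-gamma 0,1,1), desaturated; +append joins them side by side
    convert \( "$in" -gravity east -background Black -splice 10x0 -gamma 1,0,0 -modulate 100,0 \) \( "$in" -gamma 0,1,1 -modulate 100,0 \) +append "$out"
    echo -e "\nConverted red-blue stereo image '$in' to side-by-side image '$out'.\n"
}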

Here’s a demo image from NASA’s Pathfinder mission.

Input:

Anaglyph image of Pathfinder

Output:

Stereoscopic Pathfinder

Notes:

  • This process removes all colour information, giving you greyscale output. Unfortunately there’s no way to restore full colour to anaglyphs, as the full colour information isn’t there. IMHO greyscale is better than red/blue.
  • The images may not be exactly perfect due to:
    • Red and cyan do not have the same intensity to the human eye – cyan seems brighter, so the right eye may appear slightly lighter. I’ve done my best to eliminate this, but I CBF reading into the science of colour wavelengths etc. right now.
    • Some images may be reversed – it appears that there’s no “hard” convention as to which eye should be red and which should be blue. But it appears that “most” are red==left.
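In pixel terms, the convert pipeline keeps the red channel of the anaglyph for one eye and the green and blue (cyan) channels for the other, throws away the colour, and puts the two views side by side with a black gap. Here’s a simplified Python sketch of that idea on an image stored as rows of (r, g, b) tuples. It’s an approximation rather than a re-implementation of ImageMagick: it uses the red value directly as the left-eye grey and the mean of green and blue as the right-eye grey (-modulate 100,0 computes lightness its own way), and it assumes the red == left convention mentioned above.

```python
def anaglyph_to_side_by_side(image, border=10):
    """Convert a red/cyan anaglyph (rows of (r, g, b) tuples, 0-255) into
    one greyscale side-by-side image (rows of grey values, 0-255).

    Simplified: left eye = red channel, right eye = mean of green and blue,
    separated by `border` black pixels.
    """
    output = []
    for row in image:
        left = [r for r, _g, _b in row]             # red channel -> left eye
        gap = [0] * border                          # black border between the views
        right = [(g + b) // 2 for _r, g, b in row]  # cyan channels -> right eye
        output.append(left + gap + right)
    return output
```

Because each eye only ever saw one set of channels, this is also why the output can’t be anything but greyscale.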

Creating a self-extracting bash script

You always see things like VMware and Unreal Tournament being installed via a self-extracting bash script. It would seem that this is the best way to provide an installer which will work on the widest selection of Linux distributions.

After some googlage, I came up with the following. Given a tarball and an installer script named ‘installer’, it will create a self-extracting bash script:

#!/bin/bash
#############################################################
# Self-extracting bash script creator
# By Dale Maggee
# Public Domain
#############################################################
#
# This script creates a self-extracting bash script
# containing a compressed payload.
# Optionally, it can also have the self-extractor run a
# script after extraction.
#
#############################################################
VERSION='0.1'

output_extract_script() {
#echoes the extraction script which goes at the top of our self-extractor
#arguments (via variables):
# $target - suggested destination directory (default: somewhere in /tmp)
# $installer - name of installer script to run after extract
# (if specified, $target is ignored and a temp dir in /tmp is used, then removed)

#NOTE: odd things in this function due to heredoc:
# - no indenting
# - things like $ and backticks need to be escaped to get into the destination script

cat <<EndOfHeader
#!/bin/bash
echo "Self-extracting bash script. By Dale Maggee."

EndOfHeader

#here we put our conditional stuff for the extractor script.
#note: try to keep it minimal (use vars) so as to make it nice and clean.
#single quotes keep $target etc. from being expanded here rather than in the extractor.
if [ "$installer" != "" ]; then
#installer specified: extract to a temp dir, run the installer, then remove the dir
#(assumes the installer sits at the top level of the tarball)
echo 'INSTALLER="'"$(basename "$installer")"'"'
echo 'target=`mktemp -d /tmp/XXXXXXXXX`'
elif [ "$target" != "" ]; then
#no installer: extract to the suggested target and leave the files there
echo 'target="'"$target"'"'
echo 'mkdir -p "$target"'
else
echo 'target=`mktemp -d /tmp/XXXXXXXXX`'
fi

cat <<EndOfFooter

#do the extraction...
echo -n "Extracting to \$target..."
ARCHIVE=\`awk '/^---BEGIN TGZ DATA---/ {print NR + 1; exit 0; }' "\$0"\`

tail -n+\$ARCHIVE "\$0" | tar xz -C "\$target"
echo ", Done."

if [ -n "\$INSTALLER" ]; then
echo -n "Running Installer..."
CDIR=\`pwd\`
cd "\$target"
./"\$INSTALLER"

echo -en ", Done.\nRemoving temp dir..."
cd "\$CDIR"
rm -rf "\$target"
echo -e ", Done!\n\nAll Done!\n"
else
echo "Files are in \$target"
fi

exit 0
---BEGIN TGZ DATA---
EndOfFooter
}

make_self_extractor() {

echo "Building Self Extractor: $2 from $1."

if [ -f "$3" ]; then
installer="$3"
echo " - Installer script: $installer"
fi

if [ "$4" != "" ]; then
target="$4"
echo " - Default target is: $target"
fi

src="$1"
dest="$2"
#check input...
if [ ! -f "$src" ]; then
echo "source: '$src' does not exist!"
exit 1
fi
if [ -f "$dest" ]; then
echo "'$dest' will be overwritten!"
fi

#ext=`echo $src|awk -F . '{print $NF}'`

#create the extraction script...
output_extract_script > "$dest"
cat "$src" >> "$dest"

chmod a+x "$dest"

echo "Done! Self-extracting script is: '$dest'"
}

show_usage() {
echo "Usage:"
echo -e "\t$0 src dest [installer] [target]"

echo -en "\n\n"
}


############
# Main
############

if [ -z "$1" ] || [ -z "$2" ]; then
show_usage
exit 1
else
make_self_extractor "$1" "$2" "$3" "$4"
fi
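The trick the self-extractor relies on is simple: a shell header that ends in exit 0 (so the shell never reaches the binary data), a ---BEGIN TGZ DATA--- marker line, then the raw .tar.gz bytes; the extractor finds the line after the marker and feeds everything from there to tar. Here’s a Python sketch of that same file layout, building and unpacking the payload in memory with the standard tarfile module. It only illustrates the format and isn’t a replacement for the script: make_tgz, build_self_extractor and extract_payload are hypothetical names, and nothing is written to disk or executed.

```python
import io
import tarfile

MARKER = b"---BEGIN TGZ DATA---\n"


def make_tgz(files):
    """Pack a {name: bytes} dict into an in-memory .tar.gz."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def build_self_extractor(header, payload):
    """Header script + marker line + payload, like `output_extract_script; cat src`."""
    if not header.endswith("\n"):
        header += "\n"  # the marker must start its own line
    return header.encode() + MARKER + payload


def extract_payload(script):
    """Find the marker line (like the awk/tail pair) and unpack what follows it."""
    start = script.index(b"\n" + MARKER) + 1 + len(MARKER)
    with tarfile.open(fileobj=io.BytesIO(script[start:]), mode="r:gz") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers() if m.isfile()}
```

For example, build_self_extractor("#!/bin/bash\necho hi\nexit 0", make_tgz({"installer": b"..."})) produces the same shape of file as the script above, and extract_payload() gets the original files back out of it.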