Juho Snellman's Weblog

I don't want no 'wantarray'

Posted on 2017-07-18 in Perl

A while back, I got a bug report for json-to-multicsv. The user was getting the following error for any input file, including the one used as an example in the documentation:

    , or } expected while parsing object/hash, at character offset 2 (before "n")

The full facts of the matter were:

The JSON parser was failing on the third character of the file.
That was also the end of the first line in the file. (I.e. the first line of the JSON file contained just the opening bracket).
The user was running it on Windows.
The same input file worked fine for me on Linux.

json-to-multicsv - Convert hierarchical JSON to multiple CSV files

Posted on 2016-01-12 in General, Perl

Introduction

json-to-multicsv is a little program to convert a JSON file to one or more CSV files in a way that preserves the hierarchical structure of nested objects and lists. It's the kind of dime a dozen data munging tool that's too trivial to talk about, but I'll write a bit anyway for a couple of reasons.

The first one is that I spent an hour looking for an existing tool that did this and didn't find one. Lots of converters to other formats, all of which seem to assume the JSON is effectively going to be a list of records, but none that supported arbitrary nesting. Did I just somehow manage to miss all the good ones? Or is this truly something that nobody has ever needed to do?

Second, this is as good an excuse as any to start talking a bit about some patterns in how command line programs get told what to do (I'd use the word "configured", except that's not quite right).

Command languages as game user interfaces

Posted on 2014-12-08 in Games, Perl

In the previous post in this series, I promised to discuss in detail some of the positive and negative consequences of the less conventional design choices of my online Terra Mystica implementation. If you have no idea of what that is, reading at least the intro of that post might be a good idea. This post will just deal with one design choice, but it's the elephant in the room: the command language.

The canonical internal representation of a game in my TM implementation is as a sequence of rows, each describing a some number of player actions specified in an ad hoc mini language, or administrative commands that change the game setup in some way (for example setting game options, or dropping a player from the game partway through). This is what it might look like:

yetis: action ACT4
cultists: upgrade E6 to TE
cultists: +FAV6
giants: Leech 3 from cultists
giants: pass BON4
yetis: Leech 2 from cultists
cultists: +WATER
dragonlords: Decline 2 from cultists
dragonlords: dig 1. build G6
yetis: send p to EARTH
cultists: action FAV6. +AIR
dragonlords: pass BON7
yetis: upgrade E7 to TE. +FAV11
giants: Leech 3 from yetis
dragonlords: Leech 2 from yetis
cultists: Leech 2 from yetis

That's a short excerpt from the middle of a random game. A full game generally runs for about 400 rows.

A brief history of Online Terra Mystica

Posted on 2014-11-27 in Games, Perl

What's this Online Terra Mystica thing?

For the last couple of years my main hobby hacking project (over a thousand commits, and probably an order of magnitude more time spent on it than all other non-work projects combined) has been an asynchronous multiplayer web implementation of the brilliant board game Terra Mystica (Feuerland Spiele, 2012). At the moment it's roughly 2/3 Perl, 1/3 Javascript, and uses Postgres as the data storage.

It's been a fairly successful project for something that was originally intended as a one-off. The usage statistics at the end of November 2014 are:

Almost 6000 registered users
About 1200 monthly active users (as in playing at least one game; not passive use like looking at the statistics pages).
14000 moves executed on a normal weekday (10000 on weekends)
16500 games either ongoing or finished.
Bi-monthly online TM tournament run by Daniel Åkerlund with 400+ players.
1038 commits as of this writing.

This was not supposed to be a general use program. It was originally a one night hack to help keep track of a hand-moderated play-by-forum game of TM, which was obviously headed for failure due to the massive amount of errors people were making while describing their moves in natural language or when manually tracking their resources in the game.

Feed moving

Posted on 2006-01-04 in Perl, General, Lisp

I've moved my blog away from the University of Helsinki Department of Computer Science servers, where it's been living for almost two years. In anticipation of this moment I originally made all the links to my rss feeds through as forwarding service. When the blog moves, just flip the redirector to point elsewhere, and the users won't notice a thing!

At least that was the plan.

Apparently a lot of people still managed to subscribe with the target URL of the forwarder (http://www.cs.helsinki.fi/u/jesnellm/blog/rss-...), instead of with the forwarder URL (/blog/rss-...), and thus would still be seeing the old feed after the move.

So now I've just set up a HTTP 301 (permanent) redirect on the old location. Smart RSS aggregators are supposed to update the feed URL when seeing a permanent redirect, but judging from the access logs few do this in practice. Instead a 301 is treated the same as a 302 (temporary) redirect.

Which brings me to the actual point: If you want to keep on subscribing to this feed indefinitely, please check that you aren't using the cs.helsinki.fi URL. I'm hoping to graduate this year, which might also imply losing the 301 from the old location to the new one.

While moving servers, I also took the opportunity to redo the blog as a dynamic application (using Araneida) instead of generating static pages. I'll see whether I can procrastinate myself into adding comment support in the future.

Golf - Deroter

Posted on 2005-07-23 in Perl

Long time since the last golf. Inspired by the recent announcement of a Perl Golf book I took part in a Polish golf that was announced on the mailing list.

Given a input string that has been "encrypted" with ROT-n on STDIN and a dictionary of words (sequences of letters A-Za-z, not of \w) in @ARGV the program needs to output to STDOUT the original plaintext. (Formal rules).

My best solution was 62 characters, but I figured out about an hour before the golf ended that it was actually broken, and didn't have time to figure out anything better than the 65.44 below, which is currently good for a second place. The apparent winning solution of 63 doesn't seem to work either, for unrelated reasons. So the explanation might be for the winning entry, or it might not.

#!perl -p0

You know the drill. -p handles reading the input and printing the output. Use -0 to read the input in one go, instead of a line at a time.

INIT{%a=map{pop,1}@ARGV}

In the INIT block, pop all command line parameters to make -p read from STDIN. Use the removed arguments as keys in a hash table for detecting dictionary words. Using the symbol table with something like $$_=1while$_=pop would save a few characters, but that's incorrect since $ARGV is automatically set to '-' on entering the main loop.

$a{$&}||y/B-ZA-Gb-za/A-z/while/\pL+/g

At the start of the main body $_ contains the whole ROT-n text.

On the first iteration /\pL+/g will match the first word (letters only; \pL is essentially [a-zA-Z]). //g works differently in scalar than in list context: it will only match once per call, but the next call will start at the location in the string where the last match ended. If a match was found it returns true, otherwise false.

In the body of the while we first check if the word we matched is in the dictionary. If it isn't (i.e. $a{$&} is untrue) $_ obviously isn't plaintext yet, so we rotate it by one step with y///. This contains the only tricky bits in the program:

Changing $_ causes the scalar //g to be reset, and start matching from the start of the program.
Doing the rotation backwards (A -> Z, B -> A, ..., Z -> Y) instead of the more intuitive direction (A -> B, B -> C, ... Z -> A) allows writing the transliteration in a way that saves one character.

There are six characters ([\+]^_`) between Z and a. By adding six extra characters into the right place on the left side of the transliteration operation (with -G) we can use the range A-z on the right side, instead of specifying separate ranges for upper- and lowercase letters. Compare:

y/A-Za-z/B-ZAb-za/
y/B-ZA-Gb-za/A-z/

FWIW, the 65.48 by Piotr Fusik by far the coolest solution. Wish I'd thought of that...

Golf - Rush Hour

Posted on 2004-07-01 in Perl

The recent godzillagolf titled Rush Hour was the first golf in a while where my solution contained anything worth explaining. Here's all 157 characters of the solution:

#!perl -n0
sub
R{$b<0?reverse:$_}sub
M{/
/?s^\pL^$b=$#A**pos;push@_,"$& $b
";$c=8*($<
Z);s/$&/ /,s/(($&)\C{$c}) /$1$2/<++${$_=R}or&M
for~~R;pop^ge:exit
print@_}M

The code uses a depth-first search, which can be roughly divided into the following steps (the actual code doesn't do things quite in this order):

If current board has already been visited, backtrack to step 4.
Mark current board as visited
If a car is in the target space, print the moves that have been accumulated and quit.
Move one of the cars one step in either direction
- If no cars can be moved, backtrack
- If backtracking to here, try another car/direction
Go to step 1.

There are several subproblems that need to be solved to implement the algorithm:

Detect whether the win condition has been reached.
Accumulate the moves. (Preferably without using too much memory; I have a 149 character solution that uses 200MB, which isn't really justifyable for this problem).
Iterate over all valid moves (car/direction pairs) for a board.
Given a board and a move, generate another board.
Ensure that the backtracking works.
Detect whether the board has already been visited

The board is of course stored in the original format as a string. There's no room for any fancy datastructures... Given that, here are my solutions to the subproblems:

Just check whether the board contains a space followed by a newline:
```
/ \n/ ? ... : ...  # If the regexp fails, win)
```

Keep the moves stored as strings in @_ in the correct order:

push@_,"$& $b\n"; # $& is the current car, $b is either -1 or 1
...   # execute the rest of the algorithm. This quits on success...
pop;  # ... so reaching this line means that we're backtracking,
      # and need to remove the move

Given a board, iterate over all valid car-characters in the string (I used \pL here instead of the obvious \w for reasons that are still unclear to me). For each character, generate a value $b as either 1 or -1, so that it's guaranteed that for any board both values are generated at least once for each car. Since each line is 9 characters long and there are at least 2 characters in each car, each car must have at least one character in an even and one in an odd position in the string. Hence (-1)**pos generates a proper value.
```
s^\pL^$b=$#A**pos; ... ^ge
```
$#A is just a shorter way of writing (-1). Unfortunately the operator precedence of unary - is smaller than that of **.

First let's solve the problem only for positive values of $b (i.e. down or up).

$c=8*($< Z);  # $c = 8 if car moves up/down, 0 otherwise
s/$&/ /;        # Remove the first character of the car
s/(($&)\C{$c}) /$1$2/
# Find last character of car that's followed by a space exactly $c
# characters from it, and substitute the space with the character.
# For example "bbcc." => "bb.cc". If this substitution fails,
# the move was impossible and we should backtrack.

To handle negative values of $b, just conditionally reverse $_ before and after it's modified (nobody else did this in the golf, which I found suprising):

sub R{$b<0?reverse:$_}
$_=R;...;$_=R

The backtracking can be implemented just by wrapping the code inside a recursive subroutine and restoring the original state if the recursive call returns. There are three interesting bits of state:
- $_. Saved by binding $_ again with for:
```
... for"$_"  # Can't use ... for$_, since that just
               # aliases the current $_ to the new $_
```
  Since $_ needs to be conditionally reversed in d, we can just use the return value of R instead.
```
... for~~R   # ~~ needed to give scalar context to the reverse
```
- The moves that haven't been tried for this board yet. Since these are generated from the substitution in subsolution 3, nothing special needs to be done.
- The accumulated moves. This was handled correctly in subsolution 2
Mark board visited by incrementing a symbolic reference using $_.
```
++$$_
```
Since $_ needs to be conditionally reversed again, the symbolic reference can be made on the value of the assignment instead:
```
++${$_=R}
```
The other part of this subproblem is to not recurse if the board has already been visited. This can be done by comparing the return value of the increment to the final substitution in subsolution 4:
```
s/...//<++${$_=R}or...
```

Mash these ingredients together, and add an exit print@_ to actually do something with the result, and you get the solution shown above.

Golf - Matrix

Posted on 2004-04-13 in Perl

The Matrix golf had a rather thin field (hopefully only temporary), but some really cool code (especially in the post-mortem).

The problem statement was short enough to be quoted here in full:

Let A be an N*N matrix of zeros and ones. A submatrix S of A is any group of contiguous entries that forms a square or a rectangle. Write a program that determines the number of elements of the largest submatrix of ones in A . Largest here is measured by area.

Before going into the details, a brief example of how the algorithm I used works. Assume the following matrix:

The longest string of 1s is the 111 on line 3, so that's the largest submatrix of one line (with an area of 1*3=3). Then transform the matrix by doing a stringwise and on each line and the line that follows it. The last line will be chopped off:

Obviously the only way to get a 1 on the transformed matrix is to have one on the corresponding position in two successive lines in the untransformed matrix. So the string 11 (found on both lines 2 and 3) corresponds to an area of 2*2=4 in the original matrix. Repeat the transform:

00010
00110
00010

Now any 1 is going to be the result of 3 1s on consecutive lines, so 11 on line two means there was a submatrix of area 2*3=6 on the original matrix. Repeating the whole process two more times would result in finding an area of 4 and one of area 5. The answer for this matrix would therefore be 6.

My solution (59 characters):

#!perl -lp0
s/1*/$B[$?*length$&]=$&/ge,/
/,$_&=$'
while++$?;$_=$#B

As usual, we slurp the whole input into $_ with -p0 and take care of the trailing newline with -l. In addition to $_, a couple of other variables contain some interesting state. @B is used for keeping track of the largest area that's been found (an old golf trick; we're only interested in the size of the array, not the values stored in it). $? holds the current iteration (i.e. the multiplier for the area calculation). $? is used since it can only contain an unsigned short (0-65535), and therefore repeatedly incrementing it in the condition of the while results in the variable overflowing to 0 after 65535 iterations. (Another variable with a similar behaviour is $^C, which holds signed chars. I used $? instead since at some point my program couldn't handle negative multipliers)

As mentioned before, the program contains a while-loop whose condition is just incrementing $?. The body of the while implements most of the algorithm. Some code is executed for each string of 1s with s/1*/.../ge. The code in question is $B[$?*length$&]=$&, which just calculates the area of the submatrix that the string of 1s represents (by multiplying $? and the length of the matched substring, i.e. $&), and stores something in that index of @B. In this case, the value being stored is $& since (despite using s///) we don't actually want to modify $_ yet. This takes care of finding the largest area.

To implement the transformation described earlier, a /\n/ is used to find the first newline in $_. After this a stringwise and of $_ and $' will have done the tranformation (including chopping off the last line). Once the loop ends, we just assign $#B (the largest index of @B) to $_, which is then printed out thanks to the -p command line argument.

This was a very cool golf. My only regret is missing a completely obvious optimization of replacing the s/1*//ge with a suitable crafted map, which would've saved two strokes. Well, not completely obvious since I only realized that it would've been possible when writing this post. Perhaps I should start writing these things earlier... ;-)

Golf - Subproduct

Posted on 2004-03-21 in Perl

For some reason I usually don't seem to have time to take part in golfs that I've designed. Almost happened with the badly named (couldn't come up with anything better) Subproduct too, but in the end I decided that I didn't really need to write that seminar report yet...

The problem was simple. Given a string of digits (maximum length 20) and a maximum substring length N (maximum of 9), find the largest product of the digits in a substring of 1..N characters. (For example for the string 0120340 and a N of 5 the correct answer is 3*4=12). The most complex parts about the problem are handling zeros correctly and keeping track of the maximum value encountered (the usual golf idiom of using the length of an array for this doesn't work, since the maximum value of 9**9 would require an array that's too large).

Here's the code (68 characters):

#!perl
$_=shift;s/./$^=1;($^*=chop)<$\or$\="$^
"for($`.$&)x"@ARGV"/ge;print

First we get the first command line parameter into $_with shift, and loop over it using s///. The answer will be saved into $\ so that we can use just a print without any arguments to print it. Of course print without arguments will print $_too, so we need to empty $_ somehow. This is the reason for using s/.//, while the shorter s/// would otherwise also suffice.

For each position in the input string we'll first initialize $^ to one. After that we'll loop N times through a loop, where $_ has been initialized to "$`$&"(that is, all characters up to and including the one that's currently being processed by s///). In the loop, we'll chop off digits from the end of the newly constructed $_ and multiply $^ with them. If $^ is larger then the current value of $\, set $\ to "$^\n".

There are a lot of variations on this theme that are equally long. I ended up submitting one of the more obfuscated ones, only to regret it now :-)

Golf - Card Trick 2

Posted on 2004-03-03 in Perl

After missing one minigolf, I had some time to take part in Card Trick 2.

The mission was to determine the result of the 'trick' outlined below, when getting as input the initial layout of cards and the actions of the 'audience'.

In my hand I have 21 cards that I deal out face up, one each to 3 spread piles, until there are 3 rows of 7. You silently chose your card and inform me of which 'pile' your card is in (1, 2, or 3). I then pick up each pile making sure to put the pile with your card between the two other piles. I deal them out as before, and again you tell me which pile your card is in. We repeat the process a third time, and when I again pick up the piles, placing the pile with your card in the middle, your card will invariably be the center card (11 of 21 in this case).

By the time I'd even read the rules, there was already a mass of people with a solution of 40-41 (and Ton Hospel at 38, but we all know that he isn't human). Given the length of those solutions, it's obvious that the solution is going to involve some cute mathematical formula, instead of directly manipulating the cards to execute the trick.

The way I thought of the formula was this: Given an index I into the set of cards (for example with the 21 cards below, 0-20) and the pick P (1-3) we need a formula for determining from I and P the index which would get translated to I.

Let's determine this by hand for for the interesting elements (we're only interested in cards that are rearranged into the middle third):

    7  8  9 10 11 12 13
 1  0  3  6  9 12 15 18
 2  1  4  7 10 13 16 19
 3  2  5  8 11 14 17 20

Obviously our formula looks like I' = P + 3I - 22. To generalize this we just note that 22 is the number of cards + 1 (which happens to be @F-2). Now, to solve the problem we just need to remember that the card that we're interested in ends up in the middlemost position (i.e. for 21 cards in index 10, @F/2-2), and we can just repeat the formula three times to find out the original position:

  I_0    = @F/2-2
  I_1    = P_3 + 3*I_0 - (@F-2)
         = P_3 + 3*(@F/2-2) - @F + 2
         = P_3 + @F/2 - 4
  I_2    = P_2 + 3*I_1 - (@F-2)
         = P_2 + 3*(P_3 + @F/2 - 4) - @F + 2
         = P_2 + 3*P_3 + @F/2 - 10
  I_3    = P_1 + 3*I_2 - (@F-2)
         = P_1 + 3*P_2 + 9*P_3 + @F/2 - 28

So that's the theory. In practice few Perl tricks are needed for this problem. The cards can be accessed from @F by turning on autosplitting with the -a switch. The picks could also be accessed from @F, but it turns out that it's easier to access them using regexps. By crafting a suitable regular expression, we can get the picks into the special regep variables ($&, $', $1, etc). And finally, by using a repeated substitution, we can use the return value of operation to count the amount of cards in the input (to save one character when compared to using @F/2. My final solution is 39 characters (and a shared second place):

#!perl -lpa
$_=$F[s/ ..(.)//g-27+9*$'+3*$1+$&]