Disconnected Mutterings of a Random Geek

Fri, 28 Dec 2007

How Random is xkcd?

I was reading xkcd by clicking on the random link when I noticed that the same cartoons were coming up again and again. I was wondering if this was confirmation bias on my part or a duff random number generator on the server's part. Randall Munroe is a science geek, and I figured what he would do is test the idea...

One python script and 12,000 (approx) requests later, I had a file full of numbers and started trying to remember some statistics. I threw together a quick bit of python to work out the mean and standard deviation (it's here). The mean value is 171.040683155 and the standard deviation is 97.4155602788. If they are uniformly distributed from 1 to 361, the expected mean would be 181.0 and the standard deviation would be 104.211323761. So I was lost in thought for a while.

However, the following quick check threw some light on the difference.

>>> data = [int(x) for x in open('numbers.data')]
>>> for x in xrange(0, 361):
...   if x not in data:
...     print x
... 
0
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360

So comics numbered above 337 don't appear. And recomputing the expected values for 1 to 337 gives a mean of 169.0 and a standard deviation of 97.2830920561, which is about the mean and standard deviation the data gives. I'm now waiting for someone who actually knows stats to correct me on where I went wrong.
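For the record, the expected values come straight from the standard formulae for a discrete uniform distribution on 1 to n: the mean is (n+1)/2 and the standard deviation is sqrt((n^2-1)/12). A quick python sketch:

```python
import math

def uniform_stats(n):
    """Mean and standard deviation of a discrete uniform distribution on 1..n."""
    mean = (n + 1) / 2.0
    std = math.sqrt((n ** 2 - 1) / 12.0)
    return mean, std

print(uniform_stats(361))  # roughly (181.0, 104.21)
print(uniform_stats(337))  # roughly (169.0, 97.28)
```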

The conclusion? I suffer from confirmation bias :-(.

For those who like pretty pictures, here is one, courtesy of the Google charts API and pygooglechart:

[chart of request frequency against comic number - code here]

posted at: 01:04 | path: /computing | permanent link to this entry

Sat, 18 Aug 2007

Firefox blocking

The great thing about the internet is that people can use it to make a fool of themselves in front of the whole world. Since this provides everyone with entertainment, I try to do something silly every once in a while. However, I'm easily topped by other people... Today's morons are these guys, who have decided that ad blocking is evil, bad and wrong. Their argument is simple: not looking at the ads would be stealing. They point out they can't block just the people using the ad blocker, so they have to block every firefox user. This is the MPAA's thinking - the user is the enemy. For me, all it means is that I can't visit these websites - I'll have to go spend money somewhere else...

posted at: 05:54 | path: /computing | permanent link to this entry

Mon, 13 Aug 2007

Sorry to be an arse

Steve Kemp recently blogged about wanting a blog compiler that would convert a dir full of text files into a collection of RSS feeds, HTML pages, and any other format under the sun. I use Pyblosxom, which does something similar. It's a clone of a perl-based system, blosxom. I don't know if either of these supports the full feature set that Steve is looking for, but I do know I haven't plumbed the full depths of what is possible. For example, for the debconf blog, Ganneff had a system where people committed posts as text to svn, which had a commit hook to generate the actual html, rss and atom. That had key-value based metadata, which may have included tagging support.

posted at: 02:43 | path: /computing/debian | permanent link to this entry

Tue, 07 Aug 2007

Playing the system

Neil McGovern recently blogged about the SPI election results - namely the outcomes under different election systems. While it is an interesting exercise, a point to note is that people know the rules for the election and vote tactically. So assuming that if SPI had used a given system, the outcome would have matched Neil's results is wrong. This point also applies to discussions of the use of secret ballots - changing this will change the election result.

posted at: 23:40 | path: /computing/debian | permanent link to this entry

Sat, 07 Jul 2007

Graduation

I've finally gotten around to leaving Durham Uni after four years, which isn't even close to the record for a long stay. I've gotten a 2:1 in Compsci - though the cert hasn't been printed yet. Graduation was last week, and unlike most of the UK, was actually dry. Dad's photos

posted at: 16:51 | path: /computing | permanent link to this entry

Fri, 29 Jun 2007

Debconf writeup

If you aren't bored of debconf blog entries yet, then Strib has pointed me at the google summer of code blog bit on dc7.

posted at: 00:54 | path: /computing/debian | permanent link to this entry

Tue, 19 Jun 2007

freduce

Since I'm not at Debconf, I've been filling in time in the early mornings by randomly browsing Facebook. Chris Lamb had a post on a function called fnreduce, which clearly wandered past on planet when I was away. Since I was bored, I wrote a simple haskell version:

fnreduce :: [(a->a)]->a->a
fnreduce [] value = value
fnreduce (a:as) value = fnreduce as $ a value

Line by line, it's clear how it works. The first line is the type sig, which says that the function fnreduce takes a list of functions and a value of type a, and gives back a value of type a. The functions all need to be of type a->a, which means they take an a and give back an a. a is a type variable, which means it can be replaced with any type you feel like, for example Int or String.

The second line merely defines a base case, which says that if the list of functions is empty, it just returns the value given. The last line does all the work, and says that fnreduce on a non-empty list is computed by taking the first element of the list, computing the new value using it, then calling fnreduce on the rest of the list and the new value. This is how most list processing works in haskell.

I'm left with two quick questions: the first is how to define it in terms of a fold, which it clearly is. The second is how to escape the a->a restriction, which lamby's version can do. Since I'm writing haskell, I'd like to do this in a type-safe way, to have my cake and eat it too. This can't be done with haskell's type system, but I may run it past Edwin for his two cents if I remember...
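On the fold question: since each step just applies the next function to the running value, fnreduce is a left fold with function application as the combining step (foldl (flip ($)) in haskell terms). The same shape in python, using functools.reduce - my translation, not lamby's code:

```python
from functools import reduce

def fnreduce(fns, value):
    # Left fold: apply each function in turn to the running result.
    return reduce(lambda acc, f: f(acc), fns, value)

print(fnreduce([lambda x: x + 1, lambda x: x * 2], 3))  # (3 + 1) * 2 = 8
```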

posted at: 09:15 | path: /computing/haskell | permanent link to this entry

Sun, 20 May 2007

Apology - Not being caught up on accommodation email

As you may have noticed if you have mailed accomodation@debconf recently, I haven't replied, and nor has anyone else. There are a couple of reasons for this, the main one being Jon Dowland is talking to the hostels right now about space, and I therefore can not give definitive answers. The other two reasons are I'm sitting my finals and do not have much free time, and that we sent out sponsorship mails to all saying what they had, and as such there is a pile of requests to deal with. If you have mail waiting in my inbox, then don't fret, I'll deal with it on Tuesday.

Quick point - if you want to change your dates and aren't dealing with your own food and accommodation (that's most people), that's fine, but please tell us. You should email accomodation _at_ debconf.org. If you are sponsored, you should mail herb _at_ debconf.org as well. Don't worry, we most probably aren't going to say no, but it's nice to be informed.

posted at: 20:45 | path: /computing/debian | permanent link to this entry

Fri, 11 May 2007

Looking for a gallery replacement

Dear Lazyweb, I'm currently looking after The Debconf Gallery and in the polite phrases of Ganneff


10:01 < Ganneff> gallery--
10:01 < Ganneff> i hate this crap php fuck

So I'm looking for suggestions for a Gallery replacement that isn't PHP based. I'm also looking for the moon on a stick. If you have a suggestion for either of these, please direct them to psn at debconf.org. Ta Muchly, Pete

posted at: 12:40 | path: /computing/debian | permanent link to this entry

Tue, 08 May 2007

Weird stuff

The great thing about the internet, is that no matter how bonkers it gets, there is always someone to take it to the next level. Read all the way down to the bottom...

Thanks to Edwin for showing me that.

posted at: 19:52 | path: /computing | permanent link to this entry

Mon, 07 May 2007

Online Shopping

Dan has been complaining I haven't been ranting enough lately. I have been waiting for the Scottish elections to sort themselves out, because that's a nice mess and easily due some ranting. However, given the need for a placeholder, I wrote this little moan on the subject of internet shopping.

Everyone knows that the future of shopping is online. Every year, more money is spent online. Christmas last year was marked by getting a few parcels a day from amazon. However, I find it fraught with all sorts of annoying problems. Over the last few days, I have attempted to buy a t-shirt and a bottle of whisky online, while buying a pair of sandals and a book offline. While the two aren't the same, it was still instructive as to the problems of internet shopping.

Taking money from shoppers. Actually taking money from people has to be the most important part of running any sort of shop. However, it is somewhat hard for me, the shopper, to give websites my money. Consider the example of the Geekz Shop. To pay them, it appears I need to have a paypal account[0]. On examining the paypal signup page, I see references to the FSA. While it's good that Paypal are actually registered and regulated these days, it does strongly suggest that they are a bank, and as such opening an account with them is the same as opening a bank account with HSBC or someone. In other words, I need to open a new bank account just to buy a t-shirt. That's a lot of hoops to jump through.

In their defense, they did offer a collection of other ways to pay when emailed about it, but it's still a lot more work than walking into a shop and giving them cash. In more general terms, as a shop, it's important to make it as easy as possible for your customers to give you money. It's common to outsource this to someone else, but any general solution serving lots of sites will suck[2]. Of course, doing it yourself means you are stuck dealing with credit cards - which is a pain - or BACS, which is a joke. Oh well.

The next issue you have is giving people their goods. In a real shop, you hand them their goods and they carry them away (massive generalization). Unless you are selling something completely electronic, for example domains, you have to get the goods to the person. Once again, running your own delivery service is hard - far harder than running your own online payment system. However, it and your website are what people see of your shop. The classic problem is late delivery. You have to make some sort of claim as to when goods will arrive, but getting it wrong will result in arguments. My best example of this is a christmas present bought off amazon last year - it arrived in January, despite amazon's claim of having it to me before christmas day. The other classic example of getting it wrong is the apple store. It doesn't quote a delivery time - instead it quotes a "ready to ship" time. This is the time needed for apple to put your purchase in a box and stick a label on it. The problem with this is that your poor customer thinks his new mac is going to show up in a couple of days. It doesn't, so he is massively disappointed. You've disappointed the guy before his product has left your warehouse. That's one step away from employing a bouncer to beat your customers as they leave your store.

Having thought about delivery problems, consider people returning stuff. If it's hard, they just moan at their credit card company. If your support and sales folk are in a call centre and can't be reached, then they spend the next week explaining to their mates how dire you are and why they would sooner deal with the devil than you. Oh, and neither email nor rt is a way to deal with this.

The final problem with most online shops is usability. Usability is hard - full stop. It's harder to sell something when the customer can't pick it up and feel it. While some shops are quite good, the number that force me to think when filling out the checkout forms or looking for products is amazing. It's like they don't want my custom.

From this, I have claimed that a good online store needs a good way to take money off customers, a good way to get goods to customers, a sane way to deal with problems, and a usable website (next week, I'm going to blog about how water is wet). However, lots of sites, including big ones, fail at this. I don't think internet shopping is going away, but I do think that the shops that fail to take money off folk, that have sites that can't be navigated, that give goods to couriers who "lose" them, will go out of business. I hope that most shops steadily improve in these areas, just so I spend less time being pissed off at them.

BTW: this post was inspired by my mother's attempts to buy photos of my graduation online (no - it hasn't happened yet). The site in question was apparently so confusing that she has given up and has asked me to buy them. Sorry for ranting.

posted at: 09:05 | path: /computing | permanent link to this entry

Thu, 03 May 2007

Bad Wolf

Since I feel like a meme lemming today: 09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0.

posted at: 08:30 | path: /computing | permanent link to this entry

Tue, 24 Apr 2007

Sites and Blogs going downhill

I have been living at home for the past few weeks, and as such have been reading the paper after breakfast. I can do this in Durham too, it's just a bit more of a fight to get the paper. After a couple of weeks, I commented to my mother that the columnists seemed to be a bit dire. She pointed out that they all started well and steadily went downhill. One would assume that they just ran out of interesting things to say.

I was looking at the Daily WTF today, now renamed Worse than Failure, and I was reminded of this observation. When I found the site, I would look at the entries and go "WTF?"; now I just look and go "meh". I'm guessing that it's running out of spectacular material, and that each new posting pushes the boundary of what it takes to be a good posting - or, in the case of the Daily WTF, a bad one. At the same time, the number of readers is getting bigger, so the number of comments is getting bigger, so there are more silly comments. Add in that a bigger readership means more demand for material, so the site's operator starts accepting lower-quality stories, and the comments go downhill. Slashdot is the worst example of this, with both dire stories and comments. OSNews has also fallen over this problem.

All in all, this means that as a good blogger / news site, you have to accept that you aren't going to post that often, and accept that you aren't going to get lots of hits and a steady readership. A few folk, such as Steve Yegge and Joel Spolsky, seem to manage this. Of course, writing without readers means writing for yourself, which is the normal advice writers offer on how to get started. So the best way to start is the best way to continue: writing about stuff that interests you and not really caring about having readers. Oh, and in case you are wondering, I don't care about folks reading my blog, and sometimes wish they didn't. They tell me I'm wrong. They get me in the coffee shop and start arguing about stuff I wrote while drunk two weeks ago. Anyway...

Getting back to where I started, neither newspaper columnists nor sites going downhill really bothers me. After a while, they slip below the crap threshold and I stop reading them. It's just sad that they slip away.

PS: Edwin Brady tells me the Daily WTF was always quite bad. Maybe it's just me who takes ages to pick up on bad sites.

posted at: 15:19 | path: /computing | permanent link to this entry

Mon, 16 Apr 2007

flooding planet

ok, I might have just flooded planet, sorry.

posted at: 22:41 | path: /computing | permanent link to this entry

Debconf Reconfirm

If you are coming to debconf - and if not, why not[0] - the debconf team would love for you to reconfirm. This both makes our lives easier, and you need to do it to get sponsorship if you have asked for it. To reconfirm, please prod pentabarf. The proper reminder is on debconf-announce. Looking forward to seeing you in Edinburgh!

[0] I'm not going to be there for most of it, actually.

posted at: 22:36 | path: /computing/debian | permanent link to this entry

Sat, 14 Apr 2007

Cheap laughs from ars

I'm busy reading this, which outlines a concern that lots of Europeans have: how to comply with EU data protection laws and the US court system at the same time. SWIFT appear to be dealing with the problem, which is good. Ars then gives an example of a similar American mass surveillance system, Echelon, and how it feels the US has abused that system to catch Airbus bribing people. Now, I'm naive, but I feel that bribing people is wrong. And while two wrongs don't make a right, I'm not happy with the idea that Europe's main complaint about Echelon is that it was used to catch Europeans breaking their own laws. This would be like UK drivers complaining about speed cameras because they catch speeders, rather than because they cause accidents or are used as a way to make money. So could ars use a better example of abuse in the future, as I'm sure there must be one?

Sorry for moaning; I'll return you to your normal diet of interesting stuff from DDs.

posted at: 00:49 | path: /computing | permanent link to this entry

Tue, 06 Mar 2007

RE: Dublin bound

This post is here because Matt Brown got a job with google, so I'll probably end up sitting two desks away from him, or something. His Wordpress instance and the Uni's proxy don't get along...

Snap! I'm looking forward to meeting you; I start in July. Seems to be the time of year for this sort of thing. The word I have is that I get relocation paid for, but I'm moving me and a laptop over the Irish Sea, rather than a house from New Zealand... Well done!

posted at: 13:58 | path: /computing | permanent link to this entry

Tue, 27 Feb 2007

Off to Ireland

So sometime last year Andrew Stribblehill landed a job with Google. This did quite a number on google's mystique among the Durham spods, but soon he started telling us stories about how nice a place it was to work and how they were looking for more people.

So when I came back for my final year at Uni, I started looking for a job. No idea what it's like at any other uni, but around here people do the following when the real world approaches:

I was in the second and fourth groups, and when Strib suggested I apply to Google, I went for it. What followed was four months of phone interviews, emails, some frantic revision and a free trip to Dublin. I can't really say much about the process, but it is everything it's made out to be... The long and short of it is that they have offered me a job.

Having accepted, I now have had a major worry lifted from my mind and replaced by smaller worries, like moving to Ireland.

A major thank you is needed, to all the following people:

I'll buy you a beer next time I see you.

Note: I'm insufferable when something like this happens, so feel free to thump me when I get annoying... So far the only downside I can think of is that I can't tell people to Just fecking google it anymore.

posted at: 18:27 | path: /computing | permanent link to this entry

Thu, 15 Feb 2007

Free Tshirt

Ages ago, I entered the IBM Mainframe contest. As with many things, I never got as far as I would have liked due to work pressure, but today I got a free t-shirt and certificate from them. Thanks!

As a random aside, why have tech companies started giving out white t-shirts? I can't really wash them, since I don't have that much white clothing. There is also the problem that the other one I have is from google and has "I'm feeling lucky" on the back, which I regard as the same as giving God the finger...

posted at: 19:01 | path: /computing | permanent link to this entry

Mainframe goodness

Dave and I have entered the IBM mainframe challenge, and I'm pleased to say I might have gotten through stage 1. I might get a t-shirt out of it. If I get free time, I'll do stage 2, which Dave thinks isn't hard. I quite like the system, and I wonder how many jobs there are in that area.

posted at: 18:55 | path: /computing | permanent link to this entry

Fri, 09 Feb 2007

Hello Planet Debian

Since I'm now on planet debian, I ought to introduce myself. I'm Pete; I go by psn on oftc. I'm a third year Computer Scientist at Durham Uni, and I come from Edinburgh, which explains the one debian thing I do right now: Debconf. I'm mainly doing accommodation stuff with Jon and writing minutes. For those of you at Durham, I'm the guy in a debian t-shirt sitting at the back of lectures typing on a laptop. No, not Nathan, the other one.

two quick things:

Thanks, Pete

posted at: 08:52 | path: /computing/debian | permanent link to this entry

Thu, 08 Feb 2007

It's snowing

Normally, as a debian person, I watch the bug count and wish it was lower. Everyone does it. However, I'm in a somewhat odd position brought on by the snow outside. My um-friend[0] is doing her dissertation on creepy-crawlies living in a field next to the botanic gardens here. The snow and cold kill them off. So, I'm a debian guy wishing for a high bug count...

[0]um-friend. If you have to ask, you don't want to know.

posted at: 15:24 | path: /computing/debian | permanent link to this entry

It's that time of the year again...

As is the norm, around this time debian goes crazy and tries to elect a new leader. I was talking to blank about his ill-formed plan to stand, and put to him the three questions that should be put to every candidate:

  1. Are you going to go MIA?
  2. Are you going to start flamewars?
  3. Are you going to be evil?

Correct answers to these are worth discussing...

posted at: 13:32 | path: /computing/debian | permanent link to this entry

Wed, 07 Feb 2007

qmail users.

I get lots of spam, and the normal techniques get rid of it. I also get lots of DSNs, which are a mail server's way of telling me it accepted my mail but can't deliver it for some reason. DSNs are ok, but suffer from the problem that someone can fake addresses and get the DSNs sent to another address. Spammers, for example.

The procmail people like to claim that if a person is filtering spam with procmail, then he/she has already lost. By this they mean that accepting spam and filtering it afterwards is a bad idea. Well-run systems check mail at smtp-time and reject the spam right then. This way, they never take the mail and never have to send a DSN. A simple bit of reasoning might be: never accept mail unless:

  1. You actually want it
  2. You have somewhere to put it

Qmail users, on the other hand, appear to have a nasty habit of accepting anything and then bouncing it back. That's fine for the warm, friendly internet that qmail was written for (joke), but it's less good in these days of spammers. So, if you run qmail, or any mail server, reject at smtp time; don't bounce.
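To make the two rules above concrete, here is a toy sketch of the decision a server should make at RCPT time, before the message body is ever accepted. The function and its arguments are invented for illustration - a real server makes this check inside its smtp dialogue:

```python
def rcpt_decision(recipient, known_users, mailbox_has_space):
    """Decide at smtp-time whether to accept mail for a recipient.

    Rejecting here means the message is never queued, so no DSN can
    later be bounced to a (possibly forged) sender address.
    """
    if recipient not in known_users:
        return "550 no such user"   # you don't actually want it
    if not mailbox_has_space:
        return "452 mailbox full"   # you have nowhere to put it
    return "250 ok"

print(rcpt_decision("pete", {"pete", "dan"}, True))    # 250 ok
print(rcpt_decision("victim", {"pete", "dan"}, True))  # 550 no such user
```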

The temptation to compose a standard mail saying "I didn't send this mail, stop bouncing like a moron" and binding a key in mutt to send it is great.

posted at: 16:01 | path: /computing | permanent link to this entry

Agent assignment, or a fun way to do load balancing

We have to do an assignment for Multi-agent Systems based on finding a problem which can be solved with agents, and solving it. I'm thinking about doing one on load balancing. The basic idea is that you have a cluster of servers providing space to do computations, and a bunch of services. Each service and server has an agent watching it. When a service needs more power, its agent asks for it. Hopefully the whole system will regulate itself somehow. It needs more work, but it ought to be fun.
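As a first stab at what "regulate somehow" might mean, here is a toy sketch where each service agent asks for some amount of capacity and the server side grants what it has spare. All the names and numbers are made up, and the real assignment will need something smarter than first-come-first-served:

```python
def allocate(server_capacity, demands):
    # Greedy toy allocator: walk the services in order, letting each
    # take capacity from the servers until its demand is met or the
    # servers run dry.
    grants = {}
    remaining = dict(server_capacity)
    for service, wanted in demands.items():
        got = 0
        for server in remaining:
            take = min(wanted - got, remaining[server])
            remaining[server] -= take
            got += take
            if got == wanted:
                break
        grants[service] = got
    return grants

# "web" gets all 3 units it asks for; "batch" only gets the 5 left over.
print(allocate({"s1": 4, "s2": 4}, {"web": 3, "batch": 6}))
```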

posted at: 15:36 | path: /computing | permanent link to this entry

Debian vs Mac OSX

I own a mac, and it's replaced lupin, my debian box, as my everyday work machine, mainly because I can carry it around with me. I generally wonder which I prefer:

Mac Pros and Cons

Debian pros and cons

It seems a waste, but I'll list more when they come to mind.

posted at: 11:52 | path: /computing/debian | permanent link to this entry

No Debian Boxes

I recently put lupin in the bedbox, mainly because I wasn't using it and couldn't face the effort of sorting out how to do NAT in iptables. Yes, I know, that is lazy. I was overworked. My alpha is also off the road, due to a problem with the switch in Dan's rack. This leaves me working on my mac and on hades, one of which runs OpenBSD and the other of which runs OSX. I'm a debian person without a debian box...

On the debian front, I have been mainly spending time doing debconf stuff, sorting out hostels and the like. It's currently slow, boring work, but Jon is doing most of it.

posted at: 10:58 | path: /computing/debian | permanent link to this entry

Python Config system

I haven't blogged in a while, due to being buried in work, not really having done anything I think of as cool, and being engaged in some evil plotting. But now I have a problem. I want a really easy way of using config files in python. Quick example:

name = "john doe"
age = 12
sex = "male"
should give a dict like: config = {"name": "john doe", "age": 12, "sex": "male"}

I haven't found such a bit of code myself, and I think it might be a pain in the ass to write, so if you know of such a thing....
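To show the sort of thing I mean, here is a rough sketch that parses such a file with ast.literal_eval, so values stay plain data rather than arbitrary code. The function name is made up, and it only handles simple key = value lines:

```python
import ast

def load_config(lines):
    """Parse 'key = value' lines into a dict, reading each value
    as a python literal (string, number, list, ...)."""
    config = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        config[key.strip()] = ast.literal_eval(value.strip())
    return config

text = ['name = "john doe"', 'age = 12', 'sex = "male"']
print(load_config(text))  # {'name': 'john doe', 'age': 12, 'sex': 'male'}
```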

posted at: 10:38 | path: /computing | permanent link to this entry

Mon, 15 Jan 2007

Back in Durham.. to a broken interweb

So I'm safely back south of the border, with a ton of work to do. I'm also fighting esol, the uni's way of getting the blog-tubes to our rooms. They want you to run a little program, called CSA, to check that your system is secure and to phone home with that information. I run it... and it does nothing. Try again, and it does nothing. Pulling apart the .app, finding the actual program, and running it in the terminal made it spew some interesting output, including that it couldn't write to "~/tmp". Which is interesting, since I have a folder full of temporary files called... "~/tmp/". Renaming tmp lets it work. Why it wants to use my home dir rather than /tmp, and why it doesn't use tmpfile(), is an interesting point. Oh well, in these hands dur.ac.uk rests.

On a similar note, I see that dur.ac.uk is giving me a 6to4 ipv6 address, but that address isn't routed anywhere. I like this thinking.

posted at: 14:05 | path: /computing | permanent link to this entry

Mon, 18 Dec 2006

SAT-solving, Codedup

I got around to implementing my code dup finder, and it's on the web at this address. If you are young and trendy, you can grab it using darcs with a darcs get. It only works for finding repeated lines of code, and skips empty lines. Some of the more advanced ideas haven't been implemented, but it does work.

I also wrote a little SAT solver. It's very much an exponential thing, but it does work for small instances. It employs a really simple heuristic of spotting which variable appears the most, and assuming it is true or false. I'm planning on improving the heuristic by making it ignore conjuncts that have already been satisfied, and by making it guess whether a variable should be true or false.

The other thing I'm thinking about is how to generate instances of SAT automatically. I have an instance generator already, but it needs some work to make harder instances.

posted at: 18:28 | path: /computing | permanent link to this entry

Tue, 28 Nov 2006

Ranting about Types

I'm sitting here reading Joel on Software, and he is talking about using Hungarian notation, where you encode information about a variable in the variable name. For example, an index is coded as starting with id. This lets a developer check for mistakes visually, which he discusses here. His ASP compiler also uses this information. This is all well and good.

There are two things about this that bug me. The first is that having people check your code for errors of that sort seems daft. Why not write a program to do it for you? The second is that the variable names have to carry this encoding, so you have to write id before every index or rgwch before all your arrays of Unicode.

Let's imagine a world where you ran a program to check that the semantics of your variables matched. Let's furthermore imagine a world where you didn't need to state in the variable name that the variable held a recordset, and instead the system could work it out for you. That would easily fix the two problems people have with Hungarian notation: getting people to be good at it, and the weird conventions.

Such a world exists. It's called haskell, and it's been kicking around for ages. A haskell programmer would never have to read through his code looking for wrong bits; ghc would just poke him in the eye when it didn't type check. Haskell coders have a habit of saying that if the code type checks, then it works. Guys like Conor have taken this to its extremes, letting you specify all your program's behaviour in its types.

The thing that gets me about this is that Joel, a very smart person, has reinvented the wheel and got an octagon. He hasn't looked up any of the real CompSci behind what he is doing. And to paraphrase the well-known saying: those who do not understand CompSci are forced to rediscover it, poorly.

posted at: 15:21 | path: /computing | permanent link to this entry

Turning off planet

For a while now, the last four or five months at least, I have been using planetplanet for RSS and Atom aggregation, or in English, reading a bunch of websites on one page. Planet is a bit of a hog, mainly in disk accesses. This is a bit of a problem on hades, whose disks aren't wonderful.

Dan has been nagging me to use rawdog for a while, so this morning it floated to the top of the todo list, and stealing his config and one line of shell gave me a sorta working instance with the same feeds:

grep "^\[h" ~/usr/planet/lupin/config.ini | sed "s/\[//" | sed "s/\]//" | awk '{printf("feed 1h %s\n", $1)}' >> .rawdog/config

I'm going to leave it for a bit to see if I like it, and to sort out how I want it laid out, but it's now in cron and planet isn't, so the worst abuser of hades should be gone.

posted at: 12:50 | path: /computing | permanent link to this entry

Mon, 27 Nov 2006

SubString matching

As part of my work on Hat, I'm wondering how many lines are duplicated in the source code. So I started thinking about the problem, in a generalized way, of course. My first idea was to consider one file as a possible substring source, meaning that its contents may contain substrings of the other file. Since part of the idea is that you want the largest substrings possible, it makes sense to generate the large substrings, then move on to the size below it, and so on.

Quick aside here: when I was down at Fun in the Afternoon, Wouter Swierstra gave an interesting talk on version control, in which he abstracted away from thinking in characters, lines, or some higher-level construct in the file. I'm borrowing this idea, and am going to talk about "blocks": files made up of blocks, and looking for sub-blocks of files. This gives three advantages:

Quick note about block composition: a block is either a block in its own right, or a list of blocks. Clearly, order matters.

Having started thinking about generating blocks, it is clear the largest block is the whole file, and there is one of them. We can then generate the blocks that are one block smaller by removing the first and then the last block. So we have 1+2+3+...+n-1 sub-blocks of a file, which is about n-squared blocks; assuming that testing a block against a file takes n time, this gives us n-cubed time for testing two files. This sucks, so a better solution is needed.

Take the smallest blocks from a file, map each of them to where it came from, and feed the result into a table keyed on the block's contents. Repeat for all the files. When finished, remove all the entries in the table with fewer than two file addresses. These are the matches, so then all you need to do is assemble the largest block sizes needed.

Construct a new table keyed on the addresses of bits of the file. For each pair of matches, look up the next address (if working by line, the next line in the file); if that is also a match, merge the blocks. Output the result.

Running time depends on how good your table is. With a table giving O(log n) insertion and lookup, the running time is O(n log n); with O(1) insertion and lookup, it is O(n), which is about as good as it gets for this sort of problem. However, it's easy to construct an O(n^2) case by making all the inputs identical. There are a few ways of dealing with this, mainly by making insertion into the hash bucket O(1).
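A simplified sketch of the two passes, assuming lines are the smallest blocks and using a plain dict as the table (and merging any adjacent matched lines, rather than tracking pairs of addresses as described above; the name duplicate_runs is made up for illustration):

```python
from collections import defaultdict

def duplicate_runs(files):
    """files: dict of filename -> list of lines.
    Returns maximal runs of consecutive lines that occur at more
    than one address, as (filename, first_line, last_line) tuples."""
    # Pass 1: table from line contents to every (file, line number) address.
    table = defaultdict(list)
    for name, lines in files.items():
        for i, line in enumerate(lines):
            table[line].append((name, i))
    # Drop entries with fewer than two addresses; the rest are matches.
    matched = set()
    for addrs in table.values():
        if len(addrs) > 1:
            matched.update(addrs)
    # Pass 2: assemble each run of consecutive matched lines into one block.
    runs = []
    for name, i in sorted(matched):
        if (name, i - 1) in matched:
            continue                  # interior of a run, not a start
        j = i
        while (name, j + 1) in matched:
            j += 1                    # extend the run downwards
        runs.append((name, i, j))
    return runs

print(duplicate_runs({"a": ["x", "y", "z"], "b": ["q", "x", "y"]}))
```

Both passes do a constant amount of dict work per line, which is where the O(n) expected running time comes from.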

All in all, I'm happy with this, and I'm going to implement it for lines tomorrow. A full block implementation is going to wait until I have worked out how to define what a block is at runtime, which might need a parser with the grammar supplied at runtime.

posted at: 10:46 | path: /computing | permanent link to this entry

Fri, 24 Nov 2006

Fan failure

I woke up yesterday to find my internet connection had gone away. On checking, I found that the crappy PC running OpenBSD that I use for a router was dead. I assumed it was the power supply, and went and bought a spare between lectures. On fitting it, the machine still didn't work, meaning I now have one more power supply than I need. Watching closely, I saw the processor fan speeding up and slowing down, so I took it out, wandered down to the computer shop and asked for a replacement. They laughed, so I wandered out and asked Dan for his advice. He dunked it in the engineering magic fluid bath, and it's alarmingly clean. However, the box still falls over every now and again, so I'm thinking about a new one.

posted at: 17:42 | path: /computing | permanent link to this entry

Mon, 20 Nov 2006

Phone interview

I'm sitting here waiting for the phone to ring, and this model came to mind.

from time import sleep

# Tension goes up a notch every time the wait halves: 64s, 32s, ... 1s.
i = 64
tense = 0
while i > 0:
    sleep(i)
    i = i / 2
    tense += 1
    print "up a notch %s" % tense

posted at: 17:53 | path: /computing | permanent link to this entry

Fun in the Afternoon

Fun in the Afternoon is designed to be a relaxed programming seminar, and being bored I went along. It was pretty good, with all the talks being quite relaxed and interesting. Phil Wadler's talk was of some interest, since I hate PHP, and the other web programming things I have tried, including haskell-html and Kaya, aren't quite ready yet.

The other three talks were quite good. All discussed new things, and there was a lot of "I don't quite know" in reply to the tough questions. The metatheory stuff was fun, though I haven't done anywhere near enough type theory to need it. The version control stuff was quite cool, and it would be nice to see that become a real tool, similar to darcs. Andrew Kennedy's take on C# was interesting, though he admitted beforehand that he didn't quite believe it himself. It led to a lot of muttering between Edwin and myself as to what a functional language really is. It's interesting how all the academics revert back to their undergrad days: reading the paper, talking, playing on laptops, being late, sitting at the back and eating.

It was also nice to put some faces to names, including Neil Mitchell and Conor McBride. It's also nice to see how many people were there, and that people wanted to go have a drink afterwards. That evening I caught up with Dan, and it's nice to see he is settling into his new job well.

I caught the train home in the morning, getting in at three, and managed to watch the rugby before sleeping for 15 hours, a result of not sleeping well for the 8 days before that.

posted at: 04:16 | path: /computing | permanent link to this entry

Fri, 10 Nov 2006

Palindromes, revisited

Strib pointed out that my palindrome checking could be better. First off, using all the usernames, and not just the ones with email addresses attached, found two more palindromic usernames: bb, which is for big brother, and pgp, the user whose $HOME holds the pgp binaries. Doesn't everyone use gpg yet?

He also thought that my code could be faster. His first effort was 3 times slower, and my attempt to make it non-recursive only gained 5 ms. Then he unleashed perl, and wrote:

time perl -ne 'chomp;print "$_\n" if $_ eq reverse $_' < /tmp/users

which is an order of magnitude faster, the difference being mainly startup time. I then wrote one in C, which used all sorts of tricks to be a bit faster, and takes about 6 ms, an order of magnitude faster again than the perl. Considering that vega takes 4 ms to tell you that a command doesn't exist, it isn't getting much faster than that.
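The original script isn't shown here, but a minimal Python version of both approaches might look like this (the recursive one being roughly what I started with, and the reverse-and-compare one being the same trick the perl one-liner uses):

```python
def is_palindrome_recursive(s):
    """Peel matching characters off both ends until nothing is left."""
    if len(s) < 2:
        return True
    return s[0] == s[-1] and is_palindrome_recursive(s[1:-1])

def is_palindrome(s):
    """Reverse-and-compare, as in the perl one-liner."""
    return s == s[::-1]

for name in ["bb", "pgp", "strib"]:
    print("%s %s" % (name, is_palindrome(name)))
```

The recursive version copies the string at every step, which is part of why it loses to the single slice-and-compare.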

posted at: 10:54 | path: /computing | permanent link to this entry

Observe

Wrote, and type-checked, the back end to Observe. Now I need to tie it to the front end, which might be harder than it looks. I didn't do this last night since it was half two in the morning.

posted at: 10:34 | path: /computing/hat | permanent link to this entry

Stack and Cover

Have shuffled things around so that Stack and Cover use the new design. Things seem to work, though I can see at least one bug in Stack. It's worth noting that they run in the control thread, rather than getting a thread of their own.

posted at: 10:22 | path: /computing/hat | permanent link to this entry

Tue, 07 Nov 2006

Detect Bugs

I have finally fixed all the bugs in Detect bar one, which means I have finally got a bit of the project working. Yay. I also have a better test case. I'm going to move Hat-Stack and Hat-Cover to the new design, then fix Hat-Observe. I'll post a screenie at some point.

posted at: 16:32 | path: /computing/hat | permanent link to this entry

Made with PyBlosxom