Monday, February 11, 2013

Programming Puzzles

Prepping for programming interviews can be challenging, especially if you're overworked and don't have a lot of time to study because of family commitments.  The following is a list of problems I studied and solved recently to get ready for interviews.

Problem 1

1) Implement a breadth first search non-recursively
2) Implement a depth first search non-recursively
3) Implement a breadth first search recursively
4) Implement a depth first search recursively



Currently searching level = 1, index = 0
Currently searching level = 2, index = 1
Currently searching level = 2, index = 2
Currently searching level = 3, index = 3
Currently searching level = 3, index = 4
Currently searching level = 3, index = 5
Currently searching level = 3, index = 6
Found = 7, index = 6, level = 3
Currently searching level = 1, index = 0
Currently searching level = 2, index = 1
Currently searching level = 3, index = 3
Currently searching level = 3, index = 4
Currently searching level = 2, index = 2
Currently searching level = 3, index = 5
Currently searching level = 3, index = 6
Found = 7, index = 6, level = 3
Currently searching level = 1, index = 0
Currently searching level = 2, index = 1
Currently searching level = 3, index = 3
Currently searching level = 3, index = 4
Currently searching level = 2, index = 2
Currently searching level = 3, index = 5
Currently searching level = 3, index = 6
Found = 7, index = 6, level = 3
Currently searching level = 1, index = 0
Currently searching level = 2, index = 1
Currently searching level = 2, index = 2
Currently searching level = 3, index = 3
Currently searching level = 3, index = 4
Currently searching level = 3, index = 5
Currently searching level = 3, index = 6
Found = 7, index = 6, level = 3
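The first trace above can be reproduced with an iterative BFS; here's a sketch (not the original code) assuming a complete binary tree stored as an array with values 1..7, where the root is at level 1:

```python
from collections import deque

def bfs_iterative(values, target):
    """Breadth-first search over a complete binary tree stored as an
    array: the children of index i live at 2*i + 1 and 2*i + 2."""
    queue = deque([(0, 1)])  # (index, level), root at level 1
    while queue:
        index, level = queue.popleft()
        if index >= len(values):
            continue
        print("Currently searching level = %d, index = %d" % (level, index))
        if values[index] == target:
            print("Found = %d, index = %d, level = %d" % (target, index, level))
            return index
        queue.append((2 * index + 1, level + 1))
        queue.append((2 * index + 2, level + 1))
    return -1
```

Swapping the deque for a stack (and pushing the right child first) turns this into the non-recursive DFS variant, which produces the second trace.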

Problem 2

Given a binary tree, return the level with the maximum number of nodes.

        1
       / \
      2   3
     / \ / \
    4  5 6  7
   /         \
  8           9
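A BFS that counts the nodes on each level answers this; a sketch (the Node class and names here are illustrative, not from the original post):

```python
from collections import deque

class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def level_with_max_nodes(root):
    """Return the 1-based level containing the most nodes."""
    if root is None:
        return 0
    best_level = best_count = 0
    level, queue = 1, deque([root])
    while queue:
        count = len(queue)  # everything in the queue is on this level
        if count > best_count:
            best_level, best_count = level, count
        for _ in range(count):
            node = queue.popleft()
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        level += 1
    return best_level
```

For the tree above, level 3 (nodes 4, 5, 6, 7) wins.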


Question 1

What are the uses of a b-tree, avl-tree and rb-tree?
When would you use a b-tree over an avl-tree?
Which of the balanced BSTs is the most efficient?
Why don't we always use a b-tree?

A b-tree is a generalization of a binary search tree in which nodes can have more than two children.  B-trees are used for applications such as databases and file systems.  B-trees are ideal for systems that need to store large blocks of data.  An avl-tree and an rb-tree are two types of self-balancing tree structures that guarantee O(log n) for all operations (lookup, insertion, removal).

You would use a b-tree when the data is disk backed: because b-trees group data into large blocks, they reduce HD seek time and improve the performance of retrieving large data blocks from disk.

Efficiency depends on usage.  Avl-trees rebalance more strictly (more frequently), and have faster retrieval because of this.  Rb-trees rebalance less often, so they are faster for insertions and deletions, but slower for retrieval.

You could always use a b-tree or b+tree, but avl-trees and rb-trees provide other advantages, such as less tree maintenance and more memory flexibility.  In an avl-tree and an rb-tree, nodes in memory can be easily detached from the tree structure; not so in a b-tree.  Avl-trees and rb-trees can also be easily converted to linked lists without additional memory usage (which lets you perform other operations on the tree easily, such as merging).

Problem 3

Create a method hasUniqueChars(String a); do not use any type of built-in collection.
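One way to do this without any collection type is an integer used as a bit vector; a sketch in Python, assuming lowercase a-z input (an assumption the prompt doesn't state):

```python
def has_unique_chars(s):
    """Return True if no character repeats, using an integer as a bit
    vector instead of a set or map.  Assumes lowercase a-z input."""
    seen = 0
    for ch in s:
        bit = 1 << (ord(ch) - ord('a'))
        if seen & bit:   # this character's bit is already set
            return False
        seen |= bit
    return True
```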

Problem 4

Convert an infix expression to postfix.


1 2 3 + +
1 2 + 3 +
1 2 + 3 4 + + 3 +
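Outputs like those above are what the classic shunting-yard algorithm produces; here's a minimal sketch handling +, -, *, / and parentheses over whitespace-separated tokens:

```python
def infix_to_postfix(tokens):
    """Shunting-yard sketch: operands go straight to output, operators
    wait on a stack until a lower-precedence operator (or ')') arrives."""
    prec = {'+': 1, '-': 1, '*': 2, '/': 2}
    output, ops = [], []
    for tok in tokens:
        if tok.isdigit():
            output.append(tok)
        elif tok == '(':
            ops.append(tok)
        elif tok == ')':
            while ops and ops[-1] != '(':
                output.append(ops.pop())
            ops.pop()  # discard the '('
        else:
            # pop operators of greater or equal precedence first
            while ops and ops[-1] != '(' and prec[ops[-1]] >= prec[tok]:
                output.append(ops.pop())
            ops.append(tok)
    while ops:
        output.append(ops.pop())
    return ' '.join(output)
```

For example, the tokens of ( 1 + ( 2 + 3 ) ) come out as 1 2 3 + +.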

Problem 5

Given a tuple, e.g. (a, b, c).
Output : (*, *, *), (*, *, c), (*, b, *), (*, b, c), (a, *, *), (a, *, c), (a, b, *), (a, b, c) 

The core of the problem involves generating the ordered combinations of indexes to replace with '*'.
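Counting up through a bitmask gives exactly that ordering: bit i of the mask decides whether position i keeps its value or becomes '*'.  A sketch:

```python
def star_combinations(t):
    """Return all 2^n masked versions of tuple t, '*'-ing out the
    positions whose mask bit is clear; counting the mask upward
    reproduces the ordering shown above."""
    n = len(t)
    return [tuple(t[i] if mask & (1 << (n - 1 - i)) else '*'
                  for i in range(n))
            for mask in range(2 ** n)]
```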


Tuesday, August 24, 2010

Convert OSStatus to a human readable string

It's actually fairly easy to get something useful / readable out of an OSStatus code.  This is a simple no-compilation Python script to convert an error code to its description.  It uses two poorly documented but very useful functions: GetMacOSStatusErrorString and GetMacOSStatusCommentString.

Example output:

And the script:
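Since the script itself isn't reproduced here, this is a minimal sketch of the ctypes approach (macOS only; both calls are long deprecated, and the lazy framework load is my choice, not necessarily the original's):

```python
import ctypes

def describe_osstatus(code):
    """Return the (error string, comment string) pair for an OSStatus
    code via the deprecated CoreServices lookup functions.  macOS only;
    the framework is loaded lazily so this module imports anywhere."""
    cs = ctypes.CDLL(
        '/System/Library/Frameworks/CoreServices.framework/CoreServices')
    cs.GetMacOSStatusErrorString.restype = ctypes.c_char_p
    cs.GetMacOSStatusCommentString.restype = ctypes.c_char_p
    return (cs.GetMacOSStatusErrorString(code),
            cs.GetMacOSStatusCommentString(code))
```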

Monday, August 16, 2010

Naïve Swype Implementation (& How It Works)

How does Swype work? I don't know for sure, but I came up with a fairly simple approximation of Swype's behavior on Android phones.  With a little bit of tweaking (maybe some more math and a better set of dictionaries) it could probably get a lot closer to approximating Swype.

This is implemented using Proce55ing, which can export Proce55ing code to a Java applet (full source listing).

As for how the implementation works, I started with a word set which holds a collection of words and word prefixes.  The classify operation tells me whether the string I am holding is a complete word, an incomplete (partial) word, or both.

A returned classification also communicates its frequency (ranking by frequency of usage in common English text).  These frequencies were acquired from Project Gutenberg.  The frequency list is a frequency count from Usenet postings, so it's a bit odd and doesn't include some common words (most notably some connecting words).  The other dictionary is a list of the 3000 most common English words.

Keep this in mind when you play with the sample, the dictionaries aren't perfect.  The naïve implementation along with the poor dictionaries give the demonstration some peculiarities.

Regardless, the populate operation of the word set loads these words, their frequencies, and all prefixes leading up to each word into a dictionary.  For example, the word THEY is loaded in as T (stored as incomplete), TH (incomplete), THE (complete & incomplete) and THEY (complete) along with the corresponding word frequencies.
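A sketch of that populate step (the dict layout and flag names are illustrative, not from the original source):

```python
def populate(words_with_freq):
    """Load each word, its frequency, and all of its prefixes into a
    dict mapping string -> [flags, frequency]."""
    COMPLETE, INCOMPLETE = 1, 2
    table = {}
    for word, freq in words_with_freq:
        for i in range(1, len(word)):          # T, TH, THE for "THEY"
            entry = table.setdefault(word[:i], [0, 0.0])
            entry[0] |= INCOMPLETE
        entry = table.setdefault(word, [0, 0.0])
        entry[0] |= COMPLETE                   # a word can be both
        entry[1] = freq
    return table
```

After loading THEY and THE, the entry for THE carries both flags, just as described above.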

Next is all the connecting glue for Proce55ing.  The SwypeState class maintains the keyboard hit collector, the word set, and the bits that draw the keyboard and the pen.  The keyboard logic and the keyboard hit collector are fairly boring, but suffice to say, they draw the keyboard and accurately collect which keys were hit when drawing over the keyboard.

Probably the most interesting part of the implementation is figuring out which words were Swyped.  The system passes an event to the key hit collector when a word is finished; this initiates a search through the word set for potential words.  For a given word like "THEY" the actual keys swiped can vary widely.

Here are some examples (these all map to "THEY"):

  T, Y, H, G, F, E, R, T, Y

  T, G, H, N, C, D, S, E, R, F, C, X, S, W, T, G, V, C, S, E, Y

  T, Y, H, J, K, K, J, H, G, F, D, S, A, Q, W, E, R, T, Y

So, you have some things that don't really look like words at all.  However, the word set loaded with all the word prefixes sorts this out.  It does this using a simple search that slowly builds up a set of prefixes that map to partial words or complete words (which get recorded), while throwing away things that map to nothing.  As a new letter is examined from the list of hit keys, it is added to each of the previously collected prefixes.
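That search can be sketched like this, assuming a table mapping prefixes to [flags, frequency] entries where flag bit 1 marks a complete word (the exact details of the original differ):

```python
def find_words(hit_keys, table):
    """Extend every surviving prefix with each newly examined hit key,
    dropping extensions the table doesn't know about and recording any
    that form complete words.  `table` maps strings to [flags, freq]
    lists where flag bit 1 marks a complete word (illustrative layout)."""
    COMPLETE = 1
    prefixes, found = {""}, set()
    for key in hit_keys:
        extended = {p + key for p in prefixes}
        live = {p for p in extended if p in table}
        found |= {p for p in live if table[p][0] & COMPLETE}
        prefixes |= live   # keep old prefixes too: many hit keys are noise
    return found
```

Keeping the old prefixes alive is what lets the search skip over the noise keys swiped between the real letters.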
In this manner words like OR, OUR, and OUT, which require one vertical swipe, can be detected.  This word overlap presents another problem though: which word has priority?

The first strategy used to resolve overlapping words is a sort based on match length, with the idea that the longest match is probably the word the user was trying to spell.  For example, a word like THERE will come up with THE as a complete prefix when THERE was the intended swipe, because of the swipe length.

The second strategy is where the word frequencies come into play: they attempt to sort the list of hit words by their frequency.  Words with a higher frequency in the English language should bubble to the top.

Some possible improvements:

  • Better dictionaries: possibly one that lists all forms of a word with the same frequency (e.g. be, being, been).
  • Fat finger search: extend and trim each swipe path so that less accuracy is needed.  That is, a what's-around-you operation could be added to each key to get many more paths to search that could lead to the desired word.
  • Spell checking: for people that don't know how to spell, predictive text can be frustrating.  A misspelling dictionary could be really useful to help with this.
  • Markov chain: I don't know how effective this would be, but since Markov chains can be used to generate English text, it seems like they could be used to select a more likely next word.
The full source code is available at

Wednesday, August 11, 2010

What happens when your email "unsubscribe" button doesn't work...

What happens when your email "unsubscribe" button doesn't work? You get marked as spam. I'm looking at you HP.
"6 months and a million dollars"
...a common saying from a fellow co-worker and previous HP employee... well, 6 months and a million dollars later, HP, your unsubscribe button doesn't work and I'm about to start clicking the mark-as-spam button.
I have reservations about clicking this button since it's not technically spam, and I don't want to pollute Google's cache of data on what's considered spam. Really, I need an "I don't want these emails sent to me anymore, figure out how to get rid of them" button.
Alas, that doesn't exist. Instead I have to resort to clicking a damn "unsubscribe" button that's squirreled away at the bottom of the email. Then, two weeks later, I get to find out that HP really doesn't want to forget about me.
I'm like that one girlfriend you wish you'd never dumped, except, for HP, it's really best if we don't see each other again.
(Maybe I'll go into my filters and kill file them, but that'll result in even more annoyance.)

Thursday, July 1, 2010

Buzz Bookmarklet

I found it odd that I couldn't easily find a "bookmarklet" for Buzz that I could drag into my bookmarks bar.

Maybe I'm blind, or wasn't looking in the right places... so I wrote one. It's probably broken in some cases, and is fairly rudimentary, but here it is:

Post to Buzz (drag to bookmarks bar)

If interested, here's the code:

This is a fancier (shorter) version that uses querySelector:

Tuesday, February 9, 2010

Silly Space Optimization on Google's Home Page?

According to @chrisskelton and @wkvong, Google leaves off the ending </html> and </body> tags from their home page to optimize for space:

  1. chrisskelton Today I learned that Google excludes the </body> and </html> tags from their main page to save 18 bytes.
  2. wkvong You know Google is crazy because Google's home page doesn't close its <body> or <html> tags for performance

Indeed, if you check out the source code for the Google home page it's not there:

To put this in perspective: by one estimate, Google got about 91 million hits per day in 2006. Assuming all those searches start on the home page (and there's no caching involved), that's:

18 bytes * 91,000,000 hits = 1638000000 bytes
                             1599609.38 kilobytes
                             1562.12    megabytes
                             1.53       gigabytes

If we go by monthly hits:

18 bytes * 2,733,000,000 hits = 49194000000 bytes
                                48041015.63 kilobytes
                                46915.05    megabytes
                                45.82       gigabytes

That's 1.5 gigabytes per day (or 45.82 gigabytes per month) that Google doesn't have to send, doesn't pay for, and consumers don't pay for, all by leaving off a few useless tags. Not really that crazy.

Monday, January 25, 2010

Simple Errno Lookup with Python

Visual Studio has a tool [ERRLOOK] which looks up explanations of error codes. When you don't have the convenience of application code that automatically converts and reports errors back to you, it's nice to have.

I don't know of a similar utility on Linux (or other Unix variants), but I know the library calls exist to write one (at least for well-known system error codes).

Below is my first pass at writing this utility for Linux / Mac OS X using Python's CTypes.

The benefit of using Python is that it doesn't need to be compiled (though this is a small benefit); it's also an example of using CTypes to do FFI. Python also has the errno module, which provides a mapping between a numeric error code and a symbolic name.

The calls to load libc.dylib or might need to be tweaked depending on the system it's running on.
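Since the script itself isn't reproduced here, this is a minimal sketch of the approach (the function name is mine, and libc is located with ctypes.util rather than hard-coded):

```python
import ctypes
import ctypes.util
import errno

def describe_errno(code):
    """Return (symbolic name, message) for an errno value, pairing the
    errno module's name table with libc's strerror via ctypes."""
    libc = ctypes.CDLL(ctypes.util.find_library('c'))
    libc.strerror.restype = ctypes.c_char_p
    name = errno.errorcode.get(code, 'UNKNOWN')
    return name, libc.strerror(code).decode()
```

For example, describe_errno(errno.ENOENT) pairs ENOENT with the system's message for it.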

Example output:

[Source for this post in Markdown mark-up]

Friday, January 22, 2010

Using TwitterFeed to send certain bookmarks to Twitter

TwitterFeed is a really cool service. Using TwitterFeed you can push any RSS feed to Twitter, which enables a super simple way to push pretty much any kind of syndicated data to Twitter (and now Facebook, Laconica, and HelloTxt).

I also use to save interesting bookmarks in an easily accessible, persistent location.

The problem was severalfold:

  • Twitter is a great way to talk about things (including links)

  • Twitter isn't so great at categorizing links and making them easy to find later

  • I wanted to save links on but share them on Twitter

  • Not everything I shared on was something I wanted to spam my Twitter followers with.

Turns out the solution is pretty simple. On you can create tags. These tags help organize bookmarks.

For each tag you can get an RSS feed (see where this is going?). RSS feeds for tags look like this:<username>/<tag-name>

You can also visit the tag's webpage and select the RSS feed button next to the URL bar (then record the new link in the URL bar):

I selected the tag name tweet-this for this set-up.

The next step is to visit TwitterFeed and get set-up with an account. Once you've got your account you'll be at a Feed Dashboard, select Create a New Feed and enter the previously recorded URL:

Select test rss feed to make sure everything is working.

The rest is point and shoot with the TwitterFeed set-up process. Now, only bookmarks recorded on with the tweet-this tag will show up on Twitter.

[Source for this post in Markdown mark-up]

Thursday, January 21, 2010

Exponential Visitor Graph (Ebay)

I thought this was fairly entertaining...

I wrote a simple page hit counter for a couple of Ebay / Craigslist listings we were doing (using Google's AppEngine and Yahoo's YUI).

Recently I added a feature to allow it to graph visitor counts over time (basically a cheesy analytics engine).

I guess one of the items was particularly desirable (a very complete, good condition SNES system with a lot of games) because it had what I imagine is probably a pretty typical graph of visits to an Ebay auction:

It's pretty obvious that the spike (a huge spike) in visitors came when the auction was about to close and everyone was manically hitting reload. Just eyeballing the graph, it looks like an exponential increase.

[Source for this post in Markdown mark-up]

Friday, January 15, 2010

Updated: Token Bucket Downloader

I've updated the rate limiting downloader (which uses the token bucket algorithm) to be more user friendly out of the box. It attempts to use urllib2 by default so it can rate limit pretty much any URL.

Previously the script required an http proxy and the only means of adjusting operating parameters was global variables in the script. A proxy server is no longer required and all operating parameters can be adjusted via script options.

If an http proxy is selected, the Python http library (httplib) is used; this is a more rudimentary library, so not as many situations are handled. It's possible to install a proxy handler in urllib2, but I didn't do this.

This is more-or-less a "complete" rate limiting download manager. Option output:

   Usage: [options] url

     -h, --help            show this help message and exit
     -f FILE, --file=FILE  output filename
     -d DIR, --dest=DIR    destination directory
     -p SERVER:PORT, --proxy=SERVER:PORT
                           http proxy server
     -z BYTES, --buffer=BYTES
                           buffer size
     -l KBS, --limit=KBS   kbs limit
     -b KBS, --burst=KBS   burst limit
     -t SECONDS, --tick=SECONDS
                           tick interval
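The token bucket itself is simple enough to sketch (parameter names are illustrative; the real script's internals may differ):

```python
import time

class TokenBucket:
    """Token bucket rate limiter: `rate` tokens drip in per second, up
    to a ceiling of `burst`; consume() blocks until enough have pooled."""
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens = burst
        self.last = time.monotonic()

    def consume(self, amount):
        while True:
            now = time.monotonic()
            # refill according to elapsed time, capped at the burst size
            self.tokens = min(self.burst,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= amount:
                self.tokens -= amount
                return
            # sleep just long enough for the deficit to refill
            time.sleep((amount - self.tokens) / self.rate)
```

Each downloaded chunk would call consume(len(chunk)) before writing, which caps average throughput at `rate` bytes per second while letting short bursts up to `burst` through.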

Current state of the code.