Tuesday, August 24, 2010
Convert OSStatus to a human readable string
Example output:
And the script:
Monday, August 16, 2010
Naïve Swype Implementation (& How It Works)
This is implemented using Proce55ing which will export Proce55ing code to a Java applet (full source listing).
As for how the implementation works, I started with a word set which holds a collection of words and word prefixes. The classify operation tells me whether the string I am holding is a complete word, an incomplete word (a prefix of a longer word), or both.
A returned classification can also communicate its frequency (its ranking by frequency of use in common English text). These frequencies were acquired from Project Gutenberg. The frequency list is a frequency count from Usenet postings, so it's a bit odd and doesn't include some common words (most notably some connecting words). The other dictionary is a list of the 3000 most common English words.
Keep in mind when you play with the sample that the dictionaries aren't perfect. The naïve implementation, along with the poor dictionaries, gives the demonstration some peculiarities.
Regardless, the populate operation of the word set loads these words, their frequencies, and all prefixes leading up to the word in a dictionary. For example, the word THEY is loaded in as T (stored as incomplete), TH (incomplete), THE (complete & incomplete) and THEY (complete) along with the corresponding word frequencies.
Next is all the connecting glue for Proce55ing. The SwypeState class maintains the keyboard hit collector, the word set, and the bits that draw the keyboard and the pen. The keyboard logic and the keyboard hit collector are fairly boring, but suffice to say, they draw the keyboard and accurately collect which keys were hit when drawing over the keyboard.
Probably the most interesting part of the implementation is figuring out which words were Swyped. The system passes an event to the key hit collector when a word is finished, and this initiates a search through the word set for potential words. For a given word like "THEY" the actual keys swiped can vary widely.
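To make the populate and classify operations concrete, here is a minimal Python sketch of such a word set. It is not the Proce55ing source; the class name, the `Classification` tuple, and the frequency numbers are all made up for illustration.

```python
from collections import namedtuple

# Hypothetical result type: is the string a full word, a prefix of a
# longer word, or both, and how frequent is it (0 if it is only a prefix)?
Classification = namedtuple("Classification", "complete incomplete frequency")


class WordSet:
    def __init__(self):
        # string -> [complete, incomplete, frequency]
        self._entries = {}

    def populate(self, word_frequencies):
        """Load each word plus every prefix leading up to it."""
        for word, freq in word_frequencies.items():
            word = word.upper()
            for i in range(1, len(word)):
                entry = self._entries.setdefault(word[:i], [False, False, 0])
                entry[1] = True  # a proper prefix is "incomplete"
            entry = self._entries.setdefault(word, [False, False, 0])
            entry[0] = True  # the full word is "complete"
            entry[2] = freq

    def classify(self, text):
        """Return a Classification, or None if the string leads nowhere."""
        entry = self._entries.get(text.upper())
        if entry is None:
            return None
        return Classification(*entry)


# Illustrative frequencies only, not taken from the real dictionaries.
words = WordSet()
words.populate({"the": 500, "they": 120})
```

With this data, `words.classify("THE")` is both complete and incomplete, while `words.classify("THEY")` is complete only.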
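One way such a search could work is sketched below in Python. This is an assumption about the approach, not the actual implementation: it walks the swiped keys in order, at each key either skips it or appends it (once, or twice for double letters), and abandons any branch that is no longer a known prefix. The function names, the sample dictionary, and the sample path are hypothetical.

```python
def build_prefixes(word_frequencies):
    """Collect every prefix (including the full word) of every word."""
    prefixes = set()
    for word in word_frequencies:
        for i in range(1, len(word) + 1):
            prefixes.add(word[:i])
    return prefixes


def candidates(path, word_frequencies):
    """Return words the key path could spell, most frequent first."""
    prefixes = build_prefixes(word_frequencies)
    found = set()

    def search(index, text):
        if index == len(path):
            if text in word_frequencies:
                found.add(text)
            return
        key = path[index]
        # The first and last keys of a swipe are assumed to be intentional.
        must_use = index in (0, len(path) - 1)
        if not must_use:
            search(index + 1, text)  # key was only passed over
        for repeat in (1, 2):  # use the key once, or twice for doubles
            extended = text + key * repeat
            if extended in prefixes:  # prune dead-end branches
                search(index + 1, extended)

    search(0, "")
    return sorted(found, key=lambda w: -word_frequencies[w])


# Made-up dictionary and swipe path for illustration.
dictionary = {"THE": 500, "THEY": 120, "TOY": 40, "TRY": 30}
matches = candidates("TGHREDRTY", dictionary)
```

Because the prefix check cuts off most branches early, the search stays small even though the number of skip/keep combinations grows quickly with path length.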
Here are some examples (these all map to "THEY"):
Improvements:
- Better dictionaries: Possibly one that lists all forms of a word with the same frequency (e.g. be, being, been).
- Fat finger search: extend and trim each swipe path so that less accuracy is needed. That is, a "what's around you" operation could be added to each key to get many more paths to search that could lead to the desired word.
- Spell checking: for people that don't know how to spell, predictive text can be frustrating. A misspelling dictionary could be really useful to help with this.
- Markov chain: I don't know how effective this would be, but since Markov chains can be used to generate English text it seems like they could be used to select a more likely next word.
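As a rough illustration of the Markov chain idea, the Python sketch below counts which word follows which in some training text (a first-order chain) and uses those counts to reorder candidate words. Nothing like this exists in the demo; the function names and the training sentence are invented for the example.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, how often each other word follows it."""
    follows = defaultdict(Counter)
    words = text.upper().split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def rank_candidates(previous_word, candidates, follows):
    """Order candidates by how often they followed previous_word."""
    counts = follows.get(previous_word.upper(), Counter())
    # sorted() is stable, so ties keep their original (frequency) order.
    return sorted(candidates, key=lambda w: -counts[w.upper()])


# Made-up training text for illustration.
follows = train_bigrams("they try again and they try harder but the toy broke")
ranked = rank_candidates("they", ["THEY", "TRY"], follows)
```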
Wednesday, August 11, 2010
What happens when your email "unsubscribe" button doesn't work...
"6 months and a million dollars"... a common saying from a fellow co-worker and previous HP employee... well, 6 months and a million dollars later, HP, your unsubscribe button doesn't work and I'm about to start clicking the mark as spam button.
Thursday, July 1, 2010
Buzz Bookmarklet
I found it odd that I couldn't easily find a "bookmarklet" for Buzz that I could drag into my bookmarks bar.
Maybe I'm blind, or wasn't looking in the right places... so I wrote one. It's probably broken in some cases, and is fairly rudimentary, but here it is:
Post to Buzz (drag to bookmarks bar)
If interested, here's the code:
This is a more fancy (shorter) version that uses querySelector:
Tuesday, February 9, 2010
Silly Space Optimization on Google's Home Page?
According to @chrisskelton and @wkvong, Google leaves off the ending </html> and </body> tags from their home page to optimize for space:
-- this quote was brought to you by quoteurl
chrisskelton: Today I learned that Google excludes the </body> and </html> tags from their main page to save 18 bytes. (09 Feb 2010, from web)
wkvong: You know Google is crazy because Google's home page doesn't close its <body> or <html> tags for performance (10 Feb 2010, from web)
Indeed, if you check out the source code for the Google home page it's not there:

To put this in perspective: according to SearchEngineWatch.com Google gets about 91 million hits per day (in 2006). Assuming all those searches start on the home page (and there's no caching involved), that's:
18 bytes * 91,000,000 hits = 1638000000 bytes
1599609.38 kilobytes
1562.12 megabytes
1.53 gigabytes
If we go by monthly hits:
18 bytes * 2,733,000,000 hits = 49194000000 bytes
48041015.63 kilobytes
46915.05 megabytes
45.82 gigabytes
That's 1.5 gigabytes per day (or 45.82 gigabytes per month) that Google doesn't have to send or pay for, and that consumers don't pay for, all by leaving off a few useless tags. Not really that crazy.
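The arithmetic above can be reproduced with a few lines of Python. The helper name is made up; the inputs are the 18 bytes and the hit counts quoted above, and units are powers of 1024 as in the figures listed.

```python
def bytes_saved(bytes_per_hit, hits):
    """Total savings for a number of hits, in several units (1024-based)."""
    total = bytes_per_hit * hits
    return {
        "bytes": total,
        "kilobytes": total / 1024,
        "megabytes": total / 1024 ** 2,
        "gigabytes": total / 1024 ** 3,
    }


daily = bytes_saved(18, 91_000_000)
monthly = bytes_saved(18, 2_733_000_000)

for label, saved in (("per day", daily), ("per month", monthly)):
    print("%s: %d bytes = %.2f GB" % (label, saved["bytes"], saved["gigabytes"]))
```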
Monday, January 25, 2010
Simple Errno Lookup with Python
Visual Studio has a tool [ERRLOOK] which looks up explanations of error codes. When you don't have the convenience of application code that automatically converts and reports this back to you it's nice to have.
I don't know of a similar utility on Linux (or other Unix variants), but I know the library calls exist to write it (at least for well known system error codes).
Below is my first pass at writing this utility for Linux / Mac OS X using Python's ctypes.
The benefit of using Python is that it doesn't need to be compiled (though this is a small benefit). It's also an example of using ctypes to do FFI. Python also has the errno module, which provides a mapping between a numeric error code and a symbolic name.
The calls to load libc.dylib or libc.so.1 might need to be tweaked depending on the system it's running on.
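For reference, here is a minimal sketch of the same idea in current Python; it is not the script from this post. It assumes `ctypes.util.find_library("c")` can locate the C library (which sidesteps hard-coding a library file name), and the `lookup` function name is made up.

```python
import ctypes
import ctypes.util
import errno

# Locate and load the C library; the name differs between systems.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# char *strerror(int errnum);
libc.strerror.restype = ctypes.c_char_p
libc.strerror.argtypes = [ctypes.c_int]


def lookup(code):
    """Return (symbolic name, message) for a numeric error code."""
    name = errno.errorcode.get(code, "UNKNOWN")
    message = libc.strerror(code).decode()
    return name, message


if __name__ == "__main__":
    import sys
    for arg in sys.argv[1:]:
        print("%s: %s: %s" % ((arg,) + lookup(int(arg))))
```

The symbolic name comes from Python's own `errno.errorcode` table, while the message text comes from the C library through ctypes.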
Example output:
Friday, January 22, 2010
Using TwitterFeed to send certain del.icio.us bookmarks to Twitter
TwitterFeed is a really cool service. Using TwitterFeed you can push any RSS feed to Twitter, which enables a super simple way to push pretty much any kind of syndicated data to Twitter (and now Facebook, Laconica, and HelloTxt).
I also use del.icio.us to save interesting bookmarks in an easily accessible persistent location.
The problem was several fold:
- Twitter is a great way to talk about things (including links)
- Twitter isn't so great at categorizing links and making them easy to find later
- I wanted to save links on del.icio.us but share them on Twitter
- Not everything I shared on del.icio.us was something I wanted to spam my Twitter followers with.
Turns out the solution is pretty simple. On del.icio.us you can create tags. These tags help organize bookmarks.
For each tag you can get an RSS feed (see where this is going?). RSS feeds for tags look like this:
http://feeds.delicious.com/v2/rss/<username>/<tag-name>
You can also visit the tag's webpage and select the RSS feed button next to the URL bar (then record the new link in the URL bar):
I selected the tag name tweet-this for this set-up.
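If you would rather build the feed URL than copy it from the browser, the pattern above is easy to fill in. This is a small Python sketch; the function name and the `someuser` account are placeholders.

```python
from urllib.parse import quote


def tag_feed_url(username, tag):
    """Build the RSS feed URL for one tag of one del.icio.us user."""
    return "http://feeds.delicious.com/v2/rss/%s/%s" % (
        quote(username, safe=""),
        quote(tag, safe=""),
    )


# "someuser" is a placeholder account name.
feed = tag_feed_url("someuser", "tweet-this")
```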
The next step is to visit TwitterFeed and get set up with an account. Once you've got your account you'll be at a Feed Dashboard. Select Create a New Feed and enter the previously recorded URL:
Select test rss feed to make sure everything is working.
The rest is point and shoot with the TwitterFeed set-up process. Now, only bookmarks recorded on del.icio.us with the tweet-this tag will show up on Twitter.
Thursday, January 21, 2010
Exponential Visitor Graph (Ebay)
I thought this was fairly entertaining...
I wrote a simple page hit counter for a couple of Ebay / Craigslist listings we were doing (using Google's AppEngine, and Yahoo's YUI).
Recently I added a feature to allow it to graph visitor counts over time (basically a cheesy analytics engine).
I guess one of the items was particularly desirable (a very complete, good condition SNES system with a lot of games) because it had what I imagine is probably a pretty typical graph of visits to an Ebay auction:
It's pretty obvious that the spike (a huge spike) in visitors is when the auction was about to close and everyone was manically hitting reload. Just eyeballing the graph, it looks like an exponential increase.
Friday, January 15, 2010
Updated: Token Bucket Downloader
I've updated the rate limiting downloader (that uses the token bucket algorithm) to be more user friendly out of the box. It attempts to use urllib2 by default so it can rate limit pretty much any URL.
Previously the script required an HTTP proxy and the only means of adjusting operating parameters was global variables in the script. A proxy server is no longer required and all operating parameters can be adjusted via script options.
If an HTTP proxy is selected, the Python HTTP library (httplib) is used. This is a more rudimentary library, so not as many situations are handled. It's possible to install a proxy handler in urllib2 but I didn't do this.
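For the curious, installing a proxy handler looks roughly like the sketch below. It uses `urllib.request`, the Python 3 successor to urllib2 (where the same classes live directly under `urllib2`). The helper name is made up and `aproxyserver:8080` is a placeholder; the script itself does not do this.

```python
import urllib.request


def build_proxy_opener(proxy):
    """Return an opener that sends HTTP requests through SERVER:PORT."""
    handler = urllib.request.ProxyHandler({"http": "http://%s" % proxy})
    return urllib.request.build_opener(handler)


# Placeholder proxy address; no request is made until opener.open(url).
opener = build_proxy_opener("aproxyserver:8080")
```

Calling `urllib.request.install_opener(opener)` would make plain `urlopen` calls use the proxy as well.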
This is more-or-less a "complete" rate limiting download manager. Option output:
Usage: rlfetch.py [options] url

Options:
  -h, --help            show this help message and exit
  -f FILE, --file=FILE  output filename
  -d DIR, --dest=DIR    destination directory
  -p SERVER:PORT, --proxy=SERVER:PORT
                        http proxy server
  -z BYTES, --buffer=BYTES
                        buffer size
  -l KBS, --limit=KBS   kbs limit
  -b KBS, --burst=KBS   burst limit
  -t SECONDS, --tick=SECONDS
                        tick interval
Current state of the code.
Monday, January 11, 2010
Basic Token Bucket Rate Limiter
In locations where you have limited internet resources it's sometimes necessary to implement rate limiting. I was curious exactly how this worked, so I worked out a simple Token Bucket based rate limiting HTTP downloader.
This Python script does a couple things:
- Limits rate of data consumption in kilobytes per second
- Prints out the instantaneous KB/s and the overall/actual KB/s [this is done by monitoring the file size on disk]
Token bucket is a pretty simple algorithm. The basic algorithm is to create an artificial stream of tokens, which are generated as fast as you want to allow the real stream to go. If tokens are not removed from the "bucket" then tokens are only generated up to a "burst limit", which is the maximum amount over the average limit that's desirable (this could change to help trend a stream toward the average limit).
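Stripped of threads and networking, the algorithm fits in a small class. The Python sketch below is a simplified variant, not the script's approach: instead of a feeder thread adding tokens every tick, it tops up the bucket lazily from the elapsed time whenever tokens are requested. The class name and the injectable `clock` parameter are assumptions for illustration.

```python
import time


class TokenBucket:
    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate      # tokens generated per second
        self.burst = burst    # maximum tokens the bucket can hold
        self.clock = clock    # injectable so the bucket can be tested
        self.tokens = 0.0
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed, capped at the burst limit.
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now

    def take(self, count):
        """Remove count tokens if available; return whether it worked."""
        self._refill()
        if self.tokens >= count:
            self.tokens -= count
            return True
        return False
```

A downloader would call `take(len_of_next_read)` before each read and sleep briefly whenever it returns False, which is what keeps the average rate at `rate` while still allowing short bursts up to `burst`.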
In the Python implementation, 3 threads are used. Thread one monitors the rate of download. Thread two consumes tokens and downloads real data from an HTTP source. Thread three generates tokens and places them in a bucket, stopping when the burst limit has been reached.
The code is available below or at codepad.org.
# -*- python -*-
import os
import random
import time
import httplib
import urlparse
import sys
import threading
from pprint import pprint

###########################
# Tuning knobs
BUFSIZE = 8192
BPS_LIMIT = 50 * 1024
BURST_BPS_LIMIT = 70 * 1024
TICK = 0.01
###########################

g_usage = "%s <download directory> <target url>"
g_bucketTokens = 0
g_exit = False

def parseArgv(argv):
    if len(argv) < 3:
        return None
    directory = argv[1]
    if not os.path.isdir(directory):
        return None
    url = argv[2]
    s = urlparse.urlsplit(url)
    _, filename = os.path.split(s.path)
    return url, os.path.join(directory, filename)

def takeTokens(tokencount):
    global g_bucketTokens
    if g_bucketTokens >= tokencount:
        g_bucketTokens -= tokencount
        return True
    return False

def printKbs(filename):
    def _():
        start, end = None, None
        tot, num = 0, 0
        while True:
            if g_exit:
                print "KB/s monitor exiting..."
                break
            end = time.time()
            size = g_byteCount
            if start is not None:
                inst = ((size - old) / 1024.0) / (end - start)
                tot += inst
                num += 1
                print "I: %.02f kb/s, A: %.02f kb/s" % (inst, tot / num)
            start = time.time()
            old = size
            for x in xrange(int(2 / TICK)):
                if g_exit:
                    break
                time.sleep(TICK)
    return _

def feedBucketTokens():
    global g_bucketTokens
    tokens_per = int(BPS_LIMIT * TICK)
    print "Tokens per tick: %d" % (tokens_per,)
    while True:
        if g_exit:
            print "Token feeder exiting..."
            break
        if g_bucketTokens >= BURST_BPS_LIMIT:
            time.sleep(TICK)
            continue
        g_bucketTokens += tokens_per
        time.sleep(TICK)

class BucketReader (object):
    def __init__(self, fp):
        self.fp = fp

    def read(self, bufsize):
        while True:
            if takeTokens(bufsize):
                break
            time.sleep(TICK)
        return self.fp.read(bufsize)

def prepareFile(filename):
    fp = open(filename, 'ab+')
    fp.seek(0, 2)
    fsize = fp.tell()
    return fp, fsize

def startHttpReq(url, fsize):
    headers = { "Range" : ("bytes=%d-" % fsize), }
    pprint(headers)
    h = httplib.HTTPConnection("aproxyserver", 8080)
    h.request("GET", url, headers=headers)
    r = h.getresponse()
    pprint(r.getheaders())
    return r

g_byteCount = 0

def readLoop(fpIn, fpOut):
    global g_byteCount
    while True:
        d = fpIn.read(BUFSIZE)
        fpOut.write(d)
        fpOut.flush()
        g_byteCount += len(d)
        if len(d) < BUFSIZE:
            break

def main():
    global g_exit
    try:
        params = parseArgv(sys.argv)
        if params is None:
            print g_usage % (sys.argv[0],)
            raise SystemExit(1)
        url, filename = params
        threading.Thread(target=feedBucketTokens).start()
        threading.Thread(target=printKbs(filename)).start()
        fpOut, fsize = prepareFile(filename)
        fpInput = startHttpReq(url, fsize)
        readLoop(BucketReader(fpInput), fpOut)
    except KeyboardInterrupt:
        pass
    finally:
        g_exit = True

if __name__ == '__main__':
    main()

# vim: et:sts=4:ts=4:sw=4:







