Some questions about the "blocking" of HTML5
- When people say that the publication of HTML5 was "blocked" by Larry Masinter's "formal objection", what exactly do they mean?
- Why does the private w3c-archive mailing list exist? Why can't anyone reveal what happens on there? What are the consequences for doing so? Who gets to be on that list in the first place?
- Can anyone raise a "formal objection"?
- Is anyone calling for the HTML Working Group to be "rechartered"? If so, what does that involve?
- If there are concerns about the inclusion of Canvas 2D in the specification, why were these not resolved earlier?
Some background reading. I was planning to fill in answers as they arrive, but I screwed up the moderation of the comments and got flooded with detailed responses - I strongly recommend reading the comments.
WildlifeNearYou: It began on a fort...
Back in October 2008, 11 others and I set out on the first /dev/fort expedition. The idea was simple: gather a dozen geeks, rent a fort, take food and laptops and see what we could build in a week.
The fort was Fort Clonque on Alderney in the Channel Islands, managed by the Landmark Trust. We spent an incredibly entertaining week there exploring Nazi bunkers, cooking, eating and coding up a storm. It ended up taking slightly longer than a week to finish, but 14 months later the result of our combined efforts can finally be revealed: WildlifeNearYou.com!
WildlifeNearYou is a site for people who like to see animals. Have you ever wanted to know where your nearest Llama is? Search for "llamas near brighton" and you'll see that there's one 18 miles away at Ashdown Forest Llama Farm. Or you can see all the places we know about in France, or all the trips I've been on, or everywhere you can see a Red Panda.
The data comes from user contributions: you can use WildlifeNearYou to track your trips to wildlife places and list the animals that you see there. We can only tell you about animals that someone else has already spotted.
Once you've added some trips, you can import your Flickr photos and match them up with trips and species. We'll be adding a feature in the future that will push machine tags and other metadata back to Flickr for you, if you so choose.
You can read more about WildlifeNearYou on the site's about page and FAQ. Please don't hesitate to send us feedback!
What took so long?
So why did it take so long to finally launch it? A whole bunch of reasons. Week long marathon hacking sessions are an amazing way to generate a ton of interesting ideas and build a whole bunch of functionality, but it's very hard to get a single cohesive whole at the end of it. Tying up the loose ends is a pretty big job and is severely hampered by the fort residents returning to their real lives, where hacking for 5 hours straight on a cool easter egg suddenly doesn't seem quite so appealing. We also got stuck in a cycle of "just one more thing". On the fort we didn't have internet access, so internet-dependent features like Freebase integration, Google Maps, Flickr imports and OpenID had to be left until later ("they'll only take a few hours" no longer works once you're off /dev/fort time).
The biggest problem though was perfectionism. The longer a side-project drags on for, the more important it feels to make it "just perfect" before releasing it to the world. Finally, on New Year's Day, Nat and I decided we had had enough. Our resolution was to "ship the thing within a week, no matter what state it's in". We're a few days late, but it's finally live.
WildlifeNearYou is by far the most fun website I've ever worked on. To all twelve of my intrepid fort companions: congratulations - we made a thing!
Why I like Redis
I've been getting a lot of useful work done with Redis recently.
Redis is typically categorised as yet another of those new-fangled NoSQL key/value stores, but if you look closer it actually has some pretty unique characteristics. It makes more sense to describe it as a "data structure server" - it provides a network service that exposes persistent storage and operations over dictionaries, lists, sets and string values. Think memcached but with list and set operations and persistence-to-disk.
It's also incredibly easy to set up, ridiculously fast (30,000 reads or writes a second on my laptop with the default configuration) and has an interesting approach to persistence. Redis runs in memory, but syncs to disk every Y seconds or after every X operations. Sounds risky, but it supports replication out of the box so if you're worried about losing data should a server fail you can always ensure you have a replicated copy to hand. I wouldn't trust my only copy of critical data to it, but there are plenty of other cases for which it is really well suited.
I'm currently not using it for data storage at all - instead, I use it as a tool for processing data using the interactive Python interpreter.
I'm a huge fan of REPLs. When programming Python, I spend most of my time in an IPython prompt. With JavaScript, I use the Firebug console. I experiment with APIs, get something working and paste it over in to a text editor. For some one-off data transformation problems I never save any code at all - I run a couple of list comprehensions, dump the results out as JSON or CSV and leave it at that.
Redis is an excellent complement to this kind of programming. I can run a long-running batch job in one Python interpreter (say loading a few million lines of CSV in to a Redis key/value lookup table) and run another interpreter to play with the data that's already been collected, even as the first process is streaming data in. I can quit and restart my interpreters without losing any data. And because Redis semantics map closely to Python native data types, I don't have to think for more than a few seconds about how I'm going to represent my data.
Here's a 30 second guide to getting started with Redis:
    $ wget http://redis.googlecode.com/files/redis-1.01.tar.gz
    $ tar -xzf redis-1.01.tar.gz
    $ cd redis-1.01
    $ make
    $ ./redis-server
And that's it - you now have a Redis server running on port 6379. No need even for a ./configure or make install. You can run ./redis-benchmark in that directory to exercise it a bit.
Let's try it out from Python. In a separate terminal:
    $ cd redis-1.01/client-libraries/python/
    $ python
    >>> import redis
    >>> r = redis.Redis()
    >>> r.info()
    {u'total_connections_received': 1, ... }
    >>> r.keys('*') # Show all keys in the database
    []
    >>> r.set('key-1', 'Value 1')
    'OK'
    >>> r.keys('*')
    [u'key-1']
    >>> r.get('key-1')
    u'Value 1'
Now let's try something a bit more interesting:
    >>> r.push('log', 'Log message 1', tail=True)
    >>> r.push('log', 'Log message 2', tail=True)
    >>> r.push('log', 'Log message 3', tail=True)
    >>> r.lrange('log', 0, 100)
    [u'Log message 3', u'Log message 2', u'Log message 1']
    >>> r.push('log', 'Log message 4', tail=True)
    >>> r.push('log', 'Log message 5', tail=True)
    >>> r.push('log', 'Log message 6', tail=True)
    >>> r.ltrim('log', 0, 2)
    >>> r.lrange('log', 0, 100)
    [u'Log message 6', u'Log message 5', u'Log message 4']
That's a simple capped log implementation (similar to a MongoDB capped collection) - push items on to the tail of a 'log' key and use ltrim to only retain the last X items. You could use this to keep track of what a system is doing right now without having to worry about storing ever increasing amounts of logging information.
See the documentation for a full list of Redis commands. I'm particularly excited about the RANDOMKEY and new SRANDMEMBER commands (git trunk only at the moment), which help address the common challenge of picking a random item without ORDER BY RAND() clobbering your relational database. In a beautiful example of open source support in action, I requested SRANDMEMBER on Twitter yesterday and antirez committed it just 12 hours later.
I used Redis this week to help create heat maps of the BNP's membership list for the Guardian. I had the leaked spreadsheet of the BNP member details and a (licensed) CSV file mapping 1.6 million postcodes to their corresponding parliamentary constituencies. I loaded the CSV file in to Redis, then looped through the 12,000 postcodes from the membership and looked them up in turn, accumulating counts for each constituency. It took a couple of minutes to load the constituency data and a few seconds to run and accumulate the postcode counts. In the end, it probably involved less than 20 lines of actual Python code.
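The lookup-and-count step is easy to sketch. This is not the code used for the Guardian piece: a plain dict stands in for the Redis lookup table so the example runs without a server, and the postcodes, constituency names and the count_by_constituency helper are all made up for illustration.

```python
from collections import Counter


def normalise(postcode):
    """Upper-case and strip spaces so 'zz1 1aa' matches 'ZZ11AA'."""
    return postcode.replace(" ", "").upper()


def count_by_constituency(postcodes, lookup):
    """Count how many postcodes fall in each constituency.

    `lookup` is anything with a dict-style .get(); postcodes missing
    from it are tallied under None instead of raising an error.
    """
    counts = Counter()
    for postcode in postcodes:
        counts[lookup.get(normalise(postcode))] += 1
    return counts


# Hypothetical sample data, not taken from any real dataset.
lookup = {
    "ZZ11AA": "Constituency A",
    "ZZ11AB": "Constituency A",
    "ZZ22AA": "Constituency B",
}
members = ["zz1 1aa", "ZZ1 1AB", "ZZ2 2AA", "ZZ9 9ZZ"]
counts = count_by_constituency(members, lookup)
```

Tallying unknown postcodes under None keeps one bad row in a spreadsheet from aborting the whole run, which matters for one-off jobs typed in to an interpreter.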
A much more interesting example of an application built on Redis is Hurl, a tool for debugging HTTP requests built in 48 hours by Leah Culver and Chris Wanstrath. The code is now open source, and Chris talks a bit more about the implementation (in particular their use of sort in Redis) on his blog. Redis also gets a mention in Tom Preston-Werner's epic writeup of the new scalable architecture behind GitHub.
This shouldn't be the image of Hack Day
I love hack days. I was working in the vicinity of Chad Dickerson when he organised the first internal Yahoo! Hack Day back in 2005, and I've since participated in hack day events at Yahoo!, Global Radio and the Guardian. I've also been to every one of Yahoo!'s Open Hack Day events in London. They're fantastic, and the team that organises them should be applauded.
As such, I care a great deal about the image of hack day - and the videos that emerged from last weekend's Taiwan Hack Day are hugely disappointing.
(These are still images from the video - the original has been taken down).
Seriously, what the hell?
I've heard arguments that this kind of thing is culturally acceptable in Taiwan - in fact it may even be expected for technology events, though I'd love to hear further confirmation. I don't care. The technology industry has a serious, widely recognised problem attracting female talent. The ratio of male to female attendees at most conferences I attend is embarrassing - An Event Apart last week in Chicago was a notable and commendable exception.
Our industry is still young. If we want an all-encompassing technology scene, we need to actively work to cultivate an inclusive environment. This means a zero tolerance approach to this kind of entertainment. Booth babes, tequila girls, and scantily clad gyrating women simply set the wrong tone, here or abroad. Heck, this isn't just about offending women - many guy geeks I know would be mortified by this kind of thing.
Hack days are a celebration of ingenuity and creativity. Past US hack days have featured performances from Beck and Girl Talk, both of whom embody the creative spirit of the event. Sexy dancing girls? Not so much.
I'm not the only one who's disappointed.
@Yahoo, for shame : http://flic.kr/p/78btX1 I'm frankly disgusted.
i am *so* disappointed: http://flic.kr/p/78btX1. remember, a team of women delivered the winning hack at the 1st one:http://bit.ly/FokfF
There was a flurry of activity about this on Twitter yesterday. I sat on this entry for most of today, partly because writing this kind of thing is really hard but also because I was hoping someone at Yahoo! would wake up and release some kind of statement. So far, nothing.
Update (1:30am): Chris Yeh of YDN has responded with an appropriately worded apology.
Django ponies: Proposals for Django 1.2
I've decided to step up my involvement in Django development in the run-up to Django 1.2, so I'm currently going through several years' worth of accumulated pony requests figuring out which ones are worth advocating for. I'm also ensuring I have the code to back them up - my innocent AutoEscaping proposal a few years ago resulted in an enormous amount of work by Malcolm and I don't think he'd appreciate a repeat performance.
I'm not a big fan of branches when it comes to exploratory development - they're fine for doing the final implementation once an approach has been agreed, but I don't think they are a very effective way of discussing proposals. I'd much rather see working code in a separate application - that way I can try it out with an existing project without needing to switch to a new Django branch. Keeping code out of a branch also means people can start using it for real development work, making the API much easier to evaluate. Most of my proposals here have accompanying applications on GitHub.
I've recently got in to the habit of including an "examples" directory with each of my experimental applications. This is a full Django project (with settings.py, urls.py and manage.py files) which serves two purposes. Firstly, it allows developers to run the application's unit tests without needing to install it in to their own pre-configured project, simply by changing in to the examples directory and running ./manage.py test. Secondly, it gives me somewhere to put demonstration code that can be viewed in a browser using the runserver command - a further way of making the code easier to evaluate. django-safeform is a good example of this pattern.
Here's my current list of ponies, in rough order of priority.
Signing and signed cookies
Signing strings to ensure they have not yet been tampered with is a crucial technique in web application security. As with all cryptography, it's also surprisingly difficult to do correctly. A vulnerability in the signing implementation used to protect the Flickr API was revealed just today.
One of the many uses of signed strings is to implement signed cookies. Signed cookies are fantastically powerful - they allow you to send cookies safe in the knowledge that your user will not be able to alter them without you knowing. This dramatically reduces the need for sessions - most web apps use sessions for security rather than for storing large amounts of data, so moving that "logged in user ID" value to a signed cookie eliminates the need for session storage entirely, saving a round-trip to persistent storage on every request.
This has particularly useful implications for scaling - you can push your shared secret out to all of your front end web servers and scale horizontally, with no need for shared session storage just to handle simple authentication and "You are logged in as X" messages.
The latest version of my django-openid library uses signed cookies to store the OpenID you log in with, removing the need to configure Django's session storage. I've extracted that code in to django-signed, which I hope to evolve in to something suitable for inclusion in django.utils.
Please note that django-signed has not yet been vetted by cryptography specialists, something I plan to fix before proposing it for final inclusion in core.
- django-signed on GitHub
- Details of the Signing proposal on the Django wiki
- Signing discussion on the django-developers mailing list
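To make the signing idea concrete, here's a minimal sketch using only Python's standard library. It is not the django-signed API: the sign/unsign names, the colon separator, the choice of HMAC-SHA256 and the SECRET_KEY value are all assumptions for illustration, and like any home-made crypto it would need review before real use.

```python
import base64
import hashlib
import hmac

SECRET_KEY = b"change-me"  # hypothetical shared secret


class BadSignature(Exception):
    """Raised when a signed value is malformed or has been altered."""


def _signature(value, key):
    # HMAC-SHA256 of the value, encoded with the URL-safe base64
    # alphabet (which never contains ':') and stripped of padding.
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def sign(value, key=SECRET_KEY):
    """Return 'value:signature'."""
    return value + ":" + _signature(value, key)


def unsign(signed_value, key=SECRET_KEY):
    """Return the original value, or raise BadSignature."""
    value, sep, sig = signed_value.rpartition(":")
    if not sep:
        raise BadSignature("no signature found")
    expected = _signature(value, key)
    # compare_digest avoids leaking information through timing.
    if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8")):
        raise BadSignature("signature does not match")
    return value


token = sign("user:42")
```

A value signed this way can be readable by the user but not changed by them: editing "user:42" to "user:43" without knowing the secret makes unsign raise BadSignature.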
Improved CSRF protection is mainly Luke Plant's pony, but I'm very keen to see it happen. Django has shipped with CSRF protection for more than three years now, but the approach (using middleware to rewrite form HTML) is relatively crude and, crucially, the protection isn't turned on by default. Hint: if you aren't 100% positive you are protected against CSRF, you should probably go and turn it on.
Luke's approach is an iterative improvement - a template tag (with a dependency on RequestContext) is used to output the hidden CSRF field, with middleware used to set the cookie and perform the extra validation. I experimented at length with an alternative solution based around extending Django's form framework to treat CSRF as just another aspect of validation - you can see the result in my django-safeform project. My approach avoids middleware and template tags in favour of a view decorator to set the cookie and a class decorator to add a CSRF check to the form itself.
While my approach works, the effort involved in upgrading existing code to it is substantial, compared to a much easier upgrade path for Luke's middleware + template tag approach. The biggest advantage of safeform is that it allows CSRF failure messages to be shown inline on the form, without losing the user's submission - the middleware check means showing errors as a full page without redisplaying the form. It looks like it should be possible to bring that aspect of safeform back to the middleware approach, and I plan to put together a patch for that over the next few days.
- Luke's CSRF branch on bitbucket
- My django-safeform on GitHub
- Details of the CSRF proposal on the Django wiki
- CSRF discussion on the django-developers mailing list
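The cookie-plus-hidden-field check that both approaches rely on can be sketched without any framework. This is neither Luke's branch nor django-safeform: the function names and the csrf_token field name are hypothetical, and a real implementation also has to handle setting the cookie and deciding which requests to check.

```python
import hmac
import secrets
from html import escape


def new_csrf_token():
    """Random token to store in a cookie on the user's browser."""
    return secrets.token_urlsafe(32)


def csrf_hidden_field(token):
    """Hidden form field carrying the same token as the cookie."""
    return '<input type="hidden" name="csrf_token" value="%s">' % escape(
        token, quote=True
    )


def csrf_check(cookie_token, form_token):
    """True only if both tokens are present and identical."""
    if not cookie_token or not form_token:
        return False
    return hmac.compare_digest(
        cookie_token.encode("utf-8"), form_token.encode("utf-8")
    )


token = new_csrf_token()
field_html = csrf_hidden_field(token)
```

Because csrf_check just returns a boolean, it could be called from middleware (rejecting the whole request) or from form validation (showing the failure inline), which is the trade-off discussed above.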
XHTML-only form output is a major pet peeve of mine. Django's form framework is excellent - one of the best features of the framework. There's just one thing that bugs me about it - it outputs full form widgets (for input, select and the like) so that it can include the previous value when redisplaying a form during validation, but it does so using XHTML syntax.
I have a strong preference for an HTML 4.01 strict doctype, and all those <self-closing-tags /> have been niggling away at me for literally years. Django bills itself as a framework for "perfectionists with deadlines", so I feel justified in getting wound up out of proportion over this one.
A year ago I started experimenting with a solution, and came up with django-html. It introduces two new Django template tags - {% doctype %} and {% field %}. The doctype tag serves two purposes - it outputs a particular doctype (saving you from having to remember the syntax) and it records that doctype in Django's template context object. The field tag is then used to output form fields, but crucially it gets to take the current doctype in to account.
The field tag can also be used to add extra HTML attributes to form widgets from within the template itself, solving another small frustration about the existing form library. The README describes the new tags in detail.
The way the tags work is currently a bit of a hack - if merged in to Django core they could be more cleanly implemented by refactoring the form library slightly. This refactoring is currently being discussed on the mailing list.
- django-html on GitHub
- Improved HTML discussion on the django-developers mailing list
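The core of the idea, a widget renderer that knows which doctype is in effect, fits in a few lines. This sketch is independent of django-html: render_input, the doctype names and the XHTML_DOCTYPES set are assumptions for illustration, and the real tags read the doctype from the template context instead of taking it as an argument.

```python
from html import escape

# Hypothetical doctype names; anything not listed is treated as HTML.
XHTML_DOCTYPES = {"xhtml1", "xhtml1trans"}


def render_input(name, value, doctype="html4", **attrs):
    """Render an <input> element, self-closing only for XHTML doctypes.

    Extra keyword arguments become additional HTML attributes.
    """
    attributes = {"type": "text", "name": name, "value": value}
    attributes.update(attrs)
    rendered = " ".join(
        '%s="%s"' % (key, escape(str(val), quote=True))
        for key, val in attributes.items()
    )
    closing = " />" if doctype in XHTML_DOCTYPES else ">"
    return "<input " + rendered + closing


html_version = render_input("email", "a@b.c")
xhtml_version = render_input("email", "a@b.c", doctype="xhtml1")
```

The extra keyword arguments mirror the second use of the field tag: adding attributes such as size or class at render time without touching the form definition.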
Logging is the only proposal for which I don't yet have any code. I want to add official support for Python's standard logging framework to Django. It's possible to use this at the moment (I've done so on several projects) but it's not at all clear what the best way of doing so is, and Django doesn't use it internally at all. I posted a full argument in favour of logging to the mailing list, but my favourite argument is this one:
Built-in support for logging reflects a growing reality of modern Web development: more and more sites have interfaces with external web service APIs, meaning there are plenty of things that could go wrong that are outside the control of the developer. Failing gracefully and logging what happened is the best way to deal with 3rd party problems - much better than throwing a 500 and leaving no record of what went wrong.
I'm not actively pursuing this one yet, but I'm very interested in hearing people's opinions on the best way to configure and use the Python logging module in production.
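As a small illustration of "fail gracefully and log what happened", here's a sketch using only the standard logging module. It says nothing about how Django should configure logging: the logger name, the fetch_with_fallback helper and the broken_service function are all hypothetical.

```python
import logging

# Hypothetical logger name; a project would pick its own hierarchy.
logger = logging.getLogger("myproject.external")


def fetch_with_fallback(fetch, fallback=None):
    """Call fetch(); on any error, log the traceback and return fallback."""
    try:
        return fetch()
    except Exception:
        # logger.exception logs at ERROR level and includes the traceback.
        logger.exception("External service call failed; using fallback")
        return fallback


def broken_service():
    """Stand-in for a third party API that is currently failing."""
    raise IOError("connection timed out")


result = fetch_with_fallback(broken_service, fallback=[])
```

The page can then render with an empty list instead of a 500, while the traceback ends up wherever the logging configuration sends ERROR records.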
A replacement for get_absolute_url()
Django has a loose convention of encouraging people to add a get_absolute_url method to their models that returns that object's URL. It's a controversial feature - for one thing, it's a bit of a layering violation since URL logic is meant to live in the urls.py file. It's incredibly convenient though, and since it's good web citizenship for everything to have one and only one URL I think there's a pretty good argument for keeping it.
The problem is, the name sucks. I first took a look at this in the last few weeks before the release of Django 1.0 - what started as a quick proposal to come up with a better name before we were stuck with it quickly descended in to a quagmire as I realised quite how broken get_absolute_url() is. The short version: in some cases it means "get a relative URL starting with /", in other cases it means "get a full URL starting with http://" and the name doesn't accurately describe either.
A full write-up of my investigation is available on the Wiki. My proposed solution was to replace it with two complementary methods - get_url() and get_url_path() - with the user implementing one hence allowing the other one to be automatically derived. My django-urls project illustrates the concept via a model mixin class. A year on I still think it's quite a neat idea, though as far as I can tell no one has ever actually used it.
- ReplacingGetAbsoluteUrl on the wiki
- django-urls on GitHub
- Recent get_absolute_url discussion on the django-developers mailing list
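The "implement one, derive the other" idea can be sketched as a plain Python mixin. This is not the django-urls code: the UrlMixin name, the url_scheme and url_domain attributes and the example classes are assumptions, and a real version would look the domain up from configuration.

```python
from urllib.parse import urlsplit


class UrlMixin:
    """Define either get_url() or get_url_path(); the other is derived."""

    url_scheme = "http"
    url_domain = "example.com"  # hypothetical; would come from settings

    def get_url(self):
        """Full URL starting with the scheme, built from the path."""
        if type(self).get_url_path is UrlMixin.get_url_path:
            raise NotImplementedError("define get_url() or get_url_path()")
        return "%s://%s%s" % (
            self.url_scheme, self.url_domain, self.get_url_path()
        )

    def get_url_path(self):
        """Path starting with '/', extracted from the full URL."""
        if type(self).get_url is UrlMixin.get_url:
            raise NotImplementedError("define get_url() or get_url_path()")
        parts = urlsplit(self.get_url())
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        return path


class Article(UrlMixin):
    """Knows only its path on the current site."""

    def __init__(self, pk):
        self.pk = pk

    def get_url_path(self):
        return "/articles/%d/" % self.pk


class Feed(UrlMixin):
    """Knows only its full URL, which lives on another domain."""

    def get_url(self):
        return "http://feeds.example.org/latest/?format=rss"


article_url = Article(7).get_url()
feed_path = Feed().get_url_path()
```

The checks against UrlMixin's own methods stop the two defaults from calling each other forever when a class forgets to implement either one.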
Comments on this post are open, but if you have anything to say about any of the individual proposals it would be much more useful if you posted it to the relevant mailing list thread.
Hack Day tools for non-developers
We're about to run our second internal hack day at the Guardian. The first was an enormous amount of fun and the second one looks set to be even more productive.
There's only one rule at hack day: build something you can demonstrate at the end of the event (PowerPoint slides don't count). Importantly though, our hack days are not restricted to just our development team: anyone from the technology department can get involved, and we extend the invitation to other parts of the organisation as well. At the Guardian, this includes journalists.
For our first hack day, I put together a list of "tools for non-developers" - sites, services and software that could be used for hacking without programming knowledge as a pre-requisite. I'm now updating that list with recommendations from elsewhere. Here's the list so far:
Freebase
Originally a kind of structured version of Wikipedia, Freebase changed its focus last year towards being a "social database about things you know and love". In other words, it's the most powerful OCD-enabler in the history of the world. Create your own "Base" on any subject you like, set up your own types and start gathering together topics from the millions already available in Freebase - or add your own. Examples include the Battlestar Galactica base, the Tall Ships base and the fabulous Database base. If you are a developer the tools in the Make Things with Freebase section are top notch.
Dabble DB
Dabble is a weird combination of a spreadsheet, an online database and a set of visualisation tools. Watch the 8 minute demo to get an idea of how powerful this is - you can start off by loading in an existing spreadsheet and take it from there. You'll need to sign up for the free 30 day trial.
Google Docs
You can always build a hack in Excel, but Google Spreadsheets is surprisingly powerful and means that you can collaborate with others on your hack (including developers, who can use the Google Docs API to get at the data in your spreadsheet). Check out the following tutorials, which describe ways of using Google Spreadsheets to scrape in data from other webpages and output it in interesting formats:
- Data Scraping Wikipedia with Google Spreadsheets
- Calling Amazon Associates/Ecommerce Web Services from a Google Spreadsheet
There's also a simple way to create a form that submits data in to a Google Spreadsheet.
Yahoo! Pipes
Visual tools for combining, filtering and modifying RSS feeds. Combine with the large number of full-content feeds on guardian.co.uk for all sorts of interesting possibilities. Here's a tutorial that incorporates Google Docs as well.
Google My Maps
Google provide a really neat interface for adding your own points, lines and areas to a Google Map. Outputs KML, a handy file format for carting geographic data around between different tools.
If you already have a KML or GeoRSS feed URL from somewhere (e.g. the output of a Yahoo! Pipe), you can paste it directly in to the Google Maps search box to see the points rendered on a map.
Google SketchUp
A simple to use 3D drawing package that lets you create 3D models of real-world buildings and then import them in to Google Earth.
OpenStreetMap
Try your hand at some open source cartography on OpenStreetMap, the geographic world's answer to Wikipedia. If you have the equipment you can contribute GPS traces, otherwise there's a clever online editor that will let you trace out roads from satellite photos - or you could just make sure your favourite pub is included on the map. The export tools can provide vector or static maps, and if you export as SVG you can further edit your map in Illustrator or Inkscape.
CloudMade Maps
Commercial tools built on top of OpenStreetMap, the most exciting of which allows you to create your own map theme by setting your preferred colours and line widths for various types of map feature.
Many Eyes
IBM Research's suite of data visualisation tools, with a wiki-style collaboration platform for publishing data and creating visualisations.
Dapper
Dapper provides a powerful tool for screen scraping websites, without needing to write any code. Output formats include RSS, iCalendar and Google Maps.
TiddlyWiki
TiddlyWiki is a complete wiki in a single HTML file, which you can save locally and use as a notebook, collaboration tool and much more. There's a large ecosystem of plugins and macros which can be used to extend it with new features - see TiddlyVault for an index.
WolframAlpha
The "computational knowledge engine" with the hubristic search-based interface, potentially useful as a source of data and a tool for processing and visualising that data.
Tumblr
Useful as both an input and an output for feeds processed using other tools, and with a smart bookmarklet for collecting bits and pieces from around the web.
The UCSB Toy Chest
An outstanding list of tools that people "without programming skills (but with basic computer and Internet literacy) can use to create interesting projects", compiled by the English department at UC Santa Barbara.
Your help needed
There must be dozens, if not hundreds of useful tools missing from the above. Tell me in the comments and I'll add them to the list.
Teaching users to be secure is a shared responsibility
Ryan Janssen: Why an OAuth iframe is a Great Idea.
The reason the OAuth community prefers that we open up a new window is that if you look at the URL in the window (the place you type in a site’s name), you would see that it says www.netflix.com* and know that you are giving your credentials to Netflix.
Or would you? I would! Other technologists would! But would you? Would you even notice? If you noticed would you care? The answer for the VAST majority of the world is of course, no. In fact to an average person, getting taken to an ENTIRELY other site with some weird little dialog floating in a big page is EXTREMELY suspicious. The real site you are trusting to do the right thing is SetJam (not weird pop-up window site).
I posted a reply comment on that post, but I'll replicate it in full here:
Please, please don't do this.
As web developers we have a shared responsibility to help our users stay safe on the internet. This is becoming ever more important as people move more of their lives online.
It's an almost sisyphean task. If you want to avoid online fraud, you need to understand an enormous stack of technologies: browsers, web pages, links, URLs, DNS, SSL, certificates... I know user education is never the right answer, but in the case of the Web I honestly can't see any other route.
The last thing we need is developers making the problem worse by encouraging unsafe behaviour. That was the whole POINT of OAuth - the password anti-pattern was showing up everywhere, and was causing very real problems. OAuth provides an alternative, but we still have a long way to go convincing users not to hand their password over to any site that asks for it. Still, it's a small victory in a much bigger war.
If developers start showing OAuth in an iframe, that victory was for nothing - we may as well not have bothered. OAuth isn't just a protocol, it's an ambitious attempt to help users understand the importance of protecting their credentials, and the fact that different sites should be granted different permissions with regards to accessing their stuff. This is a difficult but critical lesson for users to learn. The only real hope is if OAuth, implemented correctly, spreads far enough around the Web that people start to understand it and get a feel for how it is meant to work.
By implementing OAuth in an iframe you are completely undermining this effort - and in doing so you're contributing to a tragedy of the commons where selfish behaviour on the behalf of a few causes problems for everyone else. Even worse, if the usability DOES prove to be better (which wouldn't be surprising) you'll be actively encouraging people to implement OAuth in an insecure way - your competitors will hardly want to keep doing things the secure way if you are getting higher conversion rates than they are.
So once again, please don't do this.
I hope my argument is convincing. In case it isn't, I'd strongly suggest that any sites offering OAuth protected APIs add frame-busting JavaScript to their OAuth verification pages. Thankfully, in this case there's a technical option for protecting the commons.
Update: It turns out Netflix already use a frame-busting script on their OAuth authentication page.
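Frame-busting JavaScript runs in the browser, so it can't be shown in Python. A related server-side measure is the X-Frame-Options response header, which asks browsers that support it not to render the page inside a frame at all. The sketch below shows that swapped-in technique as WSGI middleware; it is not the script Netflix uses, and the forbid_framing and authorize_page names are hypothetical.

```python
def forbid_framing(app):
    """WSGI middleware that adds X-Frame-Options: DENY to every response."""

    def middleware(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Drop any existing value so the header appears exactly once.
            headers = [
                (name, value)
                for name, value in headers
                if name.lower() != "x-frame-options"
            ]
            headers.append(("X-Frame-Options", "DENY"))
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return middleware


def authorize_page(environ, start_response):
    """Stand-in for an OAuth verification page."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Authorize this application?"]


protected_app = forbid_framing(authorize_page)
```

A header only helps in browsers that honour it, so it would sit alongside a frame-busting script and not replace one.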