Think you have finally mastered user interfaces? Ha. Think again. The next big thing is Internet TV, but the difficulty of operating these sites with a handheld remote hampers its adoption. Our job as developers is to modify our UIs to provide access to our features and content while remaining familiar to users accustomed to the simplicity of cable television interfaces. This turns out to be a very hard problem.
Internet television is the next big thing. Already, people are connecting their computers to their televisions with special-purpose devices like the Boxee Box, Logitech's Revue, Roku, Apple TV, and home theater PCs (HTPCs) of all descriptions. These and other devices give you access to the vast quantity of free and pay content available on YouTube, Vimeo, Netflix, Hulu, Vudu, Crackle.com, and thousands of other sites.
If you don't believe that Internet TV is the next big thing, consider the recent report that Netflix now has more than 23 million subscribers. That's more than any single cable TV company. A lot of those users are still stuck watching movies on their computers, but a growing number of people are hooking devices to their televisions so that they can watch those movies in comfort.
Today you can buy a 42" LCD TV, an HTPC, and everything required to connect it to the Internet for under $1,000 US. A smaller television and a more limited interface box cuts that cost in half. A Netflix subscription starts at $8 per month. Add Hulu Plus for another $8. For $20 per month, you can get more and better content online than you can from your cable company. The average cable TV bill (excluding broadband Internet access), by the way, is about $72 per month. In less than two years, the savings will pay for that new $1,000 Internet TV setup.
Internet TV is here. It's in its infancy and there are rough edges, but it's working and an increasing number of people are using it.
I find it interesting and somewhat distressing that all of those sites I mentioned above, and every other site I've seen, target the desktop with a traditional Web interface, support mobile devices such as the iPad and iPhone with custom interfaces, and then completely ignore the television platform. I think there are two reasons for this: Many developers haven't figured out that the TV platform is different, and those who have figured that out don't know what to do about it.
The TV is different
The television is a fundamentally different device from a desktop computer, tablet, or other mobile device. Sure, they're all similar in that they have a computer that gets information from Web sites and displays video, but that's about as far as the similarities go. Just as a mobile device has limitations that force its interface to be different from a desktop computer, so does the TV platform – even if your "TV platform" is just an HTPC connected to your new 42" LCD TV.
There are five primary differences that make developing television user interfaces challenging:
- The display is smaller.
- The preferred input device is very limited.
- Platform constraints limit functionality.
- The amount of content is staggering.
- User expectations are radically different.
A matter of perspective
Modern television displays are physically larger than most computer monitors. Most users today have a monitor that's 24" or less in size, but 42" and larger televisions are very common. So how can I say that the display is smaller?
It's a matter of perspective. Computer users commonly set their 24" monitors to a resolution of 1920 x 1200 pixels, giving about 94 pixels per inch on the diagonal. A 42" television displaying 1080p video at 1920 x 1080 pixels gives you about 52 pixels per inch. So a YouTube thumbnail image that's 120 x 90 pixels is about an inch tall on the computer monitor and about 1.7 inches tall on the television. So far so good.
But the apparent size depends on how far away your eyes are from the screen. The greater the distance, the smaller it looks. It's a linear relationship, to a close approximation at these small angles: when you double the distance, you halve the apparent size of the object.
The apparent height of an object is called its angular size, and is described in degrees of arc. The larger the number, the larger the object appears. The formula is:
Angular size = 2 * arctan ( Size / (2 * Distance) )
Most people sit within two to three feet of their computer monitors. At three feet, that thumbnail image's apparent height is:
2 * arctan ( 1 inch / (2 * 36 inches) ) = 1.59 degrees
At that distance, the thumbnail image covers 1.59 degrees of arc.
Most people sit quite a bit further from their televisions; 12 to 15 feet is pretty common. That thumbnail image that's 1.7 inches tall on your 42" television appears much smaller when you're 12 feet away. How much smaller?
2 * arctan ( 1.7 inches / (2 * 144 inches) ) = 0.68 degrees
At 12 feet, the image covers 0.68 degrees of arc, or about 0.43 times the size of the image on the monitor at three feet. By the same ratio, 12-point type shrinks to the apparent size of 5-point type, which is unlikely to be effective in a user interface.
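To make the comparison concrete, here is the angular-size formula as a short JavaScript sketch. The function name is mine, not from any library; the numbers are the monitor-at-three-feet example from above.

```javascript
// Angular size, in degrees, of an object of a given height viewed
// from a given distance. Both arguments use the same unit (inches here).
function angularSizeDeg(size, distance) {
  return 2 * Math.atan(size / (2 * distance)) * (180 / Math.PI);
}

// The 1-inch-tall thumbnail on a monitor three feet (36 inches) away:
console.log(angularSizeDeg(1, 36).toFixed(2)); // 1.59

// Doubling the viewing distance roughly halves the apparent size:
console.log(angularSizeDeg(1, 72).toFixed(2)); // 0.80
```

Note that an object twice as tall at twice the distance has exactly the same angular size, which is why viewing distance, not screen size, is what matters here.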
The ramifications should be obvious: Everything in your TV user interface has to be at least three times as large as you would build it for a desktop user interface. Four times as large is probably a better rule of thumb.
This is a bitter pill to swallow. If every user interface control has to be four times as large as what you would normally build, you can't get nearly as many of them on the screen. There just isn't room for toolbars with lots of buttons or for big blocks of explanatory text. There's room for what is necessary, and nothing else.
As Antoine de Saint-Exupéry said, La perfection est atteinte non quand il ne reste rien à ajouter, mais quand il ne reste rien à enlever. (English translation: Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.)
Viewed on your development computer, your interface is going to look like a child's toy with too-large buttons and not enough stuff to make it look interesting. That's okay. Your users won't be looking at it on a desktop computer monitor.
No mouse, no keyboard
Granted, a mouse and keyboard are attached to most HTPCs. I've yet to see a Windows or Linux box with a GUI that I could operate without them. Some single-purpose boxes work with a normal TV remote, though most accept a keyboard too. But for the most part, people don't want to use a mouse and keyboard when they're watching television.
The preferred television input device is a remote control that has four movement keys (left, right, up, down), a selection button of some kind, and potentially channel up/down and volume up/down buttons. Plus a power button, for a total of 10 buttons. Any other buttons are luxuries for the user interface designer and for the most part unnecessary distractions and sources of aggravation for users.
Apple has shown that it's possible to create compelling products with a remote that has only six buttons.
Remember that your TV user isn't sitting at a desktop with his hands on a keyboard and a mouse close at hand, and his attention fully focused on what he's doing. Rather, he's reclining in a comfortable chair, the lights are off, he has his favorite beverage in one hand (mine would be a bottle of my latest homebrew beer), and the remote in the other. All of his input is done with one thumb.
Your user interface has to be simple and obvious, and your user must be able to navigate it with his thumb without looking at the input device. That's how users watch television, and that's how they expect to navigate their new Internet TV interfaces. I cover this in detail below, but for now assume that your user interface has these constraints.
Not only do you have limited space to place user controls, you also have to make sure that the controls are laid out in a logical manner so that users can easily and quickly – with a minimum of button presses – move the focus from one control to the next. Imagine building a traditional dialog box that uses the arrow keys for navigation and the only other input is the Enter key, and you'll get the idea.
There is no "Escape" key that will exit the current mode. Rather, the user has to focus the "Exit" button and hit Enter.
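To get a feel for the constraint, here is a minimal sketch of arrow-key focus movement over a grid of controls. All of the names are illustrative; they don't come from any particular framework.

```javascript
// Focus handling for a screen laid out as a rows x cols grid of
// controls, driven only by the four arrow keys and a select button.
function makeScreen(rows, cols, onSelect) {
  const focus = { row: 0, col: 0 };
  return {
    focused: () => ({ ...focus }),
    handleKey(key) {
      if (key === "left")       focus.col = Math.max(0, focus.col - 1);
      else if (key === "right") focus.col = Math.min(cols - 1, focus.col + 1);
      else if (key === "up")    focus.row = Math.max(0, focus.row - 1);
      else if (key === "down")  focus.row = Math.min(rows - 1, focus.row + 1);
      else if (key === "enter") onSelect(focus.row, focus.col);
    },
  };
}

// Reaching the control at row 1, column 2 of a 2 x 3 grid takes three
// movement presses plus the select button:
const pressed = [];
const screen = makeScreen(2, 3, (r, c) => pressed.push([r, c]));
["right", "right", "down", "enter"].forEach(k => screen.handleKey(k));
// pressed is now [[1, 2]]
```

Counting presses like this for your most common tasks is a cheap way to evaluate a layout before you build it.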
Common tasks should take as few cursor movements as possible. Things that are done infrequently can require more button presses, but remember that every time you force the user to press a button, you're poking him in the eye. The user will complain about the cumbersome interface, but he is not willing to set down his drink, turn on the lights, find the keyboard, hunt for the mouse under the couch cushions, and place them on a suitably flat surface in order to operate your interface as he would a computer. You're stuck with the arrow key interface and with users' complaints.
Users really are sensitive to how many times they have to move the focus. I said above that every button press is a poke in the eye. One of our early users was a bit more colorful, saying, "Every time I have to press a button, God kills a kitten."
We're accustomed to hierarchical user interfaces, where one modal dialog box brings up another, and another, ad nauseam, and you then have to dismiss each dialog box in turn to get back to the main program's interface. Do not try this at home. Television users do not appreciate this kind of interface.
The best interface, of course, is one that has only a single screen. If you must provide multiple screens (and, with the limited display size, you almost certainly have to), it should be possible to exit the user interface and go back to the currently playing video from any screen – no matter how deep it is in the hierarchy. A "back" button that takes the user to the previous screen is a nice thing to provide, but it's not absolutely necessary. It depends mostly on how deeply you nest your interface. Making the user repeat two or fewer levels of navigation – provided the navigation requires a small number of button presses – is okay and may be preferable to including a "back" button.
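One way to honor that rule is to keep screens on an explicit stack and give the exit action a single, constant cost regardless of depth. This is a hypothetical sketch, not any particular product's implementation:

```javascript
// Screen navigation that can always return to the playing video in
// one action, no matter how deeply the user has navigated.
function makeNavigator(videoScreen) {
  const stack = [videoScreen];
  return {
    current: () => stack[stack.length - 1],
    open(screen) { stack.push(screen); },          // go one level deeper
    back() { if (stack.length > 1) stack.pop(); }, // optional "back" button
    exitToVideo() { stack.length = 1; },           // one press from anywhere
  };
}

const nav = makeNavigator("now-playing");
nav.open("browse");
nav.open("details");
nav.exitToVideo();
// nav.current() is "now-playing" again
```

Because exitToVideo costs one press whether the user is one level deep or five, the decision about whether to also offer back() becomes purely a question of how deep your hierarchy goes.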
For my HTPC, I bought an Adesso ARC-1100 "Vista Remote." I don't like it, but I dislike it less than any of the others I've tried.
This remote has a mouse pad of sorts in the middle, and buttons that act as left and right mouse buttons. Using this is nothing like using a mouse. Pressing on the pad moves the mouse cursor as you would expect, but slowly and with little precision. It works, but it's incredibly frustrating. I mention this only to caution you not to depend on it. If you build an interface that requires a mouse, and expect people to use something like this in place of the traditional mouse on the table, nobody is going to use your interface.
That remote has other buttons that do all kinds of strange and not-so-wonderful things. The thing is designed for Windows, and has buttons to start and control Media Player, and I don't know what all else. The volume up and down buttons change the Windows volume, and there doesn't appear to be a way to capture those keystrokes so that they instead control the volume of the video that's being played. That's very unfortunate.
The remote also has a button on the lower left that, when pressed, closes the current window. That is most annoying. The first few times you inadvertently press that button, you'll question the intelligence of the designer. The tenth or twelfth time, you'll start wondering if you can track down the designer and personally return the remote – preferably in a very uncomfortable way. To my knowledge, there's nothing your application can do to prevent the harmful effects of these special keys. Sadly, companies that produce remotes seem to be in the "more is better" camp, and don't produce simple remotes for use with Windows.
If you want the largest possible customer base, you should target the Web browser. Some of the devices I mentioned above (including Apple TV) require custom applications, but an HTPC and many of the other devices have Web browsers that you can use. You can write custom applications for Windows or Linux based HTPCs, of course, but that requires your users to download and install an application. At least today, the standard Web browser gives you the largest user base with the least effort. That may change, of course, if Apple builds an Internet-ready TV that's as successful as the iPad.
However, you still have to remember that your TV user isn't going to interact with the browser in the same way that a desktop user does. You can't depend on him being able to click on the browser's Favorite or Bookmark button, so you have to provide that functionality in your interface. Nor can you depend on the browser's Back button, although you have to write your code so that it won't do something stupid if the user presses Back.
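One defensive pattern is to push an app-level history entry for every screen change and re-render from saved state when the user presses Back. The sketch below injects the history object so the idea is testable outside a browser; in a real page you would pass window.history and wire onPopState to the window's "popstate" event. All names here are mine.

```javascript
// Keep the browser's Back button inside the application: every screen
// change pushes a history entry, and Back re-renders from the saved
// state instead of dumping the user out of the site.
function makeHistoryNav(history, render) {
  return {
    open(screen) {
      history.pushState({ screen }, "", "#" + screen);
      render(screen);
    },
    // In a browser: window.addEventListener("popstate", nav.onPopState)
    onPopState(event) {
      render(event.state ? event.state.screen : "home");
    },
  };
}

// Exercising it with a stand-in history object:
const rendered = [];
const fakeHistory = { entries: [], pushState(state) { this.entries.push(state); } };
const nav = makeHistoryNav(fakeHistory, screen => rendered.push(screen));
nav.open("browse");
nav.onPopState({ state: null }); // Back pressed past our first entry
// rendered is now ["browse", "home"]
```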
As with any Web application, you're also at the mercy of browser developers. You have to support multiple versions of Internet Explorer, Google Chrome, and Firefox, and most likely Safari and Opera as well. Each one of those browsers has its own idiosyncrasies, as any Web developer knows.
Although platform limitations are likely the least of your worries, it's very important to get them right. TV users are much less inclined than desktop users to accept "browser bug" as an excuse for your application's misbehavior. Watching TV is a leisure activity, and users are more likely to find some other site to visit than they are to change browsers or download updates. TV is play time, and people value their play time.
"500 channels and nothing on" has been a joke for 20 years or more. I don't know anybody who has 500 channels on his TV, but up to 100 channels isn't uncommon. I tend to agree with the "nothing on" bit, as did Bruce Springsteen in his song "57 Channels (And Nothin' On)."
Putting aside the "nothing on" rant, navigating more than a few dozen channels with traditional TV interfaces is difficult.
57 or 500 channels is nothing compared to the amount of content available online. Our Podly TV product identifies more than 30 channels in the news category alone! Add channels for comedy, entertainment, hobbies, kids, learning, music, people, etc., and you quickly get to thousands of channels with new content added constantly. There's always something on. The problem is finding it. Sites that provide access to the huge back catalog of old TV shows and movies have a similar problem, although perhaps not quite as large.
The content explosion is twofold. With a TV service that has 100 channels, the user has the choice of 100 things he can watch at any time. That number increases somewhat with the digital video recorder (DVR) in that the user can select from any of the saved shows. But the scale of the problem is manageable. If you assume that a TV show is at least 30 minutes long, then the maximum number of selections for a given day is 4,800 (100 channels, with 48 shows on each channel).
At Podly TV, our Web crawlers find more than 1.5 million new videos every day. More than 90% of them are of limited interest, but that still leaves more than 100,000 new videos every day that are of general interest. Those are just the ones we know about, and it doesn't count the huge back catalog of TV shows and movies that increasingly are becoming available.
The lack of a traditional input device and, to a lesser extent, the relative lack of screen space, makes it exceedingly difficult for users to find the content they're looking for. Since there's no keyboard, the user who's interested in wood carving videos can't just type in "wood carving" like he would when doing a Google search. Even if you could do a traditional search, the lack of screen space lets you display just a few results at a time, and scrolling through a long list is frustrating to the user.
One option is to create a hierarchy, with categories at the top, sub-categories beneath them, and so on to any arbitrary depth. Such a thing is trivial to navigate with a mouse and a lot of screen space, but again becomes difficult on the TV platform. Hierarchical navigation also requires a whole lot more attention than the user is likely to give you. It can be done, but our experience has been that users don't much care for it.
We've had some success with a kind of tag cloud navigation system wherein the user selects from a small list of initial tags (10 or fewer), and then further refines his search by clicking on related tags that are derived from the results generated by the first selection. That might at first sound like a hierarchical navigation system, but in reality it's much different because most videos fit in more than one category. With a traditional hierarchy, there's a single path to any particular video. In this system, you can get to the same video from many different paths.
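The mechanism can be sketched in a few lines. This is not our production code, and the data shape and names are invented for illustration, but it shows why the result is not a hierarchy: any tag on any surviving video can become the next refinement step, so many paths lead to the same video.

```javascript
// Narrow a set of videos to those carrying every selected tag, and
// derive the tags offered for the next refinement from the survivors.
function refine(videos, selectedTags) {
  const matches = videos.filter(v =>
    selectedTags.every(t => v.tags.includes(t)));
  const related = new Set();
  for (const v of matches) {
    for (const t of v.tags) {
      if (!selectedTags.includes(t)) related.add(t);
    }
  }
  return { matches, related: [...related].sort() };
}

const videos = [
  { id: 1, tags: ["hobbies", "woodworking"] },
  { id: 2, tags: ["hobbies", "woodworking", "carving"] },
  { id: 3, tags: ["news", "politics"] },
];
const step = refine(videos, ["woodworking"]);
// step.matches has ids 1 and 2; step.related is ["carving", "hobbies"]
```

Each selection shrinks both the result set and the list of offered tags, so the on-screen choices stay small enough for arrow-key navigation.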
The lack of a traditional search method is not all bad. One benefit of our tag navigation system is that it promotes discovery – finding things that are interesting, even if they're not exactly what you're looking for. Considering how people often use TV, this is a very good thing. Often, the user is looking for "something to watch," rather than one show in particular. The tag navigation lets him start with a broad outline and refine it as much or as little as he likes.
If users wanted to surf the Web on their TVs, Microsoft's acquisition of WebTV for $425 million in 1997 wouldn't be viewed as one of the biggest product flops of all time. People don't want to surf the Web on their TVs. They want to watch TV! Sure, a few people like the idea of checking their Facebook pages on their new 42" TV, but they don't do it for long. First, there's the size problem; either they increase the font size so that they can see the text, which makes viewing anything on Facebook very frustrating, or they sit within five feet of the TV. Either way, it's uncomfortable.
The same goes for checking e-mail, writing documents, or just surfing the Web. You have to sit uncomfortably close to the TV, or you have to make the font size so large that very little is displayed on the screen. Plus, anybody who comes into the room can see what you're doing.
Tablets had been around for years before Apple popularized them with the iPad, and Internet connected phones were around before the iPhone explosion. Relatively few people bought tablets, and only the most dedicated used their phones for anything but checking e-mail. Apple discovered something that everybody else had missed or discounted: People do different things with different devices. Nobody wants to do spreadsheets on their iPhone or iPad. Allowing them to would complicate the user interface. Apple discovered that if they limited the user interface to supporting what people most want to do, they could create a device that people really want to use.
A TV user wants to turn on his television, select a program, and watch it. He doesn't want the Web. He wants access to a small portion of the content that's available on the Web. He doesn't care about fancy user interfaces with tons of eye candy and flashy graphics. To him, the user interface is at best a minor annoyance.
If your interface works – gives users quick and easy access to the content they want – they'll use it. Don't expect them to like the interface, though, because they're not really focused on that. Be happy if they don't actively dislike it.
Because users aren't focused on your interface, you don't have to spend a lot of time on non-functional eye candy. As long as it works well, "not ugly" is acceptable. "Good looking" is better, but putting effort into making your interface more attractive is not the best use of your time. You're much better off spending your time giving users easier access to more and better content.
Simple and effective are better than good looking. No matter how good your interface looks, nobody will use it if he can't figure it out or if the payoff for figuring it out is very low. On the flip side, if you provide the best content, you can get away with a slightly more difficult interface. Slightly. There is a limit to what users will endure.
The exact equation differs from user to user, but in general users operate on an effort-versus-reward basis. If you're showing viral YouTube videos, your interface had better be very attractive and brain-dead simple to use because the reward for each video is vanishingly small. Nobody is going to suffer through a dozen button presses for each video. On the other hand, if you're giving users access to just-released movies or television shows, they're willing to put up with some annoyance because the reward is very high: two hours of video with no, or very few, interruptions.
Something else to keep in mind when designing your interface is that, unlike with desktop computers (and to a lesser extent, mobile devices), the user's attention is not fully focused on your interface. The user is talking with friends, petting the cat, thinking about the last video he saw or perhaps watching and listening to a video in a thumbnail while he's operating your interface. You cannot assume that he's giving you the same amount of attention that he'd give a desktop application. As a result, he's going to make mistakes. It's up to your interface to make sure that those mistakes are easily corrected. Fortunately, it's unlikely that any mistake he makes will be catastrophic. Just annoying. And he's going to blame you for it.
Finally, you have to realize that the average TV user doesn't have the same experience with computer user interfaces as do users of desktop computers and mobile devices. Expecting them to know how to work your computer-like interface is a mistake.
This is a new field, and nobody has all the answers. We don't even know all of the questions. We can take some hints from DVRs, but their interfaces are notoriously complex and difficult to use. There are few good ideas to be found there, and although people who are accustomed to their DVR interfaces won't necessarily fault you for creating a similar interface for your product, they will quickly move to another service that provides an interface that's easier to use.
The future is now
For at least fifteen years, people have been predicting that the Internet will join with the television "in the future." It's taken a lot longer than those pundits thought it would, not due to lack of technology but rather due to a lack of television-friendly content. Nobody wanted spreadsheets, word processing, e-mail and Web browsing on their TVs. They wanted movies and other videos.
Today, with millions of movies and television shows, and billions of other videos available at the click of a mouse, the content is there. And more than a million new videos are being uploaded to YouTube and other video sharing sites every day. Granted, the videos uploaded follow Sturgeon's Revelation that 90% of everything is crud, but that still leaves more than 100,000 new videos every day that are interesting to more than just the uploader's immediate family.
As developers, our job is to create user interfaces that give users access to that content in a way that's familiar to the users. We can't expect the users to adapt new habits just so they can see our content. Rather, we have to adapt our user interfaces so that they can be operated in the dark by somebody who's giving us less than their full intention, and with a very limited input device. It's quite a challenge, but the potential payoff is enormous.