Selenium: The Collection
Selenium is the banner for a set of open-source projects: a collection of tools. The Selenium Integrated Development Environment, or "Selenium IDE", is a Firefox plugin that records and saves user actions and lets you play them back, in Firefox only, with no loops or other programming structures. Selenium IDE "tests" are simple click, click, inspect routines. The tool can also export its recordings as Selenium WebDriver code. Selenium IDE is easy to learn and not very powerful, but it provides a straightforward upgrade path to something stronger; you can think of it as the training wheels of the toolset.
If the IDE is the training wheels, Selenium WebDriver is some sort of motorcycle. With a sidecar. And jump jets. You can think of WebDriver as an 'object', available in many programming languages, that drives the browser through methods like click(), open(), type() and wait_for_page_to_load(). Here's a bit of code in Ruby (these method names follow the older Selenium RC-style client; the WebDriver bindings themselves use calls such as get, find_element and send_keys):
    page.open "http://www.amazon.com"
    page.click "id=twotabsearchtextbox"
    page.type "id=twotabsearchtextbox", "software testing"
    page.click "css=input.nav-submit-input"
    page.wait_for_page_to_load "30000"
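For comparison, here is a hedged sketch of the same flow using calls that the selenium-webdriver Ruby bindings expose (get, find_element, send_keys, click, title). The method name search_for and its driver argument are my own, hypothetical names, and the element locators are simply copied from the snippet above, so they may not match Amazon's live page. The sketch assumes the caller has already created a driver.

```ruby
# Sketch: the search flow expressed with WebDriver-style calls.
# `driver` is assumed to be a Selenium::WebDriver driver, or any object
# that responds to get / find_element / title in the same way.
# `search_for` is a hypothetical helper name, not part of Selenium.
def search_for(driver, query)
  driver.get "http://www.amazon.com"            # open the page

  # Locate the search box by id (locator copied from the snippet above).
  box = driver.find_element(id: "twotabsearchtextbox")
  box.click
  box.send_keys query                           # type the search text

  # Click the submit button (CSS locator copied from the snippet above).
  driver.find_element(css: "input.nav-submit-input").click

  driver.title                                  # title of the resulting page
end

# Usage against a real browser (needs the selenium-webdriver gem):
#   require "selenium-webdriver"
#   driver = Selenium::WebDriver.for :firefox
#   puts search_for(driver, "software testing")
#   driver.quit
```

Passing the driver in, rather than creating it inside the method, keeps the browser choice out of the script and makes the flow easy to exercise with a stand-in object.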
Programmers who know the WebDriver API can write tests in many popular languages, including C#, Ruby, Java, and Python. The tool also has hooks into testing frameworks, like RSpec and Test::Unit, to generate test reports that are easy to read or to turn into graphs. There are also extensions that let WebDriver work on mobile platforms like Android and iOS, called Selendroid, ios-driver, and Appium. While IDE provides an export to generate WebDriver code, most programmers write the code directly.
Selenium is open source, so it is completely free for you to download and use; there are no per-seat licenses and no concurrent-user restrictions. That doesn't mean there is any shortage of support to help you along the way. There are informal routes you can take: you can go to StackOverflow with technical questions or problems with implementation. There is also plenty of commercial support around if you need recommendations on using Selenium in your specific environment or help managing your tests.
The Limits of Selenium WebDriver
It's important to remember that Selenium only does browser automation, nothing more. Traditional Windows apps like Word and Excel do not run in a browser, so WebDriver will not be able to see or interact with them. (If you write a Windows app, you won't be able to use Selenium to drive it.) That's because of how WebDriver works - it manipulates web pages inside the browser, using the same engine that interprets web pages. Operating system click events, like the F(ile), E(dit), V(iew) menu, are hidden from Selenium's view.
Likewise, the tool cannot see objects that appear in a web page but are not HTML, like PDFs and Java applets. One common pattern for tests is to change a value in Excel, save the spreadsheet, then upload it. Another is to print the web page as a PDF and view it to examine how printouts appear. Because WebDriver can't see outside the browser, it will not be able to do these tasks. Even if it could, Selenium works by checking text and link values; it does not compare images. That means the text could be the wrong size, or in the wrong place, and WebDriver will still report success. (It is possible to explicitly check font size and color, though few people do.)
Let's take a closer look at some reasons you might choose Selenium as your tool to automate browser checks, and a few approaches for you to consider.
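As an aside on the explicit font-size check mentioned above, here is a minimal sketch of what one could look like in Ruby. It relies on css_value, which elements in the selenium-webdriver Ruby bindings provide for reading a computed style property. The helper names font_size_px and font_size_between? are hypothetical, and the sketch assumes the browser reports the size in pixels (for example "16px").

```ruby
# Sketch: read computed style values instead of comparing screenshots.
# `element` is assumed to respond to css_value(property), as
# Selenium::WebDriver::Element does. Helper names are hypothetical.

# Returns the element's font size in pixels as a Float ("16px" -> 16.0).
# Raises ArgumentError if the value is not a plain pixel measurement.
def font_size_px(element)
  raw = element.css_value("font-size")
  match = /\A(\d+(?:\.\d+)?)px\z/.match(raw.to_s.strip)
  raise ArgumentError, "unexpected font-size value: #{raw.inspect}" unless match
  match[1].to_f
end

# True when the element's font size falls within the allowed pixel range.
def font_size_between?(element, min_px, max_px)
  font_size_px(element).between?(min_px, max_px)
end
```

A check like this catches text rendered at the wrong size, but it still says nothing about whether the text sits in the right place on the page.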
Selenium on its own does not scale well for cross-browser testing. Selenium Grid exists to spin up a distributed network of Selenium instances so you can run tests concurrently, but it is both hard and expensive to maintain. Alternatively, you can run your Selenium tests on a cloud service like CrossBrowserTesting.
At its most basic level, Selenium is an application programming interface (API). Once you've settled on using the tool, you still have to figure out what approach makes the most sense. The approach you choose is closely related to the problems you are trying to solve; the dilemmas below may help you make that decision.
Pure vs. Building Blocks
Beginning with IDE or straight-up WebDriver is the simplest approach and will give the most immediate results. The product of these two options is pretty similar: testers develop individual scripts that have a lot of duplicated code. This can be appropriate on some teams, but tends to lead to technical debt and maintenance problems.
WebDriver code is, after all, code, and might best be created by production programmers. The problem with this approach is that writing test code means not writing production code, which slows down the project immediately in the hope of speeding it up later.
The building blocks approach sits in the middle. Programmers create the building blocks, often called a Domain Specific Language, or DSL. The DSL is a wrapper around Selenium that allows non-programmers to write extremely high-level code in something that is close to English. More importantly, your team can use the DSL concept to create a language in the domain of your product. For example, if you are writing checks for a library web site, you might write the functions checkOutBook, checkInBook, and searchInventory. I have had a much easier time remembering and writing scripts with this method. Also, it's nice not having to repeatedly program: click this, type that, click this, type that.
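To make the building blocks idea concrete, here is a minimal sketch of a DSL layer for the library example, written in Ruby with snake_case versions of the function names above. Everything specific in it is hypothetical: the LibrarySite class, the URL paths, and every element id and CSS class are made up for illustration. It assumes only that the driver responds to get, find_element and find_elements the way a WebDriver driver does.

```ruby
# Sketch of a small domain-specific layer over a WebDriver-style driver.
# All paths, element ids and CSS classes below are hypothetical.
class LibrarySite
  def initialize(driver, base_url)
    @driver = driver
    @base_url = base_url
  end

  # Search the catalogue and return the result titles as strings.
  def search_inventory(term)
    @driver.get "#{@base_url}/search"
    fill "search-box", term
    click "search-button"
    @driver.find_elements(css: ".result-title").map(&:text)
  end

  # Check a book out to a patron and return the status message text.
  def check_out_book(book_id, patron_id)
    @driver.get "#{@base_url}/checkout"
    fill "book-id", book_id
    fill "patron-id", patron_id
    click "checkout-button"
    status
  end

  # Check a book back in and return the status message text.
  def check_in_book(book_id)
    @driver.get "#{@base_url}/checkin"
    fill "book-id", book_id
    click "checkin-button"
    status
  end

  private

  # The low-level "click this, type that" steps live in one place.
  def fill(id, text)
    @driver.find_element(id: id).send_keys text
  end

  def click(id)
    @driver.find_element(id: id).click
  end

  def status
    @driver.find_element(id: "status-message").text
  end
end
```

A script written on top of this reads close to English, for example site.check_out_book("B1", "P9"), and when an element id changes, only the class needs updating rather than every script.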
Big vs. Small
Another important choice to make is how big or small the use of WebDriver will be, that is, how much of the process to automate. It's common, for example, for teams to set a goal of automating the entire test process, which sounds wonderful in theory but can be expensive and brittle in practice, especially if the user interface is undergoing continuous change. Even with a DSL, your team will reach a point where the time spent on maintenance is equal to or greater than the time spent creating new checks and adding value to the project. As your product changes, the DSL will have to change to keep up, the scripts will have to change to keep up with the DSL, and occasionally you'll have Selenium and browser updates to deal with as well.
[Figure: Test Automation Pyramid, by Mike Cohn]
Alternately, your team can use Selenium to create much smaller, much more contained scripts. These are often useful for pre-release or new-build sanity checks. Usually these checks run much faster and require much less time to maintain. Often they are brief enough that, in addition to the assertions in your scripts that look for certain attributes of your program, a tester can watch the script while it runs and spot problems that the script is unable to look for. Most teams that experience long-term success with Selenium end up using it as the 'tip of the pyramid', creating a small suite that runs quickly and uncovers large errors. At one point Gmail had 50,000 Selenium checks, and ended up throwing the vast majority of them away, something Markus Clermont discussed in a 2008 Google Tech Talk.
Where to go from here
So far, I've given you some information on what the Selenium suite of tools is made of, some reasons you might choose the tool set and questions to ask along the way, and some considerations you will have to weigh when the time comes to implement. Because Selenium is open source, you can download it and try it at http://www.seleniumhq.org/.