Monte Carlo Simulations, Fibonacci Numbers, and Other Number Tests: Why Developers Still Need The Basics
Perhaps you encountered these in math classes or an elective in your CompSci degree, and think they’re just “theoretically” interesting. (Or not.) But, Tom Henderson argues, these have relevance to the real world of programming too.
Building realistic behavior into your applications can be tough. If you took classical computer science or programming in college, you may not have heard of Monte Carlo simulations, Fibonacci numbers, and other methods of using numbers to build tests and simulate situations – unless you explored the math side of programming or statistics.
Simulations are powerful ways to test inputs for expected outputs, and of course, the reality check of overall system attributes and needs. It also keeps applications from exploding, or the insanity of trying to understand failures in results. And these “freshman math” concepts can help you get the work done.
Monte Carlo Simulations: Controlled Spaghetti-Against-The-Wall
Perhaps you have direct control over the boundaries of all your application inputs. Or perhaps you think you do. Maybe you're on a team with lots of programmers maintaining differing pieces of an application. Your job: good consistent code. Good luck with that.
Sometimes – often, I hope – functions, libraries, and the amazing variety of possible user inputs are all nicely bounded; life is dull; and we sip coffee or energy drinks. Then, on a bad day, it's discovered that a parser can't handle certain kinds of inputs, or are subject to hostile or erratic, beyond-the-boundaries inputs. Perhaps outputs or results become suspect. Reliable data becomes obscured by smoke pouring from developers’ (and worse, users’) ears as unexpected results generate WTF?s.
What does your application do when the unexpected happens? Gracefully recover, blow to smithereens, or swallow a bizarre input and try to continue? Simulations help render an expectation of behavior of applications, functions, objects, and interactivity between and among applications that can't be derived easily in almost any other way. You can't parse everything. But you can certainly throw a sufficiently randomized domain of inputs and discover how your code and what it works with reacts to unexpected change. Can an unexpected output be believed? Can a range of seemingly strange outputs be reliable?
Monte Carlo simulations are built of (pseudo) randomized inputs, a wide number of them. Through functions and processes, and iterations of test rounds, they renders output(s). If all is well, the output is within the domain of expected answers. Outputs are examined for reality checks. With a little work, the graphs between input(s) and output(s) help you render a visualization of the possible effects after many simulation iterations. These views can be highly enlightening and revealing — even entertaining.
On a micro-level, the process is one shoveling a wide variety of inputs through a function, where you can track the outputs. Then you look at them in terms of normalization, standard deviation, as well as boundaries tested to their limits within the sanity of probabilistic results. Think of a recipe program that converts U.S. measures to metric. You can get an idea of what a recipe for eight servings might require in terms of quantity.
What if a caterer jumps in and starts trying to serve 200 servings at a wedding? Can all of the inputs safely work? Are combinations going to explode? Will your calorie/serving calculator lay an egg? Beyond the issue of whether various variables can handle math as their declaration, there is the question of what individual functions and child processes do over a very wide range of inputs. Does the converted recipe print correctly?
On a macro level, Monte Carlo simulations allow scenario analysis, further probability analysis, and input impedance/sensitivity analysis. Scenario analysis establishes upper and lower boundaries for inputs that are fed into a process. The process’ output is then examined for effects as the boundaries are changed, stepped through confining ranges, and compared. This is done iteration after iteration until you can make sense of the results over a number of trials. Sometimes, these fit into time-series analysis, such as Fourier Transforms, so that iterative changes can be visually analyzed.
Probabilistic process determination requires an input domain that spans beyond the standard deviation into sometimes seemingly insane boundaries, but not impossible ones. Step-wise iterations in probabilistic simulations require lots of data, and therefore, input randomization on a large scale. Here, suggestions of understanding pseudo-random number generators and their possible limitations becomes mandatory before you design the simulation.
Where multiple inputs variants are possible, simulations allow ranges of independent inputs to how input factors are correlated, and how variation within one input affects results contingent on one or more variance(s) or delta in a pseudo-random range. Fewer surprises and head-desk moments should result.
If you need to check formulae in simulations, there are apps for that. Some software fits into Excel or equivalent spreadsheets, allowing for both random number generation of reasonable size and quality, along with iterative stepping, and the convenience of very fast graphing in a single product.
Here, in spreadsheet software, the capacity to model object behavior is fairly handy. Pull up a sheet, parse a pseudo-random number into bits that fall within the not-impossible domain. Stepwise feed them to the inputs of your object, watching the state of the output; or build a table with the outputs and graph it through iterations. Stroke your chin. Ponder the results. Happy? Great. Strange to you? Find out why, to discover if you're seeing reality or a separate, even head-desk reality.
Randomizing and Boundary Notions
Monte Carlo simulations can have multiple and discrete series of inputs, outputs, boundaries, and iterations for observations of behaviors. However, there is occasional need for a puppet master that can pull the strings between inputs, outputs, and even nodes. The concept of a node is an element or process where something is processed or work is done. Inputs and outputs are connected with other outputs. It might also be a measuring point in and/or among processes where a dipstick can be inserted into a computational engine to see how much oil is left or needed, or (metaphorically) what color the oil is.
The randomized inputs can be generated somewhat easily, via Fibonacci sequences, altered to produced randomization. John Von Neumann generated such a sequence by simply altering a digit regularly, then slicing long concatenated strings into usable “chunks” to be fed into inputs. Fibonacci numbers are formed by a simple transform that renders a series of integers in a recurring relationship. As the sequence progresses, each new number is yielded through the same transform. Von Neuman killed the sequence by a simple digit substitution function. Boring, but useful.
The further usefulness comes from the fact that Fibonacci numbers allow a number of possible applications, including rapid denominator reductions and the ability to make very simple pseudo-random number generators, the kind used in Monte Carlo simulations. Fibonacci numbers used this way, however, can be algorithmically reduced in such a way as to make resultant numbers useful in encryption, without a lot of extra work, such as additional hashing or algorithmic moshing of the sequence in a predictable way.
But another use of Fibonacci numbers arrives in the use of Fibonacci number-seeded trees, called Fibonacci Heaps. A Fibonacci heap is a tree. Leaves on the tree (algorithmically induced) render great fodder for building graphs, and doing heap sorting. When I watch older browsers render GIFs I am seeing Fibonacci heaps at work, especially on slower connections as one watches a Fibonacci heap evolve into a visual JPG rendering. It's a Fibonacci heap at work.
Still another use of Fibonacci numbers is the Fibonacci Cube, which is a matrix built upon a Fibonacci sequence like a hypercube. With the cube comes intersecting points, and down those points are the basis of least-cost routing algorithms, least-shortest-path protocol firmware, dirty cache detection algorithms, computing and big data cluster servicing, Hadoop-style. Cluster servicing and large computation servicing algorithms are built on Fibonacci trees and cubes, too. If you want to create software using these algorithms and tools, it behooves you to pull out your old math textbooks for a review.
More Fun With Simulations, Sequenced Tests, And Numbers
Many discrete functions can be tested with Fibonacci sequences, and those sequences are subsequently blurted as input fodder for simulation needs. It helps to have a math brain that can consider the domains and boundaries associated with both discrete functions (and their test) but also the big picture of how functions relate to each other, and how apps relate to each other over what kind of time-domain/periodicity.
In turns, apps relate to each other, data transfers correctly and is within expected boundaries are delivered after a simulation Then we get to drink coffee and energy drinks until the next crisis occurs.