Generate Sample Data
I’ve long been a fan of Jimmy Pena’s Random Sample Data Generator Add-in and I use it quite a bit. I wanted to write my own, however, and introduce a little more flexibility. In particular, I was creating some sample inventory data and I needed to create inventory part numbers. My part numbers are of the form AIM-0000 or PP-0000. Easy enough to do with a RANDBETWEEN and some concatenation, but I wanted to build it in to the generator. So I did.
I’ve attempted to break down the samples into types. I have
- Number. A number between a minimum and a maximum with a specified number of decimal places. It can be random or sequential. If it’s sequential, it can skip by a specified amount.
- Date. A date between a minimum and a maximum. It can be random or sequential. It can skip by days, months, quarters, or years.
- Time. A time between a minimum and a maximum. It can be random or sequential. It can skip by hours, minutes, or seconds.
- Constant. This is a string that simply outputs itself over and over. This would mainly be used in compounds.
- List. A predefined list of text. It can be random or sequential. You can import lists from a text file rather than typing in all the entries.
- Text. A string of a specified length. It can be upper, lower, or proper case.
- Compound. This consists of one or more of the above types.
Here’s the main screen.

You select any number of samples and put them in the output box. An example is shown to make sure you’re getting what you think you’re getting. You can specify the number of records (defaults to 50). You can output the data to a new workbook, a new worksheet, or the active cell. To add a new sample, click the Add link above the Samples box. You are presented with a dialog to specify the properties of the sample. Here’s what the dialog for a new Number look like.

You can also edit existing samples. Here’s what editing a list looks like.

The compound type is the most interesting. The below example shows how the social security number was constructed using other sample types.

You can also add or edit existing samples from within the Add/Edit Compound dialog. You can’t delete, though, because that causes too many problems.
All of the samples can be put into categories. There is a textbox on each of the dialogs that accepts a comma separated list of category names.
I’ve already noted some bugs as I was writing this post, and I’m sure there are a few more. If you want to give it a try, let me know your questions and comments. The code is wide open, so feel free to have a look around.
You can download dkRandomData.zip
Stick the add-in somewhere and make sure the two xml files are in the same directory as the add-in. Load the add-in and look for a Quick Sample Data entry under the Tools menu (it’s under Add-in in 2007 – the Ribbon part is on my todo list).
Key to the usefulness of this, I think, is have pre-built samples in there already. If you note something that is obviously missing, let me know.
Nice job, the only thing I would really like is being able to add numbers that fluctuate up an down. For example, you could specify a degree of randomness around a sine curve, and then specify the periodicity of the sine curve (e.g. 3/4 of a cycle over the length of the data, starting at a certain number). When I create dashboards in Excel or other tools, I like it to tell a story – that’s hard to do with data that is either random or just goes up or down.
Great tool, thanks.
You forgot the up/down arrows to rearrange the output fields, otherwise, great job!
Good one Alex. I’ll have to learn some probability math, but I’ve added it to my to do list.
JP. Yeah, that’s on my list. I don’t know why I didn’t just get it done.
Nice work, Dick! I had thought about writing something similar (in Python, I’m just learning Excel and VBA) and wasn’t aware of Jimmy Pena’s solution or yours. The one feature I wanted to add in mine (and I’m not sure that Excel can do this or not) is to be able to seed the random number generator so that you get the same “random” data set multiple times. This could be useful for unit testing, so that everybody is working from the same data set.
The simple solution is, of course, just save the resulting data and share that. The geek in me cannot accept this answer
Shawn,
You can generate the exact same set of “random” numbers. From the help files for Randomize:
To repeat sequences of random numbers, call Rnd with a negative argument immediately before using Randomize with a numeric argument.
So it could look something like this:
Dim a As Long
Rnd (-1)
Randomize (2)
For a = 1 To 50
Cells(a, 1) = Rnd
Next a
End Sub
You should get the same result every time.
Josh
Will this add-in
work with EXCEL 2002 ?
thanks!
SFdude
SFdude: It should work in all version 2000 and greater, although I only tested it in 2003 and 2007.
Thanks Dick -
Works *great* in Excel 2002,
under XP Pro/SP3
…Superb!
Q1) Suggest adding ZIP codes to the “LOCATION” category,
i.e.: 94108-123 (with and w/o -### extension),
Q2) Tried adding “ZIP code” myself,
using the “Add” link (above the middle column),
but couldn’t add a new category to generate ZIP codes
with a -### extension:
i.e.: 94108-123
Q3) What are the Categories with an _underscore?
i.e.: _Compound, _Constant, etc…
SFdude
A zip code would be a compound sample. Make a five digit number, a three digit number, and a hyphen (if they don’t already exist). Then create a compound sample which combines them in the right order.
The underscore categories are sample types. So if you make the above zip code, it would appear in the _compound category because it’s a compound sample.
[...] instead of square. Check out the post on Junk Charts to find-out more about this technique. Generating Sample Data for Your Excel AdventuresDick Kusleika has taken inspiration the much loved random data generator add-in from JP and created [...]
Would it be too much to ask for Web addresses matching with the Organization names in all records ?
I mean, if I use both fields while generating a sample, an organization name ends up with multiple web-addresses.
Unless, the RANDOM bit generates each field’s values individually, and hence this would not be possible.
Khushnood
Khushnood: I considered that feature, but it was too much work. When I need relatable data, I use a formula. For domain names, for instance, I would not include the domain in the output, but I would include the company name. Then I’d use a formula in another column like
and fill down.
Thanks Dick… the formula will do just fine !
Hi Dick
This is a really useful add-in. Thanks
Obviously I have opened this up in the VBA editor to take a look at how it works (struggling my way through as we speak) but have a question:
Why XML for the data? Would it not have been as easy to include the data in the spreadsheets? Is this a speed or processing overhead issue?
I presume it is possible (but don’t know how) to add the sex to the FirstName items so that ‘realistic’ data is available – i.e. not having girls called Nathan.
Anyway – thanks again.
Why XML: A couple of reasons. First I wanted the data to be independent of the code. That way I could release version 2 and it wouldn’t erase your custom sample data structures – it would just use (or convert) the existing XML file. I could have used CSV or some other text format, but I wanted to do a project with XML for the experience, so I picked it.
The other reason was I wanted to be able to share XML files. You could set up an XML file that had a lot of custom samples that were particular to your company and distribute the XML file to colleagues. Beyond that I wanted to build an import procedure so that I could import your XML file and it would merge the two. Then an export function so that if you created a really great list of vehicle make/model combinations, for instance, you could post that xml for other people to import.
Name and Gender: The way I did this for the US Cities and States sample was to create a sample of US Cities to a worksheet. Then I manually added the state to the adjacent cell. Then I concatenated them in another cell and used that to create new list which insured that the cities and states matched up. It’s a manual process to link them together, but might be worth the effort depending on how many you need.
[...] [...]