Office Automation

Someone asked if anyone automates Office applications any more. That struck me as kind of funny. If I had to guess, I would guess that there are more lines of VBA code (poorly written or not) that automate Office applications than all the COBOL, VB6, PHP, and any other language you can think of doing anything. I don’t have any evidence of that. It’s in the Outlook-is-the-largest-data-store vein. Because there are so many Excel, Word, Access, etc documents out there, I think all the code in them would add up to a bunch. What do you think?

There are more lines of VBA code automating Office apps than all other kinds of code doing all other tasks
True
False
Can’t/Won’t Guess

  
Free polls from Pollhost.com

PowerPivot - Part 3 of 4: Conditional Shape Colors

Part 1: http://www.dailydoseofexcel.com/archives/2010/02/26/powerpivot-part-1-of-4/

Part 2: http://www.dailydoseofexcel.com/archives/2010/02/27/powerpivot-part-2-of-4-prepping-the-census-data/

As mentioned in the Part 1, I had developed this technique to conditionally color shapes some time ago. Based on customer feedback and my own experience I made changes to simplify the various connections required to make it all work.

Also, documenting the technique was easier to do with a webpage for two reasons: 1) the length of the post, and 2) I find the UI for a blog post somewhat cumbersome for managing images.

The updated solution is at http://www.tushar-mehta.com/publish_train/xl_vba_cases/0301-dashboard-conditional_shape_colors.htm

In Part 4, the last part of this multi-part post, I will look at using PowerPivot from within Excel to analyze the 18 million row data set generated in Part 2 and on integrating the result with the Conditional Shape Color solution of this post.

picture11

Web Query Lessons

I’ve recently learned some things about Web Queries that I’d like to share with you. But first a little background. You create a Web Query in Excel 2007 by clicking From Web on the Data tab. This launches a browser within Excel through which you navigate to a web page and select a “table”. Tables on web pages can be a few different things. I think the common “tables” are html tables and html div tags.

Lesson 1: The web browser Excel uses is some variant of Internet Explorer. If you don’t use IE, and I don’t, then you haven’t bothered to set the home page. In that case, the home page will be msn.com and you might see errors like this:

Awesome. Why did I ever stop using IE? If I open IE8, set my home page to google.com (or anything else), then the errors go away. Bonus April Fools day tip: Find a co-worker who uses Firefox and set his IE home page to hawtness.com. It’s marginally NSFW, so it will be funny when he does a web query, but he won’t get fired (probably).

Lesson 2: If you try to refresh a Web Query for a web page that requires you to log in, it won’t work (sometimes). You’ll get

This Web query returned no data. To modify the query, click OK, click the name of the external data range in the name box on the formula bar, and then click Edit Query on the External Data toolbar.

I have a workbook with a couple dozen web queries all pointing to the same site. If I ‘edit’ one of the queries, Excel’s instance of IE remembers that I have logged in (cookies maybe? that aren’t shared with normal IE?) and I can refresh all the remaining queries without error. So I created a sheet near the beginning called ‘Credentials’ where I have a Web Query that brings in a small table near the top of the home page. I ‘edit’ that query to log in, but I don’t have to navigate around much to bring it in. Then I run my code which updates all the URLs and refreshes all the queries. The poor man’s IE automation, I guess.

IsHex Function

I have a need to verify some user input is hexadecimal (0-9 and A-F).

Public Function IsHex(sInput As String) As Boolean
   
    Dim lResult As Long
   
    On Error Resume Next
        lResult = CLng("&H" & sInput)
       
    IsHex = sInput = "0" Or lResult > 0
       
End Function

With help from this Mr. Excel thread. So where did &H come from? Are there more like it?

PowerPivot - Part 2 of 4: Prepping the census data

Part 1: http://www.dailydoseofexcel.com/archives/2010/02/26/powerpivot-part-1-of-4/

Since Excel 2007 (and later) can handle a million rows of data, I figured it made sense to explore PowerPivot (PP) with a data set larger than that. In addition, I have had a long time curiosity about the US census data. So, this seemed an appropriate time to combine the two. After some exploration of www.census.gov, I learned of http://usa.ipums.org/usa/index.shtml This is a project of the Minnesota Population Center of the University of Minnesota.

Do note that this post is not meant to be a step-by-step tutorial on how to use the IPUMS website. While it took some time, it took even more effort to understand the census data but an explanation of the data is also not my intent with this post. The focus is on loading a large data set in PowerPivot.

The final data set I downloaded expanded to nearly 18 million rows of data. But, it took a while before I got to the point of getting that data set prepped for PP.

A relatively superficial exploration of the IPUMS site led to two conclusions. One, the result of ‘analyze data online’ would yield results online but the result sets were intentionally limited in size. Two, trying to get detailed data through the ‘Create an Extract’ or the ‘Download or Revise Extracts’ led to large data sets but of the kind that were intended for use with SPSS, SAS, or STATA. The data set itself consisted of fixed length columns. In addition, there was a delimiter at the end of each row record (discovered through trial and error).

In addition to the data set, the website provided the command sets to load the data into SPSS, SAS, and STATA.

picture1

The SPSS instructions were fairly helpful in understanding the layout and the content of the data (other than the secret delimiter).

So, for one data set the columns were

 year           1-4
 datanum        5-6
 serial         7-14
 numprec       15-16
 hhwt          17-20
 statefip      21-22
 county        23-26
 urban         27
 city          28-31
 citypop       32-36
 urbpop        37-41
 ownershpd     42-43
 mortotal      44-47
 mortamt1      48-52
 rentgrs       53-56
 hhincome      57-63
 pernum        64-67
 perwt         68-71
 age           72-74
 sex           75
 marst         76
 raced         77-79
 gradeattd     80-81
 occ1950       82-84
 ind1950       85-87
 inctot        88-94
 ftotinc       95-101

In addition, the SPSS instructions also contained information about the text values associated with the numeric codes in the data set. For example, ownershpd values were:

value labels ownershpd
00 "N/A"
 10 "Owned or being bought"
 11 "Check mark (owns?)"
 12 "Owned free and clear"
 13 "Owned with mortgage or loan"
 20 "Rented"
 21 "No cash rent"
 22 "With cash rent"

The above information is important since we can create Excel tables of these number-to-text maps and link them to the data table in PP.

What made the process of getting the data in PP challenging was that PP will not read fixed column width data from a text file though it will read comma delimited text files. I would have thought this would be elementary but I couldn’t find any way to get PP to do so. Maybe, I missed something that would have made the rest of the exercise below unnecessary.

So, as far as I could tell, I had to convert the fixed-column-width text file into a file where the columns were delimited by some character. I picked the comma. But, I couldn’t use Excel (too many records) or Word (too large a file) to do the conversion. I also tried Notepad and WordPad before falling back on true and tried VB(A).

I did not start with the 18 million record data set. I used the smaller data set for a single census from the 19th century to explore the structure of the data. Through trial and error I discovered the secret delimiter at the end of each record. I also discovered that the software did not summarize the data in any way. The code that I felt confident I could use for data from all the censuses combined is below.

It’s not elegant. It’s not flexible. It’s not meant to be either. At some point I might be tempted to paste the information about the column names and widths into a Excel worksheet and have the code automagically work off that. But, for now, the goal was to prep the data for PP.

One interesting thing I did learn was that using FreeFile() twice in succession yields the same integer value. That, of course, leads to an error when opening the 2nd file. So, the correct approach is to get a free file number, open that file, and then get a second free file number for another file.

Also, if one were to add up the column widths of the various fields one would get 101 but the record being read has 102 characters. The last byte is the delimiter that I want to drop.

Option Explicit

Sub writeOneRec(OutF As Long, Rslt As String)
    Dim Yr As Integer, DataNum As Integer, Serial As Long, _
        NumPrec As Integer, HHwt As Integer, _
        StateFIP As Integer, County As Integer, Urban As Integer, _
        City As Integer, CityPop As Long, _
        UrbPop As Integer, _
        OwnerShip As Integer, MorgTotal As Integer, _
        MorgAmt1 As Integer, RentGross As Integer, _
        HHIncome As Long, _
        PerNum As Long, PerWt As Integer, Age As Integer, _
        Sex As Integer, MarSt As Integer, Race As Integer, _
        GradeAtt As Integer, Occ1950 As Integer, Ind1950 As Integer, _
        IncTot As Long, FamTotInc As Long
    On Error Resume Next
    Yr = Left(Rslt, 4)
    DataNum = Mid(Rslt, 5, 2)
    Serial = Mid(Rslt, 7, 8)
    NumPrec = Mid(Rslt, 15, 2)
    HHwt = Mid(Rslt, 17, 4)
    StateFIP = Mid(Rslt, 21, 2)
    County = Mid(Rslt, 23, 4)
    Urban = Mid(Rslt, 27, 1)
    City = Mid(Rslt, 28, 4)
    CityPop = Mid(Rslt, 32, 5)
    UrbPop = Mid(Rslt, 37, 5)
    OwnerShip = Mid(Rslt, 42, 2)
    MorgTotal = Mid(Rslt, 44, 4)
    MorgAmt1 = Mid(Rslt, 48, 5)
    RentGross = Mid(Rslt, 53, 4)
    HHIncome = Mid(Rslt, 57, 7)
    PerNum = Mid(Rslt, 64, 4)
    PerWt = Mid(Rslt, 68, 4)
    Age = Mid(Rslt, 72, 3)
    Sex = Mid(Rslt, 75, 1)
    MarSt = Mid(Rslt, 76, 1)
    Race = Mid(Rslt, 77, 3)
    GradeAtt = Mid(Rslt, 80, 2)
    Occ1950 = Mid(Rslt, 82, 3)
    Ind1950 = Mid(Rslt, 85, 3)
    IncTot = Mid(Rslt, 88, 7)
    FamTotInc = Mid(Rslt, 95, 7)
   
    On Error GoTo 0
    Write #OutF, Yr, DataNum, Serial, NumPrec, HHwt, StateFIP, _
        County, Urban, City, CityPop, UrbPop, OwnerShip, _
        MorgTotal, MorgAmt1, RentGross, HHIncome, _
        PerNum, PerWt, Age, Sex, MarSt, Race, _
        GradeAtt, Occ1950, Ind1950, IncTot, FamTotInc

    End Sub
Sub openDB9()
    Dim X, Rslt As String, OutF As Long
    X = FreeFile()
    Open "c:\tushar\work\usa_00009.dat\usa_00009.dat" _
        For Binary As #X Len = 102
    OutF = FreeFile()
    Open "c:\tushar\work\usa_00009.txt" _
        For Output As #OutF
       
    Write #OutF, "Yr", "DataNum", "Serial", _
        "NumPrec", "HHwt", _
        "StateFIP", "County", "Urban", _
        "City", "CityPop", _
        "UrbPop", _
        "OwnerShip", "MorgTotal", _
        "MorgAmt1", "RentGross", _
        "HHIncome", _
        "PerNum", "PerWt", "Age", _
        "Sex", "MarSt", "Race", _
        "GradeAtt", "Occ1950", "Ind1950", _
        "IncTot", "FamTotInc"
    Rslt = String(102, " ")
    Get #X, , Rslt
    Do While Not EOF(X)
        writeOneRec OutF, Rslt
        Get #X, , Rslt
        Loop
    'Stop
   Close #X: Close #OutF
    End Sub

And, so now, I have a CSV data set ready to load into PowerPivot.

PowerPivot - Part 1 of 4

After the MVP Summit I was motivated to further explore a Microsoft add-in for Excel 2010 called PowerPivot. I don’t know all its capabilities but at the very least it can analyze millions of rows of data. If you are using Excel 2010 you can download the add-in from http://www.microsoft.com/downloads/details.aspx?FamilyID=48a5b47b-8c9c-450f-ab6e-178600a733ca&displaylang=en

Do keep one thing in mind. The Community Technical Preview (CTP) version of the PowerPivot addin that is currently available is incompatible with the currently available version of Office 2010, i.e., the Release Candidate (RC). The problem seems to lie in that the data in the PowerPivot tables are not stored in the Excel file as they should be. So, on reopening the file, there is no database for PP to use! I assume (hope?) Microsoft will fix this incompatibility soon.

A day after returning from the Summit, I ran into a contest organized by Microsoft on creative uses of PowerPivot. I scrambled to put together an entry since the contest closed in 2 days. First, I had to install PowerPivot. Then, I decided to use census data as the source for some kind of analysis. It took me quite a while to figure out how to get that data. Then, I analyzed the data set with PP. Finally, I decided to marry the “%change in population from one census to the next” with a solution I had posted on my web site to conditionally color shapes in an Excel worksheet (http://www.tushar-mehta.com/excel/charts/0301-dashboard-conditional%20shape%20colors.htm).

I will post more detailed reports on my experience with PP and the census data. Part 2 will be about my experience getting the census data. Part 3 will be about the add-in to conditionally color shapes, and Part 4 will be about using PP in a relatively basic manner while integrating the result with the add-in.

For now, here is an image representing the result.

powerpivot

For those curious about the contest (now closed), here is a link: http://wildfireapp.com/website/6/contests/19877/voteable_entries/3165298

Listing Formulas

This code will list all the formulas in the selection in the Immediate Window.

Sub ListFormulas()
   
    Dim rCell As Range
   
    If TypeName(Selection) = "Range" Then
        For Each rCell In Selection.Cells
            Debug.Print String(4, " ") & rCell.Address(0, 0), rCell.Formula
        Next rCell
    End If
   
End Sub

If you have a range selected, it loops through each cell. Then it prints four spaces, the cell’s address ((0,0) means relative, e.g. A1 not $A$1), and the cell’s formula. Note that if a cell contains a value instead of a formula, the Formula property returns the value. I wonder why that is.

Why the four spaces? So I can paste into SuperUser and it thinks it’s code.

Drawing in Excel

Brad sent me a link to this cool YouTube video

http://www.youtube.com/watch?v=4YG_WWZYqUs

Libro1? Is that Italian Excel?

Then there’s this:

http://www.youtube.com/watch?v=8oR1_PA3yTw

And you thought 3D charts were useless.