Archive for the ‘String Functions’ Category.

Fixed Width Text Files

Recently, I had to submit a fixed width text file to a state tax agency in lieu of sending them physical copies of W-2 forms. Fixed width files contain a number of columns and each column starts at a particular position on the line. If the data in that column is shorter than the width of the column, the column is padded with spaces so that the next column starts in the correct position.

I had the data in Excel, I just needed to get it to a text file. As it happens, I only have one employee for this state, so it would probably have been just as quick to type it manually. But I think we all know why I didn’t do that. The spec for the file is this:

Field Start Length
Full Name 1 50
SSN 51 9
Wages 60 8
Withholding 69 8

Here’s the monster formula in F1:

=A1&REPT(" ",A2-LEN(A1))&SUBSTITUTE(B1,"-","")&REPT(" ",B2-LEN(SUBSTITUTE(B1,"-","")))&C1&
   REPT(" ",C2-LEN(C1))&D1&REPT(" ",D2-LEN(D1))

The basic structure is cell value & repeat a space to fill. The exception is B2 where I had to remove the dashes from the social security number. If I had more than one row to do, I probably would have put the field lengths above the data. Then I could fill down column F. Finally I would copy F1 to a blank Notepad file, and there’s my fixed-width text file.

Splitting the Check

A couple of weeks ago I spent some time in the bible belt attending football games. Next week I’ll be in Bush country (Austin, TX) for yet another football road trip. One of the more exciting aspects of these road trips is settling up on expenses when you get back. Depending on the group, this can take between six days and six months. Here’s a spreadsheet I use to do it:

The formula in E3:I11 is

=((LEN($D3)-LEN(SUBSTITUTE($D3,E$2&";","")))/LEN($D3)*$B3)-IF($C3=E$2,$B3,0)

The first part determines how many times a certain person’s initials appear in the ‘Paid For’ section and takes a proportionate share of the money. The second part subtracts the total expense if this person is the person who paid the bill. For it to work, you have to use the same number of characters for every person, which is why initials works so well. And don’t forget to include that pesky semicolon, particularly after the last guy in the list.

The Hotel1 expense demonstrates some of the flexibility. Note that BD stayed in the room two nights, while TO and FS stayed all three nights. By repeating their initials, each pays his share proportionate to the number of nights he stayed.

I’m sure you could get rid of that $1 rounding error, but it’s not worth the effort. Of course the initials, expenses, and amounts have been changed to protect the guilty, but don’t go thinking that those initials are random.

Double Click to Exclude Numbers

I have a table of numbers and formulas for each row, column, and for the table as a whole. The table is part of a report - the output of the application. The user wants to exclude certain numbers from the calculations after reviewing them. These numbers would be outliers and would skew the results. The calculations are AVERAGE and STDEV functions. If a number is excluded, it needs to still be shown on the reports, but with a strikethrough format.

The obvious course is to modify the formula when the user has identified a cell to exclude. With formulas for every row, column, and for the whole table, that's a pretty big job. An easier way is to change the numbers to text. Both AVERAGE and STDEV ignore text, so this would have the effect of excluding the numbers from the formulas without having to change the formulas. I started with something like this:

With Target
    If .Font.Strikethrough Then
        .Value = CDbl(.Value)
        .Font.Strikethrough = False
    Else
        .Value = "'" & .Value
        .Font.Strikethrough = True
    End If
End With

This is in the worksheet's BeforeDoubleClick event. I use the strikethrough property to determine if the number has already been excluded. The user can double click the number to toggle between inclusion and exclusion. Excluded numbers have an apostrophe put in front of them (making them text) and the font is changed to strikethrough. Included numbers are changed back to a Double (using CDbl) and the strikethrough is removed.

Incidentally, not every number can be excluded. I've applied a particular style to those numbers that can be excluded and I limit the event like this:

If Target.Style.Name = "TBData2" Then

A new wrinkle appeared. Now some of the numbers are actually formulas. That complicates the above code snippet a little.

excel range

With Target
    If .HasFormula Then
        lStart = 2
    Else
        lStart = 1
    End If
   
    If .Font.Strikethrough Then
        .Formula = "=" & Mid(.Formula, Len("=TEXT()"), _
            Len(.Formula) - Len("=TEXT()'',") - Len(.NumberFormat))
        .Font.Strikethrough = False
    Else
        .Formula = "=TEXT(" & Mid(.Formula, lStart, Len(.Formula)) & _
            ",""" & .NumberFormat & """)"
        .Font.Strikethrough = True
    End If
End With

Instead of putting an apostrophe in front of the value to make it text, I surround it with the TEXT function. This has the added benefit of keeping the same number format applied to the text as was applied to the number. When a number is excluded (the Else part), I start with "=TEXT(". Then I repeat the existing formula, removing the equal sign if there was one (Mid(.Formula, lStart, Len(.Formula))). The suffix to this string manipulation is the existing NumberFormat surrounded by double quotes.

When a number is included, the TEXT portion of the formula is removed. The Mid function starts at Len("=TEXT()"), which is a verbose way of saying 7. The length of Mid is the length of the formula, minus the length the text function (including parentheses, the comma that separates the number format argument, and the quotes that surround the number format), minus the length of the numberformat.

This has the strange side effect of converting a number like 3 into a formula like =3 when it's toggled. I can't think of any ill effects of that, but there may be.

Min and Max across sheets

Bob has a workbook with several web queries in it. In each of the web queries, the first column is a "date". He needs to find the minimum and maximum date for all web queries.

The first problem this presents is that the "dates" aren't really dates; they're text. That's why I kept putting quotes around "date". I'll stop now. He could format the cells all he wants, but Excel won't coerce them into dates. Even if he fixed the text to be real dates, when the web queries refresh, they'll be right back to text. The dates look like this:

Dec 12, 2006, 5:00 pm

That seems like a great candidate for the DATEVALUE function, but sadly it doesn't work. I needed to manipulate the text a little. I used this formula:

=DATEVALUE(LEFT(A4,FIND(",",A4,FIND(",",A4,1)+1)-1))+TIMEVALUE(MID(A4,FIND(",",A4,FIND(",",A4,1)+1)+2,255))

The DATEVALUE function converts the first part into a date, then adds it to the TIMEVALUE function, which converts the second part. The parts are divided by a comma. The second comma to be precise. The two FIND functions find the location of the second comma. The inner FIND sets the start for the outer FIND: One space to the right of the first comma found.

I put this formula in a column adjacent to the web query. I also checked the "Fill down formulas in columns adjacent to data" option in the Properties dialog box associated with the web query. This ensures that as the query grows, the formulas will fill down the right amount.

The second problem was using MIN and MAX across the sheets. The problem here was that some of the sheet names contain spaces. I knew that I needed some quotes around the sheet name, but I can never remember where. I ended up with this:

=MIN('Abused News:Wipeout'!$F$4:$F$100)

The single quotes go around both sheet names (not individually) and before the bang. It took me a few iterations to figure that one out. If the queries ever get larger than 100 rows, I'm in trouble.

Find Position of First Capital Letter in a String

There has to be a better way than this, but I can't think of it. I have a string with a capital letter somewhere in it and I need to determine the position of that letter. The goal is to strip out all the letters before it. The test subject is the string mdtTxnDate. The array formula that seems to work is:

=MATCH(1,(CODE(MID(J13,ROW($A$1:$A$255),1))< =90)*(CODE(MID(J13,ROW($A$1:$A$255),1))>=65),FALSE)

It assumes that the capital letter won't be more than 255 characters in, which is true in my case. This is an array formula, so it needs to be entered with Control+Shift+Enter, not just enter. Here's the break down:

CODE(MID(J13,ROW($A$1:$A$255),1)) returns an array of ASCII characters that make up the string. Although it returns a 255 element array, I'm only going to show 10 elements because that's how long my string is. For my test subject, this portion of the formula will return

{109;100;116;84;120;110;68;97;116;101}

Those are the ASCII codes for mdtTxnDate.

CODE(MID(J13,ROW($A$1:$A$255),1))< =90 returns an array of TRUEs and FALSEs based on whether or not the ASCII code is less than or equal to 90. Ninety is the ASCII code for capital Z. This will return:

{FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE}

I can see that the fourth letter has an ASCII code less than 90. There is a similar section of the formula that does the same test except that it checks for ASCII codes greater than or equal to 65 (capital A). It returns a similar array which is then multiplied by the first array. In Excel formulas, FALSE is equivalent to zero and TRUE is equivalent to one. When you multiply these two arrays, you get an array with ones and zeros. The ones mean that there was a TRUE in that same spot in both arrays. If there had been a FALSE in either array, it would have returned zero. The resulting array looks like this:

{0;0;0;1;0;0;1;0;0;0}

It looks like positions four and seven are my capital letters. Now I use the MATCH function to find the first 1 in the array and the formula returns 4.

excel range showing formula

Now I know you guys can come up with something better and I also know you'll want to share it. If you do, please be sure that you escape your greater than and less than signs in the comments. If you don't, the internets will interpret them as html and your super-great formula will be lost forever.

Summing Comma Separated Values in a Cell

Created by David Hager

To add comma delimited values in a cell (such as 1,2,11,4 in cell A1) to
the right of the cell containing the string, highlight cell B1 (for this
case) and create the following defined name formula (called "csum"):

=EVALUATE(SUBSTITUTE(A1,",","+"))

Then, type =csum in B1 to obtain the result (18, in this case).

Defined name dialog for csum

Range showing defined name, csum, in formula