Creating Wildcards
I need to save the names of one or more files in the custom document properties of a workbook. I don't want to create a separate property for every filename. I thought about saving a comma-delimited string, then parsing it out. It would look like:
file1name,file2name
Another idea I had was just save one string that could stand for both:
file?name
I need to convert a number of strings into one string with the appropriate wildcards. I came up with the code below, but it has some shortcomings. It replaces differing characters in the same position with a question mark, and converts strings of three question marks or more into an asterisk. That means for wildcards("consistent","inconsistent"), it returns *s* when I would prefer it return *consistent.
Any suggestions on making it better?
Dim i As Long, j As Long
Dim sShort As String, sLong As String
Dim sTemp As String
Const sQUES As String = "?"
Const sASTR As String = "*"
'If only one string, then return it
If LBound(vaText) = UBound(vaText) Then
Wildcards = vaText(LBound(vaText))
Else
sShort = vaText(LBound(vaText))
'Store the longest and shortest strings
For i = LBound(vaText) To UBound(vaText)
If Len(vaText(i)) <Len(sShort) Then
sShort = vaText(i)
End If
If Len(vaText(i))> Len(sLong) Then
sLong = vaText(i)
End If
Next i
sTemp = sShort
'replace differing chars with ?
For i = LBound(vaText) To UBound(vaText)
If vaText(i) <> sShort Then
For j = 1 To Len(sShort)
If Mid(vaText(i), j, 1) <> Mid(sTemp, j, 1) Then
sTemp = Left(sTemp, j - 1) & sQUES & Mid(sTemp, j + 1, Len(sTemp))
End If
Next j
End If
Next i
'pad ?s to the end of the longest string
sTemp = sTemp & String(Len(sLong) - Len(sShort), sQUES)
'replace three or more ?s with a *
If Len(sLong)>= 3 Then
For i = Len(sLong) To 3 Step -1
sTemp = Replace(sTemp, String(i, sQUES), sASTR)
Next i
End If
Wildcards = sTemp
End If
End Function
Rob van Gelder:
This is similar to the ACM competition:
14 January 2007, 11:21 pmProblem Set 2005 - Abbreviations
Tushar Mehta:
How do you propose to restore the names of the original files from the stored abbreviated tokens? For example, how will you know that file?name maps to file1name and file2name but must not include file3name or fileAname?
Obviously, I don't know why you need to store the results in a custom property but if it isn't too out of the way...and it's higly likely you have already thought of this but in the remote chance that you have not...consider a hidden worksheet.
15 January 2007, 12:29 pmdoco:
This is a bit off topic, but viewing the code on the last few posts and the fact that I am in the middle of a new project, made me curious to know:
Has MS changed the infuriating process ( for Office 2007) of always returning to to a userform object after closing a routine in the VBE?
You know, while debugging or stepping through code and have to interrupt runtime to correct or add additional coding; then BAM! Back to a userform and then have to go find your code (function, sub or ???). I am hoping MS has fixed this intensely aggravating pain in the a**?
15 January 2007, 1:05 pmJohn Walkenbach:
doco: Unfortunately, the VBE window is Excel 2007 is exactly the same as the VBE window in the previous version. All of the annoying quirks and PITA bugs are still present. I still get those inexplicable "Device I/O error" messages, and I still see closed workbooks in the Project windows.
15 January 2007, 4:52 pmRob van Gelder:
I've seen the "Closed Workbooks in Projects window" problem before.
I cured it by removing a misbehaving Addin from the registry location:
HKEY_CURRENT_USER\Software\Microsoft\Office\Excel\Addins
(might pay to check HKEY_LOCAL_MACHINE too)
To anyone wanting to fool around with the registry, make a backup first.
15 January 2007, 6:55 pmDick Kusleika:
Tushar: The code produces a number of unsaved workbooks - between 2 and 100 (conceivably, but not really, more like 20). A number of those workbooks are applicable to all of the others because they contain statistical information. For instance, if 10 workbooks are created, maybe two of those workbooks contain "standards" against which the other eight workbooks would be compared for reasonableness. The workbooks are created but not saved. The user can close them without saving, or save some of them, or save them in any order. Basically, I don't know the name of the "standards" workbook when WorkbookX is saved. WorkbookX has a hyperlink to open the standards workbook(s). The user can't name the file whatever he wants, so I can guess the filename if it is eventually saved. The FollowHyperlink event uses FileSearch to find the files and open them.
The file to be opened via the hyperlink contains its own CDP that uniquely ties it to the hyperlink-containing file, so I'll never open fileAname unless it's applicable, although fileAname may be in the corpus of files I check for that CDP. The folder I'm checking contains thousands of files, so narrowing down the search is important.
I don't know why I don't use a hidden worksheet. I almost never use a hidden worksheet to store a list of something, but I can't think of a reason why - other than the reason that I'm "wasting" a whole worksheet when in 98% of the cases I'm only storing one file name.
This all started as a feature that checked every file in the folder for the magic CDP. That worked great when there was several dozen files. Now it takes nearly a minute - too long. Thanks for the comment.
15 January 2007, 11:59 pmMatt Vidas:
Dick,
Though this can probably be optimized a little bit, it checks every substring of the shortest string rather than character-by-character.
Dim i As Long, j As Long, k As Long
Dim sShort As String, sLong As String, sLong2 As String, sLong3 As String
Dim sTemp As String
Const sQUES As String = "?"
Const sASTR As String = "*"
'If only one string, then return it
If LBound(vaText) = UBound(vaText) Then
Wildcards = vaText(LBound(vaText))
Else
sShort = vaText(LBound(vaText))
'Store the longest and shortest strings
For i = LBound(vaText) To UBound(vaText)
If Len(vaText(i)) <Len(sShort) Then
sShort = vaText(i)
End If
If Len(vaText(i))> Len(sLong) Then
sLong = vaText(i)
End If
Next i
sLong2 = sLong
'find longest string in common to all parameters
For i = LBound(vaText) To UBound(vaText)
If vaText(i) <> sShort Then
sLong3 = ""
For j = 1 To Len(sShort)
sTemp = ""
For k = 1 To Len(sShort) - j + 1
If UCase(vaText(i)) Like "*" & UCase(Mid(sShort, j, k)) & "*" Then
sTemp = Mid(sShort, j, k)
If Len(sLong3) <k Then
sLong3 = sTemp
End If
End If
Next
Next j
If sTemp = "" Then
If Not UCase(vaText(i)) Like "*" & UCase(sLong3) & "*" Then
sLong2 = ""
Exit For
End If
End If
If Len(sLong3) <Len(sLong2) Then
sLong2 = sLong3
End If
End If
Next i
'pad ?s where longest matching substring does not match longest string
i = InStr(1, sLong, sLong2, vbTextCompare)
If i = 0 Then
sTemp = String(Len(sLong2), "?")
Else
sTemp = String(i - 1, "?") & sLong2 & String(Len(sLong) - _
Len(sLong2) - (i - 1), "?")
End If
'replace three or more ?s with a *
If Len(sLong)>= 3 Then
For i = Len(sLong) To 3 Step -1
sTemp = Replace(sTemp, String(i, sQUES), sASTR)
Next i
End If
Wildcards = sTemp
End If
End Function
I wasn't sure what the code/plaintext tags were, feel free to format
16 January 2007, 10:53 amMatt
Matt Vidas:
It seems to have cut out a chunk in the middle of my posting.. sorry
If Len(sLong3) = 3 Then 'same as above
16 January 2007, 10:56 amMatt Vidas:
Dick,
Please delete my previous 2 postings if possible, I can't seem to be able to post the full code so I will just upload it to my site.
Though this can probably be optimized a little bit, it checks every substring of the shortest string rather than character-by-character.
http://www.hastalavidas.com/DicksBlogWildcardFunction.txt
Matt
16 January 2007, 11:06 amJon Peltier:
Dick -
"I don't know why I don't use a hidden worksheet. I almost never use a hidden worksheet to store a list of something, but I can't think of a reason why - other than the reason that I'm "wasting" a whole worksheet when in 98% of the cases I'm only storing one file name."
Hidden worksheets don't really take up much of your resources, so you're not "wasting" anything. But if you have only one or two items to store, you could stick them into defined names.
16 January 2007, 11:28 amDick Kusleika:
Matt:
16 January 2007, 12:12 pm?wildcards("me","mame","mememe")doesn't work.Matt Vidas:
Dick,
I've updated my function, no longer just comparing to the longest string: http://www.hastalavidas.com/DicksBlogWildcardFunction2.txt
Is it just [ c o d e ] and [ / c o d e ] ?
Matt
16 January 2007, 12:37 pmDick Kusleika:
Matt: Thanks for the function, I'll be checking it out. To post code, but it in [ vb ] [ /vb ] tags without the spaces.
16 January 2007, 4:33 pmKeith Johnson:
Rob et al:
Off topic, but on the "Closed Workbooks in Projects window" issue. An attempt or two to resolve this (in March 2006) concluded that "Phantom Workbooks" were created by other than a misbehaving addin. For example, if I remember correctly, I could reliably create one by programmatically opening a workbook (Set Wkb = Workbooks.Open(Path)), executing a dummy routine in it (via Application.Run), closing the workbook (Call Wkb.Close) and releasing its object variable (Set Wkb = Nothing). The workbook then remained as a phantom.
They're mostly annoying. Associated objects like ThisWorkbook can't be selected in the VBE. The "out of memory" messages are, at least, innocuous. What I most objected to was that, although you *could* exceute code in phantom modules, attempting to *remove* them crashed Excel
I'm relatively convinced that Application.Run is creating a reference within Excel that is not cleared, and I couldn't then and can't now see how to clear it. There likely are other ways to do the same thing. But eventually, we gave up. Restarting Excel clears things up, and that had to be a good enough "solution." (Googling "phantom workbook" in the MS excel programming newsgroup should lead anyone to this interchange.)
Keith
16 January 2007, 4:34 pmRob van Gelder:
Keith: I tried to reproduce the problem using the steps you provided. It didn't reproduce the problem for me.
I'm positive that the COM add-in we use was causing the problem.
The non registry way to solve is as follows:
You need to get the "COM Add-Ins..." button onto your commandbar.
- right click the commandbar, select Customize...
- click the commands tab. From categories, choose Tools. From Commands, choose COM Add-Ins.
- drag COM Add-Ins button to your commandbar. Close the Customize windows. Click the COM Add-Ins button.
-or-
To open COM Add-Ins window directly, execute this VBA:
Application.CommandBars.FindControl(Id:=3754).Execute
Then manage your Add-Ins by either unticking them, or removing them.
Rob
16 January 2007, 6:07 pmDick Kusleika:
?wildcards("QC123pg123","QC123lp123")QC123*
should return
QC123??123I like the concept though. I'll try to tweak it.
16 January 2007, 8:59 pmthe Okk:
Dick Kusleika: Are you sure? Is QC123*p*123 mask is not better?
17 January 2007, 1:21 amMichael:
Matt -
Having just researched this, the tags are:
[ v b ] and [ / v b ]
all closed together. See the "code in comments" thread.
...Michael
17 January 2007, 7:37 amfzz:
QC123*p*123 may look better to humans, but QC123??123 would be much more efficient for computers to process. See any text on regular expressions. So it depends on who or what would be the intended user of these patterns.
Any soundex or approximate matching algorithm could be adapted for this. They could restrict pattern generation to the two strings with the poorest match, and they could be canibalized to locate common substrings.
Note that generating a pattern for pairs is much easier than generating patterns for 3 or more strings. For example, {abc,bca} -> ?bc?, but {abc,bca,cab} -> *a*, *b*, *c* all equally applicable and inefficient. And if there are no substrings in common, e.g., {abc,defg,hijkl}, what should the return pattern be, just * or ???* ? The latter indicates at least 3 characters. Again a question of whether the generated patterns are intended for human readers or computer processes. If the former, * may be better, but ???* would be much more useful in the latter case.
17 January 2007, 11:38 amthe Okk:
fzz: I'm talking about situations like {file123A2name,file9A4name}. Where "A" is an important and separates {file123A2name,file9A4name} from {file123B2name,file9B4name}.
17 January 2007, 10:36 pmOf course QC123??123 will be more efficient for computer... but I think that that mask isn't complete.
fzz:
Okk: In terms of inferring a pattern, there's a big difference between {"QC123pg123","QC123lp123"} and {"file123A2name","file9A4name"}. The first requires QC123*p*123 while the latter could be rendered as file?*A?name, where * and ?* take on their COUNTIF/SUMIF meanings: * means ZERO or more characters while ?* means ONE or more characters. Much easier to program detection of one or more characters. That said, there comes a point where only humans could detect patterns.
23 January 2007, 4:55 pmbrettdj:
Rob,
I recently fixed my VBE phantom workbook issue by culling the Addins from the registry.
Cheers
Dave
1 February 2007, 9:00 pm