<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Fuzzy Text Match</title>
	<atom:link href="http://www.dailydoseofexcel.com/archives/2004/06/16/fuzzy-text-match/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dailydoseofexcel.com/archives/2004/06/16/fuzzy-text-match/</link>
	<description>Daily posts of Excel tips…and other stuff</description>
	<lastBuildDate>Wed, 08 Feb 2012 23:58:05 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: Nigel</title>
		<link>http://www.dailydoseofexcel.com/archives/2004/06/16/fuzzy-text-match/#comment-52644</link>
		<dc:creator>Nigel</dc:creator>
		<pubDate>Thu, 14 Oct 2010 19:10:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.dailydoseofexcel.com/?p=635#comment-52644</guid>
		<description>&lt;p&gt;I&#039;ve posted a more rigorous treatment of word-matching and &#039;edit distance&#039; algorithms on Excellerando, with a full implementation of a Fuzzy &#039;VLookup&#039; function:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://excellerando.blogspot.com/2010/03/vlookup-with-fuzzy-matching-to-get.html&quot; rel=&quot;nofollow&quot;&gt;http://excellerando.blogspot.com/2010/03/vlookup-with-fuzzy-matching-to-get.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Wish I&#039;d seen this thread earlier: I did the work in June 2006.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I&#8217;ve posted a more rigorous treatment of word-matching and &#8216;edit distance&#8217; algorithms on Excellerando, with a full implementation of a Fuzzy &#8216;VLookup&#8217; function:</p>
<p><a href="http://excellerando.blogspot.com/2010/03/vlookup-with-fuzzy-matching-to-get.html" rel="nofollow">http://excellerando.blogspot.com/2010/03/vlookup-with-fuzzy-matching-to-get.html</a></p>
<p>Wish I&#8217;d seen this thread earlier: I did the work in June 2006.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brandon Joyce</title>
		<link>http://www.dailydoseofexcel.com/archives/2004/06/16/fuzzy-text-match/#comment-32900</link>
		<dc:creator>Brandon Joyce</dc:creator>
		<pubDate>Sat, 14 Jun 2008 15:43:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.dailydoseofexcel.com/?p=635#comment-32900</guid>
		<description>&lt;p&gt;I&#039;ve built an advanced fuzzy deduping process for a database of customers using SQL Server Integration Services (SSIS).  If you must do this in Excel, you should be able to use VBA to call the service.  The problem with this is that you will need the Enterprise edition of SQL Server 2005.  Anyway, the solution was a huge success for me and was very easy to implement.  The fuzzy lookups use a token-based system for scoring similarity between values.  Hope this helps!&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I&#8217;ve built an advanced fuzzy deduping process for a database of customers using SQL Server Integration Services (SSIS).  If you must do this in Excel, you should be able to use VBA to call the service.  The problem with this is that you will need the Enterprise edition of SQL Server 2005.  Anyway, the solution was a huge success for me and was very easy to implement.  The fuzzy lookups use a token-based system for scoring similarity between values.  Hope this helps!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Herman Gorter</title>
		<link>http://www.dailydoseofexcel.com/archives/2004/06/16/fuzzy-text-match/#comment-29576</link>
		<dc:creator>Herman Gorter</dc:creator>
		<pubDate>Sun, 23 Dec 2007 17:14:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.dailydoseofexcel.com/?p=635#comment-29576</guid>
		<description>&lt;p&gt;Google&#039;s secret:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.google.com/technology/pigeonrank.html&quot; rel=&quot;nofollow&quot;&gt;http://www.google.com/technology/pigeonrank.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Donny Miller:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.donnymiller.com/beautifulpeople/donny.htm&quot; rel=&quot;nofollow&quot;&gt;http://www.donnymiller.com/beautifulpeople/donny.htm&lt;/a&gt;&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Google&#8217;s secret:</p>
<p><a href="http://www.google.com/technology/pigeonrank.html" rel="nofollow">http://www.google.com/technology/pigeonrank.html</a></p>
<p>Donny Miller:</p>
<p><a href="http://www.donnymiller.com/beautifulpeople/donny.htm" rel="nofollow">http://www.donnymiller.com/beautifulpeople/donny.htm</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Herman Gorter</title>
		<link>http://www.dailydoseofexcel.com/archives/2004/06/16/fuzzy-text-match/#comment-29575</link>
		<dc:creator>Herman Gorter</dc:creator>
		<pubDate>Sun, 23 Dec 2007 16:55:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.dailydoseofexcel.com/?p=635#comment-29575</guid>
		<description>&lt;p&gt;Shall I tell you what today&#039;s best text matching system is? Everybody uses it, and yet nobody knows how it works. It&#039;s the way Google matches our strings with relevant web pages.&lt;br&gt;
If only Microsoft had a spark of this G-power, help facilities would become twice as user friendly, and Excel would probably have a few stunning search-and-match functions. &lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.google.com/technology/pigeonrank.html&quot; rel=&quot;nofollow&quot;&gt;What is Google&#039;s secret?&lt;/a&gt; &lt;/p&gt;
&lt;p&gt;MJW: As &lt;a href=&quot;http://www.donnymiller.com/beautifulpeople/donny.htm&quot; rel=&quot;nofollow&quot;&gt;Donny Miller&lt;/a&gt; said, &quot;You have to fool yourself before you can fool anybody else&quot;. One can do serious things only on condition that one doesn&#039;t take them too seriously; that&#039;s why people keep coming to this blog.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Shall I tell you what today&#8217;s best text matching system is? Everybody uses it, and yet nobody knows how it works. It&#8217;s the way Google matches our strings with relevant web pages.<br />
If only Microsoft had a spark of this G-power, help facilities would become twice as user friendly, and Excel would probably have a few stunning search-and-match functions. </p>
<p><a href="http://www.google.com/technology/pigeonrank.html" rel="nofollow">What is Google&#8217;s secret?</a> </p>
<p>MJW: As <a href="http://www.donnymiller.com/beautifulpeople/donny.htm" rel="nofollow">Donny Miller</a> said, &#8220;You have to fool yourself before you can fool anybody else&#8221;. One can do serious things only on condition that one doesn&#8217;t take them too seriously; that&#8217;s why people keep coming to this blog.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: MJW</title>
		<link>http://www.dailydoseofexcel.com/archives/2004/06/16/fuzzy-text-match/#comment-29241</link>
		<dc:creator>MJW</dc:creator>
		<pubDate>Thu, 06 Dec 2007 22:54:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.dailydoseofexcel.com/?p=635#comment-29241</guid>
		<description>&lt;p&gt;Dick et al.:&lt;/p&gt;
&lt;p&gt;Anyone in need of a fuzzy match VBA/add-in solution is likely already aware of the commercial Fuzzy Finder add-in; if you&#039;d rather have a free solution, there are a few options over on Mr.Excel&#039;s old site under &lt;a href=&quot;http://www.mrexcel.com/pc07.shtml&quot; rel=&quot;nofollow&quot;&gt;http://www.mrexcel.com/pc07.shtml&lt;/a&gt;.  Cheers!&lt;/p&gt;
&lt;p&gt;@ Herman:  It appears as though you were more interested in screaming &quot;I&#039;m a genius everyone!&quot; from the rooftops than you were in providing an actual solution.  It&#039;s an intriguing insight to note that you &quot;like unanswered questions&quot; (which isn&#039;t actually even accurate; you were trying to state that you like the challenge/entertainment of them), yet didn&#039;t provide a single substantial formula.  Any idiot could have trolled any one of the many Excel boards and posted that rubbish you prattled on with; a true fan of conundrums would have spoken concisely and technically without treating their response as a soapbox for empty-handed boasts.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Dick et al.:</p>
<p>Anyone in need of a fuzzy match VBA/add-in solution is likely already aware of the commercial Fuzzy Finder add-in; if you&#8217;d rather have a free solution, there are a few options over on Mr.Excel&#8217;s old site under <a href="http://www.mrexcel.com/pc07.shtml" rel="nofollow">http://www.mrexcel.com/pc07.shtml</a>.  Cheers!</p>
<p>@ Herman:  It appears as though you were more interested in screaming &#8220;I&#8217;m a genius everyone!&#8221; from the rooftops than you were in providing an actual solution.  It&#8217;s an intriguing insight to note that you &#8220;like unanswered questions&#8221; (which isn&#8217;t actually even accurate; you were trying to state that you like the challenge/entertainment of them), yet didn&#8217;t provide a single substantial formula.  Any idiot could have trolled any one of the many Excel boards and posted that rubbish you prattled on with; a true fan of conundrums would have spoken concisely and technically without treating their response as a soapbox for empty-handed boasts.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Herman Gorter</title>
		<link>http://www.dailydoseofexcel.com/archives/2004/06/16/fuzzy-text-match/#comment-26179</link>
		<dc:creator>Herman Gorter</dc:creator>
		<pubDate>Sat, 04 Aug 2007 14:58:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.dailydoseofexcel.com/?p=635#comment-26179</guid>
		<description>&lt;p&gt;Frank, as I am a genius, and I like unanswered questions (this one is small fry, but still), I put a few phone numbers on a worksheet, like this:&lt;/p&gt;
&lt;p&gt;+1 275 561/32/58&lt;br&gt;
(0275) 555.32.58&lt;br&gt;
001-275 555-32-58&lt;br&gt;
275/555 3258&lt;br&gt;
555 32 58&lt;/p&gt;
&lt;p&gt;As you can see they are all really the same number, and none of them is longer than 17 characters (didn&#039;t you notice that? - all right, you&#039;re a genius or you&#039;re not).&lt;/p&gt;
&lt;p&gt;The first thing you have to remember when you work with Excel is: it&#039;s only a tool, so it&#039;s not going to be perfect. &lt;/p&gt;
&lt;p&gt;At this stage I have to tell you I will not even bother trying to solve this with VBA - that would be way too elaborate and as you rightly say, complex. &lt;/p&gt;
&lt;p&gt;The next step is to parse these numbers into one-character components. &lt;/p&gt;
&lt;p&gt;Although worksheets are Excel&#039;s greatest programming environment (don&#039;t worry, Dick knows I can explain this), text formulas have a severe limitation: they consider text strings first and foremost as values, rather than a set of characters. To look at text cells with array- (not X-ray) glasses, you need elaborate formulas - so we keep it simple and for each telephone number we allow an extra 17 cells to contain every single one of its characters.&lt;/p&gt;
&lt;p&gt;Talking of limitations in Excel, do you know why CONCATENATE is completely redundant? Because Excel  doesn&#039;t allow =CONCATENATE(A1:F1), which is really a shame. Can anyone think of a reason why one would not use the ampersand in concatenating two strings? As I said, keep it simple.&lt;/p&gt;
&lt;p&gt;Then we take out all the numeric characters using ISNUMBER (again, the simplest is to allow another 17 extra cells - there are millions left!), and we re-assemble the phone numbers (using the ampersand), but in reverse order, without their non-numeric characters:&lt;/p&gt;
&lt;p&gt;85231655721&lt;br&gt;
85235555720&lt;br&gt;
8523555572100&lt;br&gt;
8523555572&lt;br&gt;
8523555&lt;/p&gt;
&lt;p&gt;The last step is the one where you don&#039;t need to be a genius, so I&#039;m leaving all the &quot;fuzziness&quot; of the text matching to you.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Frank, as I am a genius, and I like unanswered questions (this one is small fry, but still), I put a few phone numbers on a worksheet, like this:</p>
<p>+1 275 561/32/58<br />
(0275) 555.32.58<br />
001-275 555-32-58<br />
275/555 3258<br />
555 32 58</p>
<p>As you can see they are all really the same number, and none of them is longer than 17 characters (didn&#8217;t you notice that? &#8211; all right, you&#8217;re a genius or you&#8217;re not).</p>
<p>The first thing you have to remember when you work with Excel is: it&#8217;s only a tool, so it&#8217;s not going to be perfect. </p>
<p>At this stage I have to tell you I will not even bother trying to solve this with VBA &#8211; that would be way too elaborate and as you rightly say, complex. </p>
<p>The next step is to parse these numbers into one-character components. </p>
<p>Although worksheets are Excel&#8217;s greatest programming environment (don&#8217;t worry, Dick knows I can explain this), text formulas have a severe limitation: they consider text strings first and foremost as values, rather than a set of characters. To look at text cells with array- (not X-ray) glasses, you need elaborate formulas &#8211; so we keep it simple and for each telephone number we allow an extra 17 cells to contain every single one of its characters.</p>
<p>Talking of limitations in Excel, do you know why CONCATENATE is completely redundant? Because Excel  doesn&#8217;t allow =CONCATENATE(A1:F1), which is really a shame. Can anyone think of a reason why one would not use the ampersand in concatenating two strings? As I said, keep it simple.</p>
<p>Then we take out all the numeric characters using ISNUMBER (again, the simplest is to allow another 17 extra cells &#8211; there are millions left!), and we re-assemble the phone numbers (using the ampersand), but in reverse order, without their non-numeric characters:</p>
<p>85231655721<br />
85235555720<br />
8523555572100<br />
8523555572<br />
8523555</p>
<p>The last step is the one where you don&#8217;t need to be a genius, so I&#8217;m leaving all the &#8220;fuzziness&#8221; of the text matching to you.</p>
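<p>A minimal sketch of the strip-and-reverse step described above, written in Python rather than with one worksheet cell per character (a swapped-in technique, not the formula layout from this comment). The function name <code>reversed_digits</code> is hypothetical, and it assumes plain ASCII digits are the only characters worth keeping.</p>

```python
def reversed_digits(phone: str) -> str:
    """Keep only the characters 0-9, then return them in reverse order."""
    # Assumption: anything that is not an ASCII digit is a separator to drop.
    digits = [ch for ch in phone if ch in "0123456789"]
    return "".join(reversed(digits))


samples = [
    "+1 275 561/32/58",
    "(0275) 555.32.58",
    "001-275 555-32-58",
    "275/555 3258",
    "555 32 58",
]

for sample in samples:
    print(reversed_digits(sample))
```

<p>Run on the five sample numbers, this prints the same reversed strings as the list above, so numbers that differ only in prefix share a common leading run of digits.</p>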
]]></content:encoded>
	</item>
	<item>
		<title>By: frank</title>
		<link>http://www.dailydoseofexcel.com/archives/2004/06/16/fuzzy-text-match/#comment-26038</link>
		<dc:creator>frank</dc:creator>
		<pubDate>Sat, 28 Jul 2007 18:09:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.dailydoseofexcel.com/?p=635#comment-26038</guid>
		<description>&lt;p&gt;I could make good use of a text matching facility in my daily work, but Dick&#039;s problem of finding duplicates in a list of fax numbers is challenging enough.&lt;/p&gt;
&lt;p&gt;Ideally, Excel should have a function that calculates the correlation between two strings. CORREL and PEARSON do exactly that for two equally-sized ranges of numbers, but my new function would have to have the ability to evaluate strings of different lengths.&lt;/p&gt;
&lt;p&gt;Unless you&#039;re a genius, this is a complex problem to solve in one go. How could we break up this problem into parts? I think this one requires basically three steps:&lt;/p&gt;
&lt;p&gt;1) &quot;cleaning&quot; the data&lt;br&gt;
2) defining the algorithm that will decide whether any string matches another one&lt;br&gt;
3) finding a setup in which to carry out the actual matching between the given strings&lt;/p&gt;
&lt;p&gt;In more detail:&lt;/p&gt;
&lt;p&gt;1) &quot;Garbage&quot; in a telephone number string is in fact every non-numeric character. People use all kinds of characters to separate the number components, so if we could remove these from our strings, we would improve comparability.&lt;/p&gt;
&lt;p&gt;2) After the cleaning process we would actually end up with numeric strings: numbers of varying length, often depending on whether a prefix is included or not. Phone numbers usually don&#039;t have suffixes, so probably the best way to search for duplicates is by starting from the back.&lt;/p&gt;
&lt;p&gt;3) Maybe the most difficult exercise is to apply the matching procedure itself. Since every string has to be compared to every other string, a lot of comparisons need to be made: if n is the number of strings, comparing each number with every other number in the list takes n(n-1) comparisons, or n(n-1)/2 if each pair is checked only once.&lt;/p&gt;
&lt;p&gt;Ideas for a solution?&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I could make good use of a text matching facility in my daily work, but Dick&#8217;s problem of finding duplicates in a list of fax numbers is challenging enough.</p>
<p>Ideally, Excel should have a function that calculates the correlation between two strings. CORREL and PEARSON do exactly that for two equally-sized ranges of numbers, but my new function would have to have the ability to evaluate strings of different lengths.</p>
<p>Unless you&#8217;re a genius, this is a complex problem to solve in one go. How could we break up this problem into parts? I think this one requires basically three steps:</p>
<p>1) &#8220;cleaning&#8221; the data<br />
2) defining the algorithm that will decide whether any string matches another one<br />
3) finding a setup in which to carry out the actual matching between the given strings</p>
<p>In more detail:</p>
<p>1) &#8220;Garbage&#8221; in a telephone number string is in fact every non-numeric character. People use all kinds of characters to separate the number components, so if we could remove these from our strings, we would improve comparability.</p>
<p>2) After the cleaning process we would actually end up with numeric strings: numbers of varying length, often depending on whether a prefix is included or not. Phone numbers usually don&#8217;t have suffixes, so probably the best way to search for duplicates is by starting from the back.</p>
<p>3) Maybe the most difficult exercise is to apply the matching procedure itself. Since every string has to be compared to every other string, a lot of comparisons need to be made: if n is the number of strings, comparing each number with every other number in the list takes n(n-1) comparisons, or n(n-1)/2 if each pair is checked only once.</p>
<p>Ideas for a solution?</p>
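<p>One possible sketch of the three steps above, in Python rather than Excel, offered as an illustration and not as a definitive answer. The function names and the <code>min_digits</code> threshold of 7 are assumptions made for this example; the matching rule (the shorter cleaned number must be a suffix of the longer one) is one simple way to compare "from the back".</p>

```python
import re
from itertools import combinations


def clean(number: str) -> str:
    """Step 1: drop every non-digit character."""
    return re.sub(r"\D", "", number)


def same_number(a: str, b: str, min_digits: int = 7) -> bool:
    """Step 2: compare from the back.

    Assumption: two entries match when the shorter cleaned string has at
    least `min_digits` digits and is a suffix of the longer one.
    """
    short, long_ = sorted((clean(a), clean(b)), key=len)
    return len(short) >= min_digits and long_.endswith(short)


def find_duplicates(numbers):
    """Step 3: test each unordered pair exactly once."""
    return [(a, b) for a, b in combinations(numbers, 2) if same_number(a, b)]


if __name__ == "__main__":
    faxes = ["(0275) 555.32.58", "275/555 3258", "555 32 58", "+1 275 561/32/58"]
    for pair in find_duplicates(faxes):
        print(pair)
```

<p>Using <code>itertools.combinations</code> means each pair is visited once, so a list of n entries needs n(n-1)/2 comparisons. A stricter or fuzzier rule can be swapped into <code>same_number</code> without touching the other two steps.</p>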
]]></content:encoded>
	</item>
	<item>
		<title>By: Klaas Bil</title>
		<link>http://www.dailydoseofexcel.com/archives/2004/06/16/fuzzy-text-match/#comment-19401</link>
		<dc:creator>Klaas Bil</dc:creator>
		<pubDate>Wed, 05 Apr 2006 00:01:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.dailydoseofexcel.com/?p=635#comment-19401</guid>
		<description>&lt;p&gt;I hit this page when searching for other fuzzy match implementations for Excel. I have recently developed a fuzzy search capability for Excel. One enters a search string and then a list of ranked best matches is brought up, with match percentages. It works quite well but at this point I&#039;m not willing to share as I&#039;m not sure how much this is sought after. Several people have suggested I could make money with this. Email me for more info.&lt;/p&gt;
&lt;p&gt;Klaas Bil (Netherlands)&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I hit this page when searching for other fuzzy match implementations for Excel. I have recently developed a fuzzy search capability for Excel. One enters a search string and then a list of ranked best matches is brought up, with match percentages. It works quite well but at this point I&#8217;m not willing to share as I&#8217;m not sure how much this is sought after. Several people have suggested I could make money with this. Email me for more info.</p>
<p>Klaas Bil (Netherlands)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ross</title>
		<link>http://www.dailydoseofexcel.com/archives/2004/06/16/fuzzy-text-match/#comment-3067</link>
		<dc:creator>ross</dc:creator>
		<pubDate>Sun, 26 Dec 2004 19:03:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.dailydoseofexcel.com/?p=635#comment-3067</guid>
		<description>&lt;p&gt;might this help?&lt;br&gt;
&lt;a href=&quot;http://www.ablebits.com/excel-duplicates-find-remove-addins/index.php&quot; rel=&quot;nofollow&quot;&gt;http://www.ablebits.com/excel-duplicates-find-remove-addins/index.php&lt;/a&gt;&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>might this help?<br />
<a href="http://www.ablebits.com/excel-duplicates-find-remove-addins/index.php" rel="nofollow">http://www.ablebits.com/excel-duplicates-find-remove-addins/index.php</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ross</title>
		<link>http://www.dailydoseofexcel.com/archives/2004/06/16/fuzzy-text-match/#comment-1772</link>
		<dc:creator>ross</dc:creator>
		<pubDate>Fri, 08 Oct 2004 17:21:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.dailydoseofexcel.com/?p=635#comment-1772</guid>
		<description>&lt;p&gt;could use range.find?&lt;br&gt;
or then maybe you need to use something like this&lt;br&gt;
&lt;a href=&quot;http://www2.toki.or.id/book/AlgDesignManual/BOOK/BOOK5/NODE203.HTM&quot; rel=&quot;nofollow&quot;&gt;http://www2.toki.or.id/book/AlgDesignManual/BOOK/BOOK5/NODE203.HTM&lt;/a&gt;&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>could use range.find?<br />
or then maybe you need to use something like this<br />
<a href="http://www2.toki.or.id/book/AlgDesignManual/BOOK/BOOK5/NODE203.HTM" rel="nofollow">http://www2.toki.or.id/book/AlgDesignManual/BOOK/BOOK5/NODE203.HTM</a></p>
]]></content:encoded>
	</item>
</channel>
</rss>

