Copying HTML Tables over Merged Cells

I ran into some strangeness while copying tables from the web to Excel. To demonstrate, I’ll use Debra’s sample data table. I run this code

Sub GetTable()

    Dim ieApp As Object
    Dim ieDoc As Object
    Dim Clip As DataObject  'requires a reference to Microsoft Forms 2.0 Object Library

    Set ieApp = CreateObject("InternetExplorer.Application")

    With ieApp
        .navigate "http://www.contextures.com/xlsampledata01.html"
        Do While .Busy: DoEvents: Loop
        Do While .readyState <> 4: DoEvents: Loop

        Set ieDoc = .document
        Sheet1.UsedRange.ClearContents

        'wrap the table's outer HTML so Excel parses it as a table on paste
        Set Clip = New DataObject
        Clip.SetText "<html>" & ieDoc.all.tags("TABLE").Item(3).outerhtml & "</html>"
        Clip.PutInClipboard
        Sheet1.Range("A1").Select
        Sheet1.PasteSpecial "Unicode Text"

        .Quit
    End With

End Sub

That code automates Internet Explorer, navigates to the page, gets the table, puts it in the clipboard, and pastes it to a cell.

Nice. I delete the data and merge A1:C1.

I run the code again and this time the paste comes in truncated.

Not so nice. I fixed it by adding this line to the code:

Sheet1.UsedRange.UnMerge

That was easy enough. The harder part was that I didn’t know merged cells were causing the problem. Some tables came in fine and others were truncated at seemingly random spots. Once I discovered that some of the cells were merged, I deleted those rows and it ran fine. Then later it would screw up again. Finally I figured out that the HTML tables had colspan attributes in some of the td tags. A colspan attribute, when pasted into Excel, creates merged cells. Now you know.
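
If you’re wondering where that line goes, one natural spot is right before the old contents are cleared, so any merged cells left over from a previous paste (or created by a colspan) are broken apart before the new table comes in. A minimal sketch of that part of GetTable:

        Set ieDoc = .document
        Sheet1.UsedRange.UnMerge        'break apart any merged cells before clearing and pasting
        Sheet1.UsedRange.ClearContents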

And if you’re wondering, the tables in question were on a website that required a login, so I couldn’t use a web query. By automating IE, I could navigate to the login page, fill in the form, then navigate to the page with the tables I wanted. I’ll post about that as soon as I work out a few bugs.
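
In the meantime, here’s roughly the shape that takes. This is a sketch only: the URLs, the element IDs, and the form index are placeholders for whatever the real login page uses, and there’s no error handling.

Sub LoginThenNavigate()

    Dim ieApp As Object
    Dim ieDoc As Object

    Set ieApp = CreateObject("InternetExplorer.Application")

    With ieApp
        .Visible = True
        .navigate "http://www.example.com/login"    'placeholder login page
        Do While .Busy: DoEvents: Loop
        Do While .readyState <> 4: DoEvents: Loop

        'fill in the form - inspect the real page for the actual element ids
        Set ieDoc = .document
        ieDoc.getElementById("username").Value = "myname"
        ieDoc.getElementById("password").Value = "mypassword"
        ieDoc.forms(0).submit
        Do While .Busy: DoEvents: Loop
        Do While .readyState <> 4: DoEvents: Loop

        'then navigate to the page with the tables and reuse the
        'clipboard technique from GetTable above
        .navigate "http://www.example.com/tables"   'placeholder data page
        Do While .Busy: DoEvents: Loop
        Do While .readyState <> 4: DoEvents: Loop
    End With

End Sub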


10 thoughts on “Copying HTML Tables over Merged Cells”

  1. The Worksheet.PasteSpecial method has the Format parameter “Unicode Text”. What are all of the possible options for that argument? Help doesn’t provide an enum list.

  2. Maybe you could run this sub before you bring the table in:

    Sub Prepare()
        Dim rng As Range
        For Each rng In Sheet1.UsedRange
            If rng.MergeCells Then
                MsgBox "You have merged cells. Please slap yourself.", vbExclamation
                rng.UnMerge
            End If
        Next rng
    End Sub

  3. Oh, wow. Every day, it seems, I find something new and useful that Excel can do.

    Is there a simple way to grab only a section of a table from a web page?

    For example, on ‘http://finance.yahoo.com/q/pm?s=NAESX+Performance’ I would only like to grab the ‘Past Quarterly Returns (%) for NAESX’ section of the table.

    By the way, I love your Disconnected Recordset post. Thanks for sharing your knowledge.

  4. You might consider this instead; it needs a reference to Microsoft XML, version 2.0:

    Sub faster()
        Dim Clip As DataObject  'needs a reference to Microsoft Forms 2.0 Object Library

        With New XMLHTTP
            'synchronous request - .send doesn't return until the page is loaded
            .Open "GET", "http://www.contextures.com/xlsampledata01.html", False
            .send

            'grab the fourth table from the source and wrap it so Excel parses it
            Set Clip = New DataObject
            Clip.SetText "<html><table " & Split(Split(.responseText, "<table ")(4), "/table>")(0) & "/table></html>"
            Clip.PutInClipboard

            [K1].PasteSpecial
        End With
    End Sub
  5. This is extremely useful. This is what I have been trying to write for the past year. Thank you very much.

    If there is more than one table, how do I select a particular table?

    Take this URL:
    http://in.finance.yahoo.com/q?s=CCCL.NS&ql=0

    There are many tables. When I use the full address of this URL in the .navigate statement of your code, like this:

    .navigate "http://in.finance.yahoo.com/q?s=CCCL.NS&ql=0"

    and run the macro, I get the “comparison” table. Suppose I want to get only the “key statistics” table; how do I modify the statement above?

  6. Those of you interested in extraction of particular items from web pages may be interested in my free, open-source add-in:

    http://finance.groups.yahoo.com/group/smf_addin (or just click on the link on my name)

    The add-in is basically a collection of user-defined functions to extract data out of the source code of web pages. It uses the XMLHTTP process to grab the data and parse the page based on parameters passed to the function.

    For example, to get the Market Capitalization of ticker symbol “MMM” from Yahoo’s Key Statistics page:

    =RCHGetTableCell("http://finance.yahoo.com/q/ks?s=MMM",1,"Market Cap (intraday)")

    Documentation on the functions is in the “Files” area of the Yahoo group, and the “Links” area of the group contains links to tips and FAQs.

  7. Thank you for the explanation. I have used the SMF add-in for Yahoo Finance and for Indian stocks, and it works seamlessly. Is there any restriction on the number of symbols? You have said to highlight B2:M100; does that mean there is a restriction of 100 symbols?

    My second comment is about this:
    quote
    Those of you interested in extraction of particular items from web pages may be interested in my free, open-source add-in:

    http://finance.groups.yahoo.com/group/smf_addin (or just click on the link on my name)
    unquote

    I am interested in downloading tables from any webpage, not only Yahoo Finance.
    Can I use the above approach, just replacing the webpage?

    Thanks for having the patience to try to educate us.

  8. >>> [Does the add-in have] any restriction on the number of symbols?

    It has a limit of 1000 web pages at a time, because that’s the size of the array I use to store web pages from which to extract data. I’d prefer people NOT use it to download a ton of data from sites — such as to load a database — because I don’t want the free data providers to be overburdened. It wouldn’t be that difficult for them to make the pages inaccessible.

    However, the RCHGetYahooQuotes() function can grab numerous data items on up to 200 ticker symbols, because it uses a CSV file from Yahoo for that purpose. So this function is independent of the 1000-web page limit mentioned above, because it’s not extracting the data from a web page. The CSV file from Yahoo is great, because all of the data can be retrieved with a single Internet access. Most of the other sources would require at least one page per ticker symbol.

    Both Zacks and FinViz also have multiple fields and multiple company download files that can be collected with a single Internet access, using the export options of their screening process.

    >>> I am interested in downloading tables from any webpage, not only Yahoo Finance.
    Can I use the above approach, just replacing the webpage?

    Different function. To download the entire table containing the data item from my previous example:

    =RCHGetHTMLTable("http://finance.yahoo.com/q/ks?s=MMM","Market Cap (intraday)",-1,"",1)

    One disadvantage of using this versus getting the individual table cells is that retrieval of the entire table has to be done with an array-entered formula over a worksheet range. That means the retrieved data cannot be sorted.

