HTML retrieved with HttpGet does not contain the full page

Question:

While trying to pull all of the HTML from this site, http://www.gasbuddy.com/GB_Price_List.aspx, I ran into a problem: only half of the page ends up in my string.

I have tried several methods found all over SO and through other Google searches, and none of them solved my problem.

This is the code that fetches the page:

private class InternetGasBuddyConnection extends AsyncTask<String, String, String> {

    @Override
    protected String doInBackground(String... urls) {
        StringBuilder response = new StringBuilder(30000);
        DefaultHttpClient client = new DefaultHttpClient();
        HttpGet httpGet = new HttpGet(URL);
        String result = "";
        try {
            HttpResponse execute = client.execute(httpGet);
            InputStream content = execute.getEntity().getContent();

            BufferedReader buffer = new BufferedReader(new InputStreamReader(content));
            try {
                String line;
                while ((line = buffer.readLine()) != null) {
                    // readLine() strips the line terminator, so add it back
                    response.append(line).append('\n');
                }
            } finally {
                buffer.close();
            }

            Log.d("before changing and parsing", response.toString());

            Document doc = Jsoup.parse(response.toString(), URL);

            result = response.toString();
            Log.d("no parsing", result);

            result = doc.toString();

            Log.d("after parsing", result);

        } catch (Exception e) {
            // Log.e already records the stack trace, so printStackTrace() is not needed
            Log.e("Darrell", "Failed to download " + URL, e);
        }
        return result;
    }

    @Override
    public void onPostExecute(final String result) {
        Log.d("onPostExecute()", result);
        htmlDoc = result;
    }
}
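As an alternative to the readLine() loop, the entity body can be copied byte for byte and decoded once with an explicit charset, rather than the platform default that new InputStreamReader(content) falls back to. Apache HttpClient also ships EntityUtils.toString(HttpEntity) for the same job. Below is a minimal standard-library sketch; StreamUtil and readFully are hypothetical names, and UTF-8 in the usage comment is an assumption about the page's encoding, not something the question states.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

class StreamUtil {
    // Reads every byte until end of stream, then decodes the whole buffer once.
    // Line terminators are kept exactly as the server sent them.
    static String readFully(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), charset);
    }

    // Hypothetical usage inside doInBackground (assumes the page is UTF-8):
    // String html = StreamUtil.readFully(execute.getEntity().getContent(),
    //                                    Charset.forName("UTF-8"));
}
```

Decoding once at the end avoids any question of where the reader splits lines, and the result is exactly what the server sent.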

When I call Log.d("after parsing", result);, this is what shows up in my logcat:

07-23 13:19:57.833: D/after parsing(32136): <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
07-23 13:19:57.833: D/after parsing(32136): <html xmlns="http://www.w3.org/1999/xhtml">
07-23 13:19:57.833: D/after parsing(32136):  <head>
07-23 13:19:57.833: D/after parsing(32136):   <title>USA and Canada Current Average Gas Prices By City/State/Province - GasBuddy.com</title>
07-23 13:19:57.833: D/after parsing(32136):   <base id="ctl00_head_base" href="http://www.gasbuddy.com/" />
07-23 13:19:57.833: D/after parsing(32136):   <script type="text/javascript" src="/js/menu_v3.js?q=11"></script>
07-23 13:19:57.833: D/after parsing(32136):   <link href="/Style.css" rel="Stylesheet" />
07-23 13:19:57.833: D/after parsing(32136):   <link href="/css/main.css?q=13" rel="Stylesheet" />
07-23 13:19:57.833: D/after parsing(32136):   <meta http-equiv="pragma" content="no-cache" />
07-23 13:19:57.833: D/after parsing(32136):   <link rel="shortcut icon" href="/favicon.ico" />
07-23 13:19:57.833: D/after parsing(32136):   <!--[if lt IE 7]>        <link href="/css/main_ie6.css?q=11" rel="Stylesheet" />        <![endif]-->
07-23 13:19:57.833: D/after parsing(32136):   <!-- PUT THIS TAG IN THE head SECTION -->
07-23 13:19:57.833: D/after parsing(32136):   <script type="text/javascript" src="http://partner.googleadservices.com/gampad/google_service.js"></script>
07-23 13:19:57.833: D/after parsing(32136):   <script type="text/javascript">  GS_googleAddAdSenseService("ca-pub-9634286501775085");  GS_googleEnableAllServices();</script>
07-23 13:19:57.833: D/after parsing(32136):   <script type="text/javascript">    var site;    var siteleft;    var siteright;    var site_length;    site="GasBuddy".toLowerCase();    site_length=site.length;    siteleft=site.substring(0,4);    siteright=site.substring(site_length-4,site_length);    site = siteleft + siteright;    GA_googleAddAttr("GasPri_URL", site);    </script>
07-23 13:19:57.833: D/after parsing(32136):   <script type="text/javascript">  GA_googleAddSlot("ca-pub-9634286501775085", "GasBuddy_Content_Top_728x90");  GA_googleAddSlot("ca-pub-9634286501775085", "GasBuddy_Content_Top_160x600");  GA_googleAddSlot("ca-pub-9634286501775085", "GasBuddy_Content_160x600_Bottom");</script>
07-23 13:19:57.833: D/after parsing(32136):   <script type="text/javascript">    GA_googleFetchAds();</script>
07-23 13:19:57.833: D/after parsing(32136):   <!-- END OF TAG FOR head SECTION -->
07-23 13:19:57.833: D/after parsing(32136):   <script type="text/javascript"> function getElementPosition(offsetTrail){    var offsetLeft = 0;    var offsetTop = 0;    while (offsetTrail){        offsetLeft += offsetTrail.offsetLeft;        offsetTop += offsetTrail.offsetTop;        offsetTrail = offsetTrail.offsetParent;    }    if (navigator.userAgent.indexOf('Mac') != -1 && typeof document.body.leftMargin != 'undefined'){        offsetLeft += document.body.leftMargin;        offsetTop += document.body.topMargin;    }    return {left:offsetLeft,top:offsetTop};}var ad_containers = [['divSkyscraper', 'divSky'], ['divLeaderboard','div728']];function moveAd() {  var i = 0;  for (i=0; i<ad_containers.length; i++){         if (document.getElementById(ad_containers[i][0])){         document.getElementById(ad_containers[i][0]).style.display = 'block';            document.getElementById(ad_containers[i][0]).style.position='absolute';            document.getElementById(ad_containers[i][0]).style.top=getElementPosition(document.getElementById(ad_containers[i][1])).top+"px";            document.getElementById(ad_containers[i][0]).style.left=getElementPosition(document.getElementById(ad_containers[i][1])).left+"px";         }    }        }    </script>
07-23 13:19:57.833: D/after parsing(32136):   <script type="text/javascript">window._addWindowOnResize = function (func){if (typeof window.onresize == 'function'){var oldFunc = window.onresize;window.onresize = function() { oldFunc(); func(); }}else{window.onresize = func;}}</script>
07-23 13:19:57.833: D/after parsing(32136):  </head>
07-23 13:19:57.833: D/after parsing(32136):  <body>
07-23 13:19:57.833: D/after parsing(32136):   <input id="adcoord" type="hidden" value="" />
07-23 13:19:57.833: D/after parsing(32136):   <input name="ctl00$serveradcoord" type="hidden" id="ctl00_serveradcoord" />
07-23 13:19:57.833: D/after parsing(32136):   <script type="text/javascript">        document.getElementById('adcoord').value= document.getElementById('ctl00_serveradcoord').value;    </script>
07-23 13:19:57.833: D/after parsing(32136):   <form name="aspnetForm" method="post" action="GB_Price_List.aspx" id="aspnetForm">
07-23 13:19:57.833: D/after parsing(32136):    <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUJODA2MjYxNTkwD2QWAmYPZBYEAgIPFgIeBGhyZWYFGGh0dHA6Ly93d3cuZ2FzYnVkZHkuY29tL2QCBQ9kFgRmDxYCHgRUZXh0BXk8aW5wdXQgaWQ9ImFkY29vcmQiIHR5cGU9ImhpZGRlbiIgdmFsdWU9ImxhdD00NC45NzIwODQ5MTYxMDUmYW1wO2xuZz0tOTMuMjU1Mzg2MzUyNTM5JmFtcDtydD1zY3JpcHQmYW1wO2NiPTQzMDA0MDc2OD4iIC8+ZAIJDxYCHwFlZGQFCm9rVVAnH00/+plv75hl5Bjosg==" />
07-23 13:19:57.833: D/after parsing(32136):    <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWAgLon7DQAgL9t/yMB9Wx1nzDGAm5Ha

As you can see, the string never reaches the closing </html> tag. Why doesn't everything end up in the String?
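When judging completeness from logcat output, keep in mind that logcat caps a single entry at roughly 4 KB, so one Log.d call cannot display a whole page even when the String holds all of it. Logging result.length() and printing the text in pieces distinguishes a short String from a truncated log line. A minimal sketch; LogChunker, chunks and the 3000-character chunk size are illustrative choices, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;

class LogChunker {
    // Arbitrary margin below the ~4 KB logcat entry limit; assumes mostly ASCII HTML,
    // since the limit is in bytes and non-ASCII characters take more than one byte.
    static final int MAX_CHUNK = 3000;

    // Splits text into consecutive pieces of at most maxLen chars;
    // concatenating the pieces restores the original string.
    static List<String> chunks(String text, int maxLen) {
        if (maxLen <= 0) throw new IllegalArgumentException("maxLen must be positive");
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < text.length(); start += maxLen) {
            parts.add(text.substring(start, Math.min(text.length(), start + maxLen)));
        }
        return parts;
    }

    // Hypothetical usage on Android:
    // Log.d("after parsing", "length=" + result.length());
    // for (String part : LogChunker.chunks(result, LogChunker.MAX_CHUNK)) {
    //     Log.d("after parsing", part);
    // }
}
```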

The rest of the HTML page looks like this (well, part of the rest; I hit the post length limit…):

</div>

<div>

<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWAgK6gqnzAwL9t/yMBxUlbb+cRAiGT6aQPAFgtAU0IQZv" />
</div>
<input id="adcoord" type="hidden" value="lat=44.972084916105&amp;lng=-93.255386352539&amp;rt=script&amp;cb=257130208>" />
<div id="main_wrapper" >


<style type="text/css">
a.social {
background-image: url(/images/art/social_small_sp.png);
background-repeat: no-repeat;
padding: 3px 0px 3px 25px;
margin: 0px 10px 0px 0px;
text-decoration: underline;
}

a.social:hover {
text-decoration: underline;
}

a.fb { background-position: 0px 2px;}
a.tw { background-position: 0px -35px;}

</style>
<div style="font-size: 12px; height: 21px;">
<div style="float: left; padding-top: 3px;">
<a id="ctl00_GBTP_HyperLink1" href="Registration.aspx">[Become A Member]</a>&nbsp;

<a id="ctl00_GBTP_HyperLink2" href="GB_Mem_log_in.aspx">[Log In]</a>

</div>
<div style="float: right; padding-top: 3px;">
<b>Follow Us</b>&nbsp;&nbsp;&nbsp;
<a href="http://www.facebook.com/gasbuddy" target="_blank" class="social fb">Facebook</a>

<a href="http://twitter.com/gasbuddy" target="_blank" class="social tw">Twitter</a>

</div>
</div>


<style type="text/css">

td.gb_h_search {
width: 240px;
vertical-align: bottom;
padding-bottom: 5px;
font-size: 0px;
}

td.gb_h_search span {
font-weight: bold;
color: #555555;
font-size: 17px;
}

td.gb_h_search div {
margin-top: 2px;
}

td.gb_h_class {
vertical-align: bottom;
padding-bottom: 5px;
}

td.gb_h_class a {
font-weight: bold;
padding-left: 20px;
}

</style>


<div id="header" onkeydown="return txtSearch_click(event);">
<table cellspacing="0" cellpadding="0" border="0" style="width: 968px">
<tr>
<td valign="top" width="425px" style="">
<a href="http://www.GasBuddy.com/"><img id="imgHeadbar" alt="" src="../images/logos/gasbuddy_logo.gif" width="425" height="58" /></a>

</td>
<td class="gb_h_search">

</td>
<td class="gb_h_class">
<div>

</div>
</td>
</tr>
</table>
</div>



<style type="text/css">
#s_n_home a {
background-image: url(/images/menu/tp_sp.png);
background-position: 1px 1px;
background-repeat: no-repeat;
vertical-align: bottom;
padding-left: 24px;
}

#subnavi2 .s_n_feat {
padding: 0 3px;
}

#s_n_home li.s_n_feat, #s_n_home li.s_n_feat:hover {background-position: 0px 0px;}
#s_n_home a.s_n_feat_map {background-position: 4px 5px;}
#s_n_home a.s_n_feat_tc {background-position: 4px -27px}
#s_n_home a.s_n_feat_log {background-position: 3px -62px;}
#s_n_home a.s_n_feat_chart {background-position: 4px -133px;}
#s_n_home a.s_n_feat_prize {background-position: 4px -98px;}
#s_n_home a.s_n_feat_tip {background-position: 3px -164px;}
#s_n_home a.s_n_feat_blog {background-position: 3px -194px;}
</style>

<div id="navi2">
<ul>
<li id="n_home"><a href="http://www.gasbuddy.com/">Home</a><span></span></li>

<li id="n_blog"><a href="http://blog.gasbuddy.com/" target="_blank">Blog</a><span></span></li>
<li id="n_gas" class="n_sel"><a href="/GB_Price_List.aspx">Gas Prices</a><span></span></li>
<li id="n_charts"><a href="/gb_retail_price_chart.aspx?time=24">Price Charts</a><span></span></li>
<li id="n_maps"><a href="/gb_gastemperaturemap.aspx">Gas Price Maps</a><span></span></li>
<li id="n_points"><a href="/GB_Contest_Info.aspx?cntry=GB">Points &amp; Prizes</a><span></span></li>
<li id="n_wireless"><a href="/GasBuddyMobileApps.aspx">Mobile Apps</a><span></span></li>
<li id="n_media"><a href="http://media.gasbuddy.com/">Media</a><span></span></li>

<li id="n_help"><a href="/gb_contact.aspx">Contact</a><span></span></li>

<li id="n_advertise"><a href="/GB_AdvertiseWithUs.aspx">Advertise with us</a></li>

</ul>
</div>

<div id="subnavi2">
<div id="s_n_home">
<ul>
<li class="s_n_feat">Top Features:</li>
<li><a href="/gb_gastemperaturemap.aspx" class="s_n_feat_map">Gas Price Heat Map</a><span></span></li>
<li><a href="/Trip_Calculator.aspx" class="s_n_feat_tc">Trip Cost Calculator</a><span></span></li>
<li><a href="/gb_retail_price_chart.aspx?time=24" class="s_n_feat_chart">Gas Price Charts</a><span></span></li>
<li><a href="http://blog.gasbuddy.com/" target="_blank" class="s_n_feat_blog">GasBuddy Blog</a><span></span></li>
<li><a href="GB_Contest_Info.aspx" class="s_n_feat_prize">Win Prizes</a><span></span></li>
<li><a href="/GB_Fuel_Save.aspx" class="s_n_feat_tip">Fuel Saving Tips</a></li>

</ul>
</div>

<div id="s_n_gas" class="s_n_on">
<ul>
<li><a href="/Trip_Calculator.aspx">Trip Cost Calculator</a><span></span></li>

<li><a href="/GB_StateList.aspx">Gas Prices by State/Province</a><span></span></li>

<li><a href="/GB_Price_List.aspx">City &amp; State Averages</a><span></span></li>

<li><a href="/GB_Fuel_Save.aspx">Fuel Saving Tips</a></li>

</ul>
</div>

<div id="s_n_charts">
<ul>
<li><a href="/gb_retail_price_chart.aspx?time=1">Past Month</a><span></span></li>

<li><a href="/gb_retail_price_chart.aspx?time=12">Past Year</a><span></span></li>

<li><a href="/gb_retail_price_chart.aspx?time=24">Past Two Years</a></li>

</ul>
</div>

<div id="s_n_maps">
<ul>
<li><a href="/GB_Map_Gas_Prices.aspx">Map Gas Prices</a><span></span></li>

<li><a href="/gb_gastemperaturemap.aspx">Gas Price Heat Maps</a></li>

</ul>
</div>

<div id="s_n_points">
<ul>
<li><a href="/GB_Contest_Info.aspx?cntry=GB">Prize Give-away</a><span></span></li>

<li><a href="/GB_Contest_Winners.aspx">Recent Winners</a><span></span></li>

<li><a href="/GB_Choose_Site.aspx">Get Entries</a></li>

</ul>
</div>

<div id="s_n_wireless">
<ul>

<li><a href="/GasBuddyiPhoneApp.aspx">iPhone</a><span></span></li>

<li><a href="/GasBuddyAndroidApp.aspx">Android</a><span></span></li>

<li><a href="/GasBuddyWindowsPhoneApp.aspx">Windows Phone</a><span></span></li>

<li><a href="/GasBuddyMobileApps.aspx#MobileWeb">Mobile Web</a><span></span></li>

<li><a href="/GasBuddyBlackBerryApp.aspx">BlackBerry</a></li>

</ul>
</div>

<div id="s_n_media">
<ul>
<li><a href="http://media.gasbuddy.com/">Media Story Ideas</a></li>

</ul>
</div>

<div id="s_n_help">
<ul>
<li><a href="/gb_contact.aspx">Contact Us</a><span></span></li>

<li><a href="http://media.gasbuddy.com/#ContactUs">Media Inquiries</a><span></span></li>

<li><a href="/gb_aboutus.aspx">About Us</a></li>

</ul>
</div>

<div id="s_n_blog">
<ul>
<li><a href="http://blog.gasbuddy.com/" target="_blank">Recent Blog Posts</a></li>

</ul>
</div>
<div id="s_n_advertise">
<ul>
<li><a href="/GB_AdvertiseWithUs.aspx">Advertise with us</a></li>

</ul>
</div>

<div id="s_n_fuel">
<ul>
<li><a href="/Pricelock.aspx">Control your business' fuel costs</a><span></span></li>

<li><a href="/PricelockHowItWorks.aspx">Get paid when fuel prices increase</a></li>

</ul>
</div>
</div>
<script type="text/javascript">
var gb_m = new gb_Menu('navi2', 'subnavi2', 250, 15000, 'n_gas', false);
</script>


<div class="clearfix">
<div class="main_col">
<div class="main_boxGB">


<div id="div728"></div>



<style type="text/css">
.listing {
width: 100%;
border: 1px solid #e2e2e2;
}

.listing td {
font-size: 18px;
color: #666;
padding: 5px 5px;
border-bottom: 1px solid #e2e2e2;
}

.listing a {
color: #33528A;
text-decoration: none;
}

.listing a:hover {
text-decoration: underline;
}

.listing thead tr {
background: #f2f2f2;
}

.listing tbody td:first-child {
width: 500px;
text-align: left;
}

.listing tbody tr:last-child td {
border: 0;
}

.listing .p {
text-align: right;
padding: 0 30px 0 0;
}

.listing .up {
color: #D5111B;
}

.listing .down {
color: #339900;
}

.listing .gpd {
padding-left: 30px;
background: transparent url(/images/art/gpd_logo_sm.png) no-repeat 5px 50%;
}

.listing_nav {
margin: 0 0 10px;
padding: 0;
overflow: hidden;
width: 800px;
}

.listing_nav li {
float: left;
list-style: none none outside;
width: 25%;
}

.listing_nav a {
text-align: center;
vertical-align: middle;
padding: 20px 0;
font-size: 18px;
text-decoration: none;
border: 1px solid #e2e2e2;
border-right: 0;
display: block;

}

.listing_nav a:hover {
background: #f2f2f2;
text-decoration: underline;
}

.listing_nav li:last-child a {
border-right: 1px solid #e2e2e2;
}

</style>


<ul class="listing_nav">
<li>
<a href="/GB_Price_List.aspx?cntry=USA">US States</a>

</li>
<li>
<a href="/GB_Price_List.aspx?cntry=USA#us_cities">US Cities</a>

</li>
<li>
<a href="/GB_Price_List.aspx?cntry=CAN">Canadian Provinces</a>


</li>
<li>
<a href="/GB_Price_List.aspx?cntry=CAN#can_cities">Canadian Cities</a>

</li>
</ul>




<div id="ctl00_Content_GBFPL_pnlCanada">

<a name="can"></a>
<div style="margin: 10px 0;">

<table class="listing" cellpadding="0" cellspacing="0">
<thead>
<tr>
<td colspan="4">
Average Regular Gas Price By Canadian Province
</td>
</tr>
</thead>
<tbody>

<tr>
<td>
<a href="http://www.Albertagasprices.com" target="_blank">

Alberta
</a>
</td>
<td class="p">
117.6
</td>
<td class="p down">
-0.1
</td>
<td>
<img src="/images/art/sm_trend_flat.gif" alt="" />

</td>
</tr>

<tr>
<td>
<a href="http://www.Manitobagasprices.com" target="_blank">

Manitoba
</a>
</td>
<td class="p">
123.5
</td>
<td class="p down">
-0.4
</td>
<td>
<img src="/images/art/sm_trend_down.gif" alt="" />

</td>
</tr>

<tr>
<td>
<a href="http://www.Saskgasprices.com" target="_blank">

Saskatchewan
</a>
</td>
<td class="p">
126.3
</td>
<td class="p">
0.0
</td>
<td>
<img src="/images/art/sm_trend_flat.gif" alt="" />

</td>
</tr>

<tr>
<td>
<a href="http://www.NewBrunswickgasprices.com" target="_blank">

New Brunswick
</a>
</td>
<td class="p">
130.9
</td>
<td class="p down">
-0.1
</td>
<td>
<img src="/images/art/sm_trend_flat.gif" alt="" />

</td>
</tr>

<tr>
<td>
<a href="http://www.Ontariogasprices.com" target="_blank">

Ontario
</a>
</td>
<td class="p">
132.6
</td>
<td class="p up">
+0.3
</td>
<td>
<img src="/images/art/sm_trend_flat.gif" alt="" />

</td>
</tr>

<tr>
<td>
<a href="http://www.PEIgasprices.com" target="_blank">

And so on… Only about a fifth of the full page ends up in the string…

Is there any way to get all of the content of this GasBuddy page into a single string?

Best answer:

Set a user agent:

Document doc = Jsoup.connect(URL).userAgent("Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36").get();
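The answer's userAgent(...) call is Jsoup's Connection method. If you keep the DefaultHttpClient code from the question instead, the equivalent is httpGet.setHeader("User-Agent", ua) before client.execute(httpGet). Without either library, the same header can be sent with java.net.HttpURLConnection, as in this minimal sketch; UserAgentFetch, open and fetch are hypothetical names, and decoding the body as UTF-8 is an assumption about the page.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.Charset;

class UserAgentFetch {
    // Desktop Chrome user-agent string taken from the answer above.
    static final String DESKTOP_UA =
        "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
        + "(KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36";

    // Creates the connection and sets the header; openConnection() does not contact the server yet.
    static HttpURLConnection open(String url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        return conn;
    }

    // Performs the GET and returns the whole body, decoded as UTF-8 (assumed, not checked).
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = open(url, DESKTOP_UA);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), Charset.forName("UTF-8"));
        } finally {
            conn.disconnect();
        }
    }
}
```

On Android, fetch(...) would be called from doInBackground, since it performs network I/O.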

TechArks.Ru