Java Development Project Collection (with Source Code), Word Edition

Uploaded by: 每**** | Document ID: 52157201 | Uploaded: 2022-02-07 | Format: DOC | Pages: 70 | Size: 234.50 KB


Resource description:


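Sina weather forecast news scraper (Java)

    // package ….weather1;                 // package prefix missing in the source

    import java.io.BufferedReader;
    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import java.io.FileOutputStream;        // required by saveImage
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.URL;
    import java.net.URLConnection;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    // import ….update.Getdata;            // package prefix missing in the source

    /**
     * Scrapes news items from Sina weather news using regular expressions.
     * Address: … (missing in the source)
     */
    public class Newlist {
        private static final Log log = LogFactory.getLog(Newlist.class);

        /** Test entry point. */
        public static void main(String[] args) {
            Newlist n = new Newlist();
            String[] k = n.getNewList();
            for (int i = 0; i < k.length; i++) {
                System.out.println(k[i].replace("href=", "href=newinfo2.jsp?url="));
            }
            String[] m = n.getNewinfo("news/2008/1119/35261.html");
            for (int l = 0; l < m.length; l++) {
                System.out.println(m[l]);
            }
        }

        /**
         * Returns the news content for a URL as a String[].
         * Images in the article are downloaded locally and the addresses in the
         * text are rewritten to the local addresses.
         */
        public String[] getNewinfo(String url) {
            String URL = "…";               // address expression missing in the source
            // 30 means: take up to 30 strings matching the pattern. If only 10 are
            // found, the remaining array slots are all null.
            String[] s = analysis("…(.*?)…", getContent(URL), 30);  // markup around the group is missing
            // for (int i = 0; i < …
            // … (the rest of getNewinfo and the opening of getNewList, which main calls,
            //    are missing in the source; the text resumes at the end of getNewList)
            //     … , content, 50);
            String[] s = analysis("…(.*?)…", content, 50);
            return s;
        }

        private String[] analysis(String pattern, String match, int i) {
            Pattern sp = Pattern.compile(pattern);
            Matcher matcher = sp.matcher(match);
            String[] content = new String[i];
            // Stop at the array capacity so extra matches cannot overflow it.
            for (int i1 = 0; i1 < i && matcher.find(); i1++) {
                content[i1] = matcher.group(1);
            }
            // The following section removes the empty (null) entries.
            int l = 0;
            for (int k = 0; k < content.length; k++) {
                if (content[k] == null) {
                    l = k;
                    break;
                }
            }
            String[] content2;
            if (l != 0) {
                content2 = new String[l];
                // for (int n = 0; n < …
                // … (the rest of analysis and the opening of a getContent method are
                //    missing in the source; the text resumes inside its read loop)
                //     … > 0) {
                    outputstream.write(str_b, 0, i);
                }
                all_content = outputstream.toString();
                // System.out.println(all_content);
            } catch (Exception e) {
                e.printStackTrace();
                log.error("Failed to fetch page content");
            } finally {
                uc = null;
            }
            // return new String(all_content.getBytes("ISO8859-1"));
            System.out.println(all_content.length());
            return all_content;
        }

The current problem: images are not downloaded completely. When I download images with the latter two getContent methods, the size of the downloaded file never matches the Content-Length from the response header (that is, the real size of the image), and the file cannot be previewed. Repeated tests give the same size every time for both methods, so downloading again does not help. Testing shows that after toString the length is smaller than the real image, while the generated file is larger than the image data. So the image data grows while it is being stored: converting the image byte stream with toString changes the size of the data, and the change cannot be undone. Other news content is not affected. The cause is probably related to how the image bytes are encoded. The fix is to write the image file directly as the bytes are read from the stream.

        public int saveImage(String strUrl) {
            URLConnection uc = null;
            try {
                URL url = new URL(strUrl);
                uc = url.openConnection();
                uc.setRequestProperty("User-Agent",
                        "Mozilla/4.0 (compatible; MSIE 5.0; Windows XP; DigExt)");
                // uc.setReadTimeout(30000);
                // Get the image length:
                // System.out.println("Content-Length: " + uc.getContentLength());
                // Get the header fields:
                // System.out.println("Header" + uc.getHeaderFields().toString());
                if (uc == null) {
                    return 0;
                }
                InputStream ins = uc.getInputStream();
                byte[] str_b = new byte[1024];
                int byteRead = 0;
                // Use the last path segment of the URL as the local file name.
                String[] images = strUrl.split("/");
                String imagename = images[images.length - 1];
                File fwl = new File(imagename);
                FileOutputStream fos = new FileOutputStream(fwl);
                // Write the raw bytes straight to the file, with no String conversion.
                while ((byteRead = ins.read(str_b)) > 0) {
                    fos.write(str_b, 0, byteRead);
                }
                fos.flush();
                fos.close();
            } catch (Exception e) {
                e.printStackTrace();
                log.error("Failed to fetch page content");
            } finally {
                uc = null;
            }
            return 1;
        }

Method 2: First read the search result page through a stream, then write a regular expression to strip the unwanted content, and finally save the result as an XML file or store it directly in a database, to be read back when it is needed.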
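The extract-then-save steps of this method can be sketched as follows. This is only an illustration under stated assumptions: the class name NewsXmlWriter, the link pattern, and the element names news, item, url and title are all hypothetical and do not come from the original programs. It uses the JDK's XMLStreamWriter so that characters such as & are escaped automatically.

```java
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical helper: extracts links with a regex and serializes them as XML. */
final class NewsXmlWriter {
    // Assumed pattern for simple anchors; real pages usually need a more specific one.
    private static final Pattern LINK =
            Pattern.compile("<a\\s+href=\"(.*?)\"[^>]*>(.*?)</a>");

    private NewsXmlWriter() {}

    /** Returns one {url, title} pair per anchor found in the HTML. */
    static List<String[]> extractLinks(String html) {
        List<String[]> items = new ArrayList<>();
        Matcher matcher = LINK.matcher(html);
        while (matcher.find()) {
            items.add(new String[] {matcher.group(1), matcher.group(2)});
        }
        return items;
    }

    /** Serializes the pairs as <news><item><url/><title/></item>...</news>. */
    static String toXml(List<String[]> items) {
        StringWriter buffer = new StringWriter();
        try {
            XMLStreamWriter xml =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            xml.writeStartDocument();
            xml.writeStartElement("news");
            for (String[] item : items) {
                xml.writeStartElement("item");
                writeField(xml, "url", item[0]);
                writeField(xml, "title", item[1]);
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.writeEndDocument();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
        return buffer.toString();
    }

    // writeCharacters escapes markup characters in the value.
    private static void writeField(XMLStreamWriter xml, String name, String value)
            throws XMLStreamException {
        xml.writeStartElement(name);
        xml.writeCharacters(value);
        xml.writeEndElement();
    }
}
```

Calling NewsXmlWriter.toXml(NewsXmlWriter.extractLinks(html)) on a fetched page would give a string that can be written to a file or stored in a database column.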

The MyRSS code below only displays the HTML source of a page. To extract content, rewrite the regular expression in public static String regex() yourself.

    package rssTest;

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.net.URLConnection;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class MyRSS {
        /** Fetches the HTML source of the search results. */
        public static String getHtmlSource(String url) {
            // Created up front so the final toString() cannot fail on a null buffer.
            StringBuffer codeBuffer = new StringBuffer();
            BufferedReader in = null;
            try {
                URLConnection uc = new URL(url).openConnection();
                /*
                 * To stop clients from reading pages without going through a browser,
                 * some sites only accept requests submitted by a browser. Changing the
                 * User-Agent HTTP header disguises this request as a browser request.
                 */
                uc.setRequestProperty("User-Agent",
                        "Mozilla/4.0 (compatible; MSIE 5.0; Windows XP; DigExt)");
                // Read the URL stream, decoding it as gb2312.
                in = new BufferedReader(new InputStreamReader(uc.getInputStream(), "gb2312"));
                String tempCode = "";
                // Append every line that is read to the buffer.
                while ((tempCode = in.readLine()) != null) {
                    codeBuffer.append(tempCode).append("\n");
                }
                in.close();
            } catch (MalformedURLException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
            return codeBuffer.toString();
        }

        /** The regular expression. */
        public static String regex() {
            // The markup literals between the groups are missing in the source.
            String googleRegex = "(.*?)href=(.*?)(.*?)(.*?)(.*?)(.*?)";
            return googleRegex;
        }

        /** For testing: searches Google for a keyword and extracts the wanted content. */
        public static List<String> GetNews() {
            List<String> newsList = new ArrayList<String>();
            // The beginning of the URL is missing in the source.
            String allHtmlSource = MyRSS.getHtmlSource(
                    "…maxthon&hs=SUZ&q=%E8%A7%81%E9%BE%99%E5%8D%B8%E7%94%B2&meta=&aq=f");
            Pattern pattern = Pattern.compile(regex());
            Matcher matcher = pattern.matcher(allHtmlSource);
            while (matcher.find()) {
                String urlLink = matcher.group(2);
                String title = matcher.group(4);
                // The source applies three title.replaceAll(…, …) calls here;
                // their arguments are missing.
                String content = matcher.group(6);
                // The source applies three content.replaceAll(…, …) calls here;
                // their arguments are missing.
                newsList.add(urlLink);
                newsList.add(title);
                newsList.add(content);
            }
            return newsList;
        }

        /** main method. */
        public static void main(String[] args) {
            System.out.println(MyRSS.getHtmlSource("…"));  // URL missing in the source
        }
    }

Method 3: Automatic news scraping with JSP

    package com.news.spider;

    import java.io.File;
    import java.io.FileFilter;
    import java.text.SimpleDateFormat;
    import java.util.ArrayList;
    import java.util.Calendar;
    import java.util.Date;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import com.db.DBAccess;

    public class SpiderNewsServer {
        public static void main(String[] args) throws Exception {
            // Entry page for the scrape.
            String endPointUrl = "…";               // URL missing in the source
            // Current date.
            Calendar calendar = Calendar.getInstance();
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
            String DateNews = sdf.format(calendar.getTime());

            /*
             * Scraping of second-level URLs: start
             * URL match type: … (missing in the source)
             */
            List<String> listNewsType = new ArrayList<String>();
            // Fetch the HTML of the entry page.
            WebHtml webHtml = new WebHtml();
            String htmlDocuemtnt1 = webHtml.getWebHtml(endPointUrl);
            if (htmlDocuemtnt1 == null || htmlDocuemtnt1.length() == 0) {
                return;
            }
            String strTemp1 = "…";                  // start marker missing in the source
            String strTemp2 = "…";                  // end marker missing in the source
            int stopIndex = 0;
            int startIndex = 0;
            int dd = 0;
            while (true) {
                dd++;
                startIndex = htmlDocuemtnt1.indexOf(strTemp1, stopIndex);
                System.out.println("=" + startIndex);
                stopIndex = htmlDocuemtnt1.indexOf(strTemp2, startIndex);
                System.out.println("=-" + stopIndex);
                if (startIndex != -1 && stopIndex != -1) {
                    String companyType = htmlDocuemtnt1.substring(startIndex, stopIndex);
                    System.out.println("-" + companyType);
                    // The literal passed to indexOf is missing in the source.
                    System.out.println("-" + companyType.indexOf("…"));
                    companyType = companyType.substring(0, companyType.indexOf("…"));
                    System.out.println("#-" + companyType);
                    listNewsType.add(companyType);
                }
                if (dd > 10) {                      // operator missing in the source; > assumed
                    break;
                }
                if (stopIndex == -1 || startIndex == -1) {
                    break;
                }
            }
            System.out.println("listCompanyType=" + listNewsType.size());
            /* Scraping of second-level URLs: end */

            /* Scraping of page content: start */
            String title = "";
            String hometext = "";
            String bodytext = "";
            String keywords = "";
            String counter = "221";
            String cdate = "";
            int begainIndex = 0;    // start index of the searched string
            int endIndex = 0;       // end index of the searched string
            String begainStr;       // marker where the search starts
            String endStr;          // marker where the search ends
            // for (int rows = 1; rows < …
            // … (the loop header and the code that builds strNewsDetail are missing
            //    in the source; the text resumes inside the loop)
            //     … > 0) {
                WebHtml newsListHtml = new WebHtml();
                String htmlDocuemtntCom = newsListHtml.getWebHtml(strNewsDetail);
                System.out.println("$-" + htmlDocuemtntCom);
                if (htmlDocuemtntCom == null || htmlDocuemtntCom.length() == 0) {
                    return;
                }
                // Cut out the time. "時間：" is the label searched for in the page.
                int dateBegainIndex = htmlDocuemtntCom.indexOf("時間：");
                System.out.println("%-" + dateBegainIndex);
                String newTime = htmlDocuemtntCom.substring(dateBegainIndex, dateBegainIndex + 20);
                System.out.println("-" + newTime);
                String newTimeM = newTime.substring(newTime.lastIndexOf("-") + 1,
                        newTime.lastIndexOf("-") + 3);
                String dateM = DateNews.substring(DateNews.lastIndexOf("-") + 1);
                System.out.println("-" + newTimeM);
                System.out.println("-" + dateM);
                // Strings are compared with equals; the source also tested ==, which
                // compares references and adds nothing here.
                if (newTimeM.equals(dateM)) {
                    // Find the news title.
                    begainStr = "…";                // marker missing in the source
                    endStr = "時間：";
                    begainIndex = htmlDocuemtntCom.indexOf(begainStr, 0);
                    System.out.println("&-" + begainIndex);
                    endIndex = htmlDocuemtntCom.indexOf(endStr, 0);
                    System.out.println("&-" + endIndex);
                    if (begainIndex != -1 && endIndex != -1) {
                        title = htmlDocuemtntCom.substring(begainIndex, endIndex).trim();
                        // Both indexOf literals are missing in the source.
                        title = title.substring(title.indexOf("…") + 4, title.indexOf("…"));
                        // Three title.replace(…, …) calls follow in the source;
                        // most of their arguments are missing.
                    }
                    // Find the news body.
                    begainStr = "…";                // marker missing in the source
                    endStr = "…";                   // marker missing in the source
                    begainIndex = htmlDocuemtntCom.indexOf(begainStr, 0);
                    endIndex = htmlDocuemtntCom.indexOf(endStr, 0);
                    if (begainIndex != -1 && endIndex != -1) {
                        bodytext = htmlDocuemtntCom.substring(begainIndex, endIndex).trim();
                        // The source guards the next step with a condition on four
                        // bodytext.indexOf(…) calls; its literals and comparison
                        // operators are missing.
                        bodytext = bodytext.substring(bodytext.indexOf("…") + 3,
                                bodytext.indexOf("…"));
                        // Five bodytext.replace(…, …) calls follow in the source;
                        // most of their arguments are missing.
                    }
                    // Summary: the first 40 characters of the body.
                    if (bodytext.length() > 40) {
                        hometext = bodytext.substring(0, 40) + ".";
                    } else {
                        hometext = bodytext + ".";
                    }
                    // View count: digits taken from a random number.
                    String str = String.valueOf(Math.random());
                    counter = str.substring(str.lastIndexOf(".") + 1, 5);
                    Calendar cal = Calendar.getInstance();
                    cal.setTime(new Date());
                    // Keep the first 10 digits of the millisecond timestamp (seconds).
                    cdate = cal.getTimeInMillis() + "";
                    cdate = cdate.substring(0, 10);
                } else {
                    continue;
                }
                System.out.println("-" + title);
                System.out.println("-" + cdate);
                System.out.println("-" + cdate);
                System.out.println("-" + hometext);
                System.out.println("-" + bodytext);
                System.out.println("-" + keywords);
                System.out.println("-" + counter);
                /*
                String str = INSERT INTO ecim_stories(uid,title,created,published,hostname,hometext,bodytext,keywords,counter,topicid,ihome,notifypub,story_type,topicdisplay,topicalign,comments,rating,votes,description) ;
                str += VALUE (1,+title+,+cdate+,+cdate+,125.122.83.177,+hometext+,+bodytext+,+keywords+,+counter+,1,0,1,admin,0,R,0,0,0,);
                DBAccess db = new DBAccess();
                if (db.executeUpdate(str) > 0) System.out.println(-success!);
                else System.out.println(-failure!);
                (quotation marks are missing in the source)
                */
            }
            /* Scraping of page content: end */
        }
    }

    package com.news.spider;

    import java.net.URL;
    import java.net.URLConnection;
    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class WebHtml {
        /**
         * Fetches the HTML content of the page at the given URL.
         * @param url
         */
        public String getWebHtml(String url) {
            try {
                // UR… (the preview text ends here)
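The image problem described after the first program (a file that no longer matches Content-Length once the bytes have passed through toString) can be reproduced without a network connection. The sketch below is an illustration only: the class BinaryCopy and its methods are hypothetical, and UTF-8 is an assumed charset chosen to make the effect repeatable, whereas ByteArrayOutputStream.toString() uses the platform default charset. It contrasts a byte-for-byte copy with a round trip through String.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper contrasting a raw byte copy with a String round trip. */
final class BinaryCopy {
    private BinaryCopy() {}

    /** Copies all bytes from in to out unchanged and returns how many were copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        int read;
        // read() returns -1 at end of stream.
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    /**
     * Decodes the bytes as text and encodes the text again. Byte sequences that
     * are not valid in the charset are replaced during decoding, so binary data
     * such as an image generally does not survive this.
     */
    static byte[] roundTripThroughString(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }
}
```

Passing the first bytes of a JPEG file (0xFF 0xD8 0xFF 0xE0) through roundTripThroughString returns a different byte array, while copy reproduces the input exactly. This is consistent with the approach taken in saveImage, which writes the stream straight to a FileOutputStream.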
