Oracle, "Data Warehouse Concepts" (28 slides)
Xu Xin
Presales Consultant, Oracle (China) Co., Ltd.

Data Warehouse Concepts

What is...?
- Data Warehouse / Data Mart
- Decision Support System (DSS)
- Online Analytical Processing (OLAP) / ROLAP / MOLAP
- Metadata
- Measures / Dimensions
- Star Schema / Snowflake Schema
- Drill Down / Drill Up
- Table Rotation
- Data Mining

Main functions of a data warehouse
- Query / Report
- Drill Up / Drill Down
- Compare
- Exception
- Forecast
- What-if
- Data Mining

Data warehouse implementation methodology

Factors to consider when building a data warehouse
- Scalability
- Flexibility
- Integration
- Reliability

Advice from data warehouse experts
- Get business users actively involved
- Validate requirements through prototyping
- Define the scope of the data warehouse
- Don't try to warehouse all of your data
- Choose the right tool for each requirement
- Control risk
- Draw on the experience of external consultants
- Focus on integrating the different systems

Example: building a data warehouse
We use a building-estate OLTP database as an example to illustrate the concepts and to show how to build a successful data warehouse for tracking and forecasting rental rates and sales volumes in Hong Kong.

Step 1: Define the scope of the problems the data warehouse must answer
- List daily housing sales in Hong Kong for April
- Find residential housing projects with sales above 4 million
- Compare last month's sales in the Whampoo and Kornhill districts
- Find the top 3 districts by number of homes sold
- Report cumulative units sold up to the current month
- Chart the best-selling patterns
- Perform time-series analysis

Define the scope of the problems (continued)
Determine the business requirements and user requirements:
- How often users run queries
- How many years of data the system retains
- From which perspectives, and at which levels of detail, users want to analyze the data
- Which systems supply the source data

Step 2: Choose a suitable hardware and software platform
- Reliable vendor
- Data modeling and management tools
- Ease of use
- Openness
- Centralized management
- Performance
- Parallel processing

Criteria for choosing a database platform
Data warehouse considerations, top 3 factors:
- Ease of use: 92.4%
- Centralized management: 65.2%
- Reliable vendor: 65.2%
(Source: Data Warehouse Institute, February 1996)

MOLAP or ROLAP?
Functional differences between ROLAP and MOLAP
[Figure: Transaction Systems and Decision Support Systems (Strategic, Tactical), with an MDB and an RDBMS linked through a data cache]

Step 3: Create new entities as needed
Existing OLTP entities:
- Apartment: #Code_no, No_of_transaction, Constructor_ID, Developer_ID, Building date, Purchase date, Purchase price, Address, Area
- Owner: #Code_no, #Transaction_no, Name/Company, HKID, Contact Phone#, Contact Address, Purchase Date, Purchase Price
- Occupant: #Code_no, #Flat, #Transaction_no, Name, HKID, Occupy_type (P, R), Contact Phone#, Contact Address, Date, Price
- Constructor: Contractor_ID, Company Name, Address, Contact Phone#
- Flat Details: #Code_no, #Flat, No_of_trans, Type, Floor, Area (Building), Area (Actual)
- Developer: Developer_ID, Company Name, Address, Contact Phone#
New entities:
- Time: Day, Month, Quarter, Year
- Geographic Location: Territory, District, Region, Building/Estate
- Housing Types: Type, Size, Area

Step 4: Identify the dimension tables and delete unnecessary tables

Step 5: Build hierarchies
Sample Date values: 1-Jan-94, 13-Jun-95, 12-Jan-96, 12-Apr-96, 15-Apr-96, 20-Oct-96, 20-Oct-96, 12-Dec-96, 1-Jan-97, 31-Mar-97, 15-Apr-97, ...
Time hierarchy: Year > Quarter > Month > Day

Step 6: Identify attributes
Type, Size, Area and Class are attributes of Housing Type.
[Figure: the Occupant table linked to the Housing Type dimension lookup table and its attributes]

Step 7: Build the fact table and choose the right granularity
Sales Fact Table: Time, Location, Type, Area, Occupant Name, Purchase Price, Rent, ...

Step 8: Build the data warehouse model
[Figure: the Building Estate OLTP environment (Apartment, Owner, Occupant, Constructor, Flat Details and Developer tables) is transformed into the Building Estate data warehouse OLAP environment: the Sales Fact Table surrounded by the Time (Day, Month, Quarter, Year), Geographic Location (Territory, District, Region, Building/Estate) and Housing Types (Type, Size, Area) dimensions]

Step 9: Optimize the data warehouse model
[Figure: the Sales Fact Table model drawn as a star schema and as a snowflake schema]

Principles for optimizing data warehouse design
- Avoid aggregating data on the fly (build summary tables)
- Minimize joins (no more than 3 to 5 tables)
- Use ID codes as keys
- Reduce I/O contention
- Use partitioning to improve performance and manageability

Algorithm for estimating data warehouse size
Estimated size of database = 98 * 96 * 20 * 1000 * 0.75 = 141.12 MB

Step 10: Extract data from the operational systems into the data warehouse
Data extraction must be able to:
- Access a variety of data sources
- Meet timing requirements
- Meet data transformation requirements
- Detect changes to data in the source systems

Step 11: Develop front-end applications

Step 12: Manage the data warehouse
- Security management
- Backup and recovery
- High availability
- Data timeliness
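The star-schema design from Steps 5 to 9 can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not the presentation's Oracle implementation: the table and column names (time_dim, location_dim, housing_type_dim, sales_fact), the one-row-per-sale grain, and all sample rows are hypothetical, apart from the district names Whampoo and Kornhill and three dates taken from the Step 5 sample list. It answers one Step 1 question (top 3 districts by units sold) and drills up from day to quarter, storing the result in a summary table in line with the "build summary tables" principle.

```python
import sqlite3

# Hypothetical star schema: three dimension tables around one sales fact table.
SCHEMA = """
CREATE TABLE time_dim (
    time_id INTEGER PRIMARY KEY,          -- surrogate ID code used as the key
    day TEXT, month INTEGER, quarter INTEGER, year INTEGER
);
CREATE TABLE location_dim (
    location_id INTEGER PRIMARY KEY,
    territory TEXT, district TEXT, region TEXT, building_estate TEXT
);
CREATE TABLE housing_type_dim (
    type_id INTEGER PRIMARY KEY,
    type TEXT, size TEXT, area REAL
);
-- Assumed grain: one row per sale transaction.
CREATE TABLE sales_fact (
    time_id INTEGER REFERENCES time_dim(time_id),
    location_id INTEGER REFERENCES location_dim(location_id),
    type_id INTEGER REFERENCES housing_type_dim(type_id),
    purchase_price REAL
);
"""


def build_demo_warehouse() -> sqlite3.Connection:
    """Create the schema in memory and load a few made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)", [
        (1, "1996-04-12", 4, 2, 1996),
        (2, "1996-04-15", 4, 2, 1996),
        (3, "1996-10-20", 10, 4, 1996),
    ])
    # Only the district column is filled; the other location attributes stay NULL.
    conn.executemany(
        "INSERT INTO location_dim (location_id, district) VALUES (?, ?)",
        [(1, "Whampoo"), (2, "Kornhill"), (3, "District C"), (4, "District D")],
    )
    conn.execute(
        "INSERT INTO housing_type_dim VALUES (1, 'Residential', 'Small', 45.0)")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
        (1, 1, 1, 3_000_000), (1, 1, 1, 4_500_000), (2, 1, 1, 5_000_000),
        (2, 2, 1, 4_200_000), (2, 2, 1, 3_800_000),
        (3, 3, 1, 2_000_000), (3, 4, 1, 6_000_000),
    ])
    return conn


def top_districts_by_units(conn: sqlite3.Connection, n: int = 3) -> list:
    """Step 1 question: the n districts with the most homes sold."""
    return conn.execute("""
        SELECT l.district, COUNT(*) AS units
        FROM sales_fact f JOIN location_dim l ON f.location_id = l.location_id
        GROUP BY l.district
        ORDER BY units DESC, l.district
        LIMIT ?""", (n,)).fetchall()


def build_quarter_summary(conn: sqlite3.Connection) -> list:
    """Drill up from day to quarter once and store it as a summary table."""
    conn.execute("""
        CREATE TABLE sales_by_quarter AS
        SELECT t.year, t.quarter, COUNT(*) AS units,
               SUM(f.purchase_price) AS revenue
        FROM sales_fact f JOIN time_dim t ON f.time_id = t.time_id
        GROUP BY t.year, t.quarter""")
    return conn.execute(
        "SELECT * FROM sales_by_quarter ORDER BY year, quarter").fetchall()


if __name__ == "__main__":
    conn = build_demo_warehouse()
    print(top_districts_by_units(conn))
    print(build_quarter_summary(conn))
```

Keeping each dimension as a single table is the star-schema choice; splitting Geographic Location into separate Territory, District and Region tables would turn it into a snowflake schema at the cost of extra joins.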
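The database size estimate can be checked step by step. The slide does not say what each factor stands for (that detail was in graphics that did not survive), so this worked arithmetic only confirms the product, assuming the result is in bytes and 1 MB = 10^6 bytes:

```latex
\begin{aligned}
98 \times 96 &= 9{,}408\\
9{,}408 \times 20 &= 188{,}160\\
188{,}160 \times 1000 &= 188{,}160{,}000\\
188{,}160{,}000 \times 0.75 &= 141{,}120{,}000 \text{ bytes} = 141.12 \text{ MB}
\end{aligned}
```

With binary units (1 MiB = 2^20 bytes), the same product is about 134.6 MiB.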
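One simple way to meet the Step 10 requirement of detecting changes in the source systems is to compare two snapshots of a source table keyed by its primary key. The sketch below assumes that approach; the function name diff_snapshots and the sample transaction records are hypothetical. Production tools often rely instead on timestamp columns, triggers or database change logs, which avoid rescanning the whole table.

```python
from typing import Dict, List, Tuple

# A snapshot maps a primary key (e.g. a transaction number) to the row's values.
Snapshot = Dict[str, Tuple]


def diff_snapshots(old: Snapshot, new: Snapshot) -> Dict[str, List[str]]:
    """Classify keys as inserted, updated or deleted between two extracts."""
    inserted = sorted(k for k in new if k not in old)
    deleted = sorted(k for k in old if k not in new)
    # A key present in both snapshots counts as updated if any value differs.
    updated = sorted(k for k in new if k in old and new[k] != old[k])
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


# Hypothetical yesterday/today extracts of a sales transaction table.
yesterday = {
    "T001": ("Whampoo", 3_000_000),
    "T002": ("Kornhill", 4_200_000),
    "T003": ("District C", 2_000_000),
}
today = {
    "T001": ("Whampoo", 3_000_000),
    "T002": ("Kornhill", 4_300_000),    # price corrected
    "T004": ("District D", 6_000_000),  # new sale
}

changes = diff_snapshots(yesterday, today)
print(changes)
# → {'inserted': ['T004'], 'updated': ['T002'], 'deleted': ['T003']}
```

Only the changed keys then need to be transformed and loaded, which keeps each refresh within the extraction time window.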