Harbin Institute of Technology (Shenzhen) - Pattern Recognition - 2017 - Key Exam Knowledge Points (Word)
Let $\lambda(\alpha_i \mid \omega_j)$ be the loss incurred for taking action $\alpha_i$ when the state of nature is $\omega_j$; action $\alpha_i$ assigns the sample to one of the classes.
Conditional risk, for $i = 1, \dots, a$:
$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j) P(\omega_j \mid x)$
Select the action $\alpha_i$ for which $R(\alpha_i \mid x)$ is minimum. The overall risk $R$ is then minimized, and this minimum is called the Bayes risk: the best performance that can be achieved.
$\lambda_{ij}$: the loss incurred for deciding $\omega_i$ when the true state of nature is $\omega_j$.
$g_i(x) = -R(\alpha_i \mid x)$: the maximum discriminant corresponds to the minimum risk.
$g_i(x) = P(\omega_i \mid x)$: the maximum discriminant corresponds to the maximum posterior.
$g_i(x) = p(x \mid \omega_i) P(\omega_i)$, or equivalently $g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i)$.
For normal densities, the problem changes from estimating the likelihood to estimating the parameters of the normal distribution.
Maximum likelihood estimation and Bayesian estimation give nearly the same results, but their underlying concepts differ.

Please present the basic ideas of the maximum likelihood estimation method and the Bayesian estimation method. When do these two methods have similar results?
I. Maximum-likelihood methods view the parameters as quantities whose values are fixed but unknown. The best estimate of their value is defined to be the one that maximizes the probability of obtaining the samples actually observed.
II. Bayesian methods view the parameters as random variables having some known prior distribution. Observation of the samples converts this to a posterior density, thereby revising our opinion about the true values of the parameters.
III. When the number of training samples approaches infinity, the estimate of the mean obtained using Bayesian estimation is almost identical to that obtained using maximum likelihood estimation.

Minimum-risk decision usually has a lower classification accuracy than minimum-error-rate Bayesian decision. However, minimum-risk decision can avoid potentially high risks and losses.
Bayesian parameter estimation method.
PCA-based classification steps:
• Vectorize the samples.
• Calculate the mean of all training samples.
• Calculate the covariance matrix.
• Calculate the eigenvectors and eigenvalues of the covariance matrix.
• Build the feature space.
• Extract the features of all samples:
• Calculate the feature values of every training sample.
• Calculate the feature values of the test sample.
• Process the training samples in the same way as in the step above.
• Find the nearest training sample and take its class as the result.

Exercises
1. How can the prior and the likelihood be used to calculate the posterior? What is the formula?
$P(\omega_j \mid x) = \dfrac{p(x \mid \omega_j) P(\omega_j)}{p(x)}$, where $p(x) = \sum_j p(x \mid \omega_j) P(\omega_j)$, $\sum_j P(\omega_j) = 1$ and $\sum_j P(\omega_j \mid x) = 1$.

2. What is the difference in the ideas of the minimum-error Bayesian decision and the minimum-risk Bayesian decision? What condition makes the minimum-error Bayesian decision identical to the minimum-risk Bayesian decision?
Answer: The minimum-error Bayesian decision minimizes the classification error; the minimum-risk Bayesian decision minimizes the risk (expected loss). In the two-class case:
$R(\alpha_1 \mid x) = \lambda_{11} P(\omega_1 \mid x) + \lambda_{12} P(\omega_2 \mid x)$
$R(\alpha_2 \mid x) = \lambda_{21} P(\omega_1 \mid x) + \lambda_{22} P(\omega_2 \mid x)$
If $R(\alpha_1 \mid x) < R(\alpha_2 \mid x)$, action $\alpha_1$ ("decide $\omega_1$") is taken. If $\lambda_{12} = \lambda_{21}$ and $\lambda_{11} = \lambda_{22} = 0$, i.e., the so-called symmetric loss function, the minimum-risk Bayesian decision and the minimum-error Bayesian decision are clearly identical.

3. A person takes a lab test for nuclear radiation and the result is positive. The test returns a correct positive result in 99% of the cases in which nuclear radiation is actually present, and a correct negative result in 95% of the cases in which it is not present. Furthermore, 3% of the entire population are radioactively contaminated. Is this person contaminated?
Answer: Let $\omega_1$ denote contaminated and $\omega_2$ not contaminated: $P(\omega_1) = 0.03$, $P(\omega_2) = 0.97$. Let $X$ denote a positive result and $\bar{X}$ a negative result. Then $P(X \mid \omega_1) = 0.99$ and $P(\bar{X} \mid \omega_2) = 0.95$, so $P(X \mid \omega_2) = 0.05$, and
$P(\omega_1 \mid X) = \dfrac{P(X \mid \omega_1) P(\omega_1)}{\sum_{i=1}^{2} P(X \mid \omega_i) P(\omega_i)} = \dfrac{0.99 \times 0.03}{0.99 \times 0.03 + 0.05 \times 0.97} = \dfrac{0.0297}{0.0782} \approx 0.38$
$P(\omega_2 \mid X) = 1 - P(\omega_1 \mid X) \approx 0.62$
Since $P(\omega_2 \mid X) > P(\omega_1 \mid X)$, the Bayesian (minimum-error) decision rule decides $\omega_2$: this person is judged not contaminated (the posterior probability of contamination is only about 38%).

4. Please present the basic ideas of the maximum likelihood estimation method and the Bayesian estimation method. When do these two methods have similar results?
Answer:
I. Bayesian estimation:
Given a sample set $\mathcal{X} = \{x_1, \dots, x_n\}$, we seek an estimator $\hat{\theta}$ of some true parameter $\theta$ of the population distribution from which $\mathcal{X}$ is drawn, such that the resulting Bayes risk is minimized. This is the concept of Bayesian estimation. (Put another way: the parameter to be estimated is regarded as a random variable with some prior distribution; observing the samples converts the prior density into a posterior density, so the sample information is used to revise the initial estimate of the parameter.)
II. Maximum likelihood estimation: the idea is simple. Given the experimental results already obtained, we take as the estimate of the true $\theta$ the value $\hat{\theta}$ that makes these results most likely to occur.
III. When the number of training samples approaches infinity, the estimate of the mean obtained by Bayesian estimation is almost identical to that obtained by maximum likelihood estimation.
Aside: Bayesian estimation uses the prior plus the samples.

5. Please present the nature of principal component analysis.
Answer: Principal component analysis (PCA) uses the idea of dimensionality reduction to convert many indicators into a few composite indicators.
• It captures the component that varies the most (largest variance).
• The component that varies the most contains the main information of the samples (most information).
• PCA is the optimal representation method in the sense that it obtains the minimum reconstruction error for a given number of components.
• As the transform axes of PCA are orthogonal, it is also referred to as an orthogonal transform method.
• PCA is also a de-correlation method: the transformed components are mutually uncorrelated.
• PCA can also be used as a compression method and can obtain a high compression ratio.
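Exercises 1 to 3 and the risk notes at the top can be checked numerically. The following Python sketch (NumPy assumed; the function names `posterior` and `min_risk_action` are illustrative, not from the source) applies Bayes' formula to the radiation test and then picks the action with minimum conditional risk $R(\alpha_i \mid x) = \sum_j \lambda(\alpha_i \mid \omega_j) P(\omega_j \mid x)$. The asymmetric loss matrix is a hypothetical example.

```python
import numpy as np

def posterior(likelihoods, priors):
    # Bayes formula: P(w_j|x) = p(x|w_j) P(w_j) / sum_k p(x|w_k) P(w_k)
    joint = np.asarray(likelihoods, dtype=float) * np.asarray(priors, dtype=float)
    return joint / joint.sum()  # dividing by the evidence p(x)

def min_risk_action(post, loss):
    # loss[i, j] = lambda(alpha_i | w_j); returns (index of best action, conditional risks)
    risks = np.asarray(loss, dtype=float) @ post  # R(alpha_i|x) = sum_j lambda_ij P(w_j|x)
    return int(np.argmin(risks)), risks

# Exercise 3: w1 = contaminated, w2 = not contaminated, x = positive test result
post = posterior(likelihoods=[0.99, 0.05], priors=[0.03, 0.97])
print(post.round(2))  # [0.38 0.62]

# Zero-one (symmetric) loss: minimum risk coincides with minimum error (max posterior)
action, risks = min_risk_action(post, [[0, 1], [1, 0]])
print(action)  # 1 -> decide w2, "not contaminated"

# Hypothetical asymmetric loss: missing a contamination costs 10 times a false alarm
action, risks = min_risk_action(post, [[0, 1], [10, 0]])
print(action)  # 0 -> decide w1, "contaminated"
```

With the symmetric loss the decision matches the minimum-error answer of Exercise 3; with the hypothetical 10:1 loss the minimum-risk rule decides "contaminated", which illustrates how a minimum-risk decision can give up some accuracy to avoid high losses.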
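Point III (and Question 11) can be illustrated for the textbook case of a univariate normal density with known variance $\sigma^2$ and a normal prior $N(\mu_0, \sigma_0^2)$ on the mean, where the posterior mean is $\mu_n = \frac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2}\bar{x}_n + \frac{\sigma^2}{n\sigma_0^2 + \sigma^2}\mu_0$. The sketch below assumes exactly that setting (the numbers and function names are arbitrary); the weight on the prior shrinks like $1/n$, so the Bayesian estimate approaches the ML sample mean.

```python
import numpy as np

def ml_mean(x):
    # Maximum-likelihood estimate of a normal mean: the sample mean
    return float(np.mean(x))

def bayes_mean(x, sigma2, mu0, sigma0_2):
    # Posterior mean of mu for x_k ~ N(mu, sigma2) with known sigma2 and prior mu ~ N(mu0, sigma0_2)
    n = len(x)
    w = n * sigma0_2 / (n * sigma0_2 + sigma2)  # weight on the data; tends to 1 as n grows
    return w * ml_mean(x) + (1.0 - w) * mu0

rng = np.random.default_rng(0)
true_mu, sigma2 = 5.0, 4.0
for n in (5, 50, 5000):
    x = rng.normal(true_mu, np.sqrt(sigma2), size=n)
    # A deliberately poor prior centred at 0 shows the prior's influence fading with n
    print(n, round(ml_mean(x), 3), round(bayes_mean(x, sigma2, mu0=0.0, sigma0_2=1.0), 3))
```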
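The PCA-based classification steps listed at the top, and the de-correlation property from Question 5, can be sketched in NumPy as follows. This is a minimal illustration under assumptions: each row of `train` is one vectorized sample, `k` (the number of retained components) is chosen by the user, the covariance is estimated with `np.cov`, and matching uses the Euclidean nearest neighbor; the function names and synthetic data are hypothetical.

```python
import numpy as np

def pca_fit(train, k):
    # train: (n_samples, n_features), one vectorized sample per row
    mean = train.mean(axis=0)                      # mean of all training samples
    cov = np.cov(train - mean, rowvar=False)       # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigen-decomposition (ascending eigenvalues)
    order = np.argsort(eigvals)[::-1][:k]          # keep the k largest-variance directions
    return mean, eigvecs[:, order]                 # columns span the feature space

def pca_transform(samples, mean, basis):
    return (samples - mean) @ basis                # feature values = projections onto the axes

def nn_classify(test_feat, train_feats, train_labels):
    dists = np.linalg.norm(train_feats - test_feat, axis=1)
    return train_labels[int(np.argmin(dists))]     # label of the nearest training sample

rng = np.random.default_rng(1)
train = np.vstack([rng.normal(0.0, 1.0, (20, 10)), rng.normal(3.0, 1.0, (20, 10))])
labels = np.array([0] * 20 + [1] * 20)
mean, basis = pca_fit(train, k=3)
feats = pca_transform(train, mean, basis)
# De-correlation: the covariance of the projected features is (numerically) diagonal
print(np.round(np.cov(feats, rowvar=False), 3))
test = rng.normal(3.0, 1.0, 10)
print(nn_classify(pca_transform(test, mean, basis), feats, labels))
```

The diagonal of the printed covariance holds the retained eigenvalues in decreasing order, i.e. the variance captured by each principal axis.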
6. Describe the basic idea and possible advantages of Fisher discriminant analysis.
Answer: The Fisher criterion is a classic pattern recognition method. It treats the product of the normal vector of a linear discriminant and a sample as the projection of the sample vector onto the unit normal vector. Its result is similar to the Bayesian decision for normal distributions with equal covariance matrices, which shows that if the two classes are indeed distributed similarly around their respective means, the Fisher criterion can achieve a low error rate.
• Supervised.
• Maximizes the between-class distance and minimizes the within-class distance.
• Exploits the training samples to produce the transform axes.
• … (the number of effective Fisher transform axes is c − 1; a singular within-class scatter matrix can be avoided by applying PCA first, i.e., PCA+FDA)

7. What is the k-nearest-neighbor classifier? Is it reasonable?
Answer: The basic idea of the k-nearest-neighbor rule is to assign the test sample x to the class that occurs most often among its k nearest neighbors: first find the class of each of the k neighbors of x, then classify x by majority vote. If the samples are relatively sparse, deciding the class of x only from the first k neighbors without considering their differences in distance is inappropriate, especially when k is large.
In other words, the k-nearest-neighbor classifier follows the k-nearest-neighbor rule: it classifies x by assigning it the label most frequently represented among the k nearest samples; a decision is made by examining the labels on the k nearest neighbors and taking a vote.

8. Is it possible that a classifier can obtain a higher accuracy on any dataset than any other classifier?
Answer: No, clearly not, for many reasons (cf. the No Free Lunch theorem).

9. Please describe the over-fitting problem.
Answer: Making a hypothesis overly complex in order to make it consistent with the training data is called over-fitting. Imagine that a learning algorithm produces an over-fit classifier that classifies the sample data 100% correctly (given any document from the samples again, it never errs). To classify the samples perfectly, however, its structure has become so fine and complex, and its rules so strict, that it rejects any document that differs even slightly from the sample data. In short, an over-fit classifier partitions the data too finely and too specifically. Over-fitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. An over-fit model generally has poor predictive performance, as it can exaggerate minor fluctuations in the data.
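The k-nearest-neighbor rule of Question 7 fits in a few lines. The sketch below assumes Euclidean distance and an unweighted majority vote; `knn_classify` and the toy data are illustrative.

```python
import numpy as np
from collections import Counter

def knn_classify(x, train, labels, k=3):
    dists = np.linalg.norm(train - x, axis=1)       # Euclidean distance to every training sample
    nearest = np.argsort(dists)[:k]                 # indices of the k nearest neighbors, closest first
    votes = Counter(labels[i] for i in nearest)     # count the labels among them
    return votes.most_common(1)[0][0]               # majority label (ties go to the closer label seen first)

train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
labels = np.array(["a", "a", "a", "b", "b", "b"])
print(knn_classify(np.array([0.15, 0.1]), train, labels, k=3))  # a
print(knn_classify(np.array([0.8, 0.9]), train, labels, k=3))   # b
```

Replacing the unit vote with a weight such as the inverse distance is one common way to address the concern raised above about sparse samples and large k; that variant is not shown here.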
10. Usually a more complex learning algorithm can obtain a higher accuracy in the training stage. So, should a more complex learning algorithm be favored?
Answer: No. There are no context-independent or usage-independent reasons to favor one learning or classification method over another to obtain good generalization performance. When confronting a new pattern recognition problem, we need to focus on the prior information, the data distribution, the amount of training data, and the cost or reward functions. (No Free Lunch theorem: in the absence of assumptions we should not prefer any learning or classification algorithm over another. An analogous theorem, the Ugly Duckling Theorem, addresses features and patterns.)

11. Under the condition that the number of training samples approaches infinity, the estimate of the mean obtained using Bayesian estimation is almost identical to that obtained using maximum likelihood estimation. Is this statement correct?
Answer: Yes; the reasoning is the same as for Question 4.

12. Can the minimum squared error procedure be used for binary classification?
Answer: Yes. Solve $Ya = b$:
$\begin{pmatrix} y_{10} & y_{11} & \cdots & y_{1d} \\ y_{20} & y_{21} & \cdots & y_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n0} & y_{n1} & \cdots & y_{nd} \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_d \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$, i.e., $Y = [y_1, \dots, y_n]^T$ with $y_i = (y_{i0}, y_{i1}, \dots, y_{id})^T$.
A simple way to set $b$: if $y_i$ is from the first class, $b_i$ is set to 1; if $y_i$ is from the second class, $b_i$ is set to $-1$.
Another simple way to set $b$: if $y_i$ is from the first class, $b_i$ is set to $n/n_1$; if $y_i$ is from the second class, $b_i$ is set to $-n/n_2$, where $n_1$ and $n_2$ are the numbers of samples in the two classes and $n = n_1 + n_2$.

13. Can you devise a minimum squared error procedure to perform multiclass classification?

14. Which kinds of applications is the Markov model suitable for?
Answer: Markov models (in particular hidden Markov models) have found greatest use in problems involving sequences, for instance speech recognition or gesture recognition. The three central problems are:
• The evaluation problem
• The decoding problem
• The learning problem
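For Question 12, the least-squares solution of $Ya = b$ is $a = Y^{+} b$ (pseudo-inverse). The sketch below assumes augmented samples $y_i = (1, x_{i1}, \dots, x_{id})^T$, so $y_{i0} = 1$ and $a_0$ acts as the threshold, and implements both ways of setting $b$ listed above; the function names and the toy data are made up for illustration.

```python
import numpy as np

def mse_train(X, labels, scheme="pm1"):
    # X: (n, d) samples; labels: (n,) with values 1 or 2
    n = len(X)
    Y = np.hstack([np.ones((n, 1)), X])              # augmented rows y_i = (1, x_i1, ..., x_id)
    if scheme == "pm1":
        b = np.where(labels == 1, 1.0, -1.0)          # b_i = 1 for class 1, -1 for class 2
    else:
        n1, n2 = np.sum(labels == 1), np.sum(labels == 2)
        b = np.where(labels == 1, n / n1, -n / n2)    # b_i = n/n1 for class 1, -n/n2 for class 2
    a, *_ = np.linalg.lstsq(Y, b, rcond=None)        # minimizes ||Ya - b||^2
    return a

def mse_predict(a, X):
    g = a[0] + X @ a[1:]                              # g(x) = a^T y
    return np.where(g > 0, 1, 2)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
labels = np.array([1, 1, 1, 2, 2, 2])
for scheme in ("pm1", "n_over_ni"):
    a = mse_train(X, labels, scheme)
    print(scheme, mse_predict(a, X))  # [1 1 1 2 2 2] for both schemes
```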
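The evaluation problem in Question 14 (computing the probability that a hidden Markov model generated a given observation sequence) is usually solved with the forward algorithm. This sketch assumes a discrete HMM given by a transition matrix `A`, an emission matrix `B` and an initial distribution `pi` (the example numbers are made up), and checks the result against brute-force enumeration of all hidden state sequences.

```python
import numpy as np
from itertools import product

def forward(A, B, pi, obs):
    # A[i, j] = P(state j at t+1 | state i at t), B[i, k] = P(symbol k | state i), pi[i] = P(first state i)
    alpha = pi * B[:, obs[0]]               # alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]       # alpha_{t+1}(j) = sum_i alpha_t(i) a_ij * b_j(o_{t+1})
    return float(alpha.sum())               # P(O | model) = sum_i alpha_T(i)

def brute_force(A, B, pi, obs):
    # Sum the joint probability over every possible hidden state sequence (exponential cost)
    total = 0.0
    for states in product(range(len(pi)), repeat=len(obs)):
        p = pi[states[0]] * B[states[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
        total += p
    return float(total)

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
pi = np.array([0.6, 0.4])
obs = [0, 1, 1, 0]
print(forward(A, B, pi, obs), brute_force(A, B, pi, obs))
```

The forward recursion costs $O(N^2 T)$ for $N$ states and $T$ observations, versus $O(N^T T)$ for the enumeration.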
15. For the minimum squared error procedure based on $Ya = b$ ($Y$ is the matrix consisting of all the training samples), if we have a proper $b$ and criterion function, then this minimum squared error procedure might be equivalent to Fisher discriminant analysis. Is this statement correct?
Answer: Yes. See Section 5.8.2 of the English textbook (p. 289 of the PDF) or p. 198 of the Chinese edition, and the lecture slides on Docin: with the samples of the second class negated and $b_i = n/n_1$ for the $n_1$ samples of the first class and $b_i = n/n_2$ for the $n_2$ samples of the second class, the MSE weight vector is proportional to the Fisher direction $S_W^{-1}(m_1 - m_2)$.

16. Suppose that the number of training samples approaches infinity; then the minimum-error Bayesian decision will perform better than any other classifier, achieving a lower classification error rate. Do you agree?
Answer: To be determined.

17. What are the upper and lower bounds of the classification error rate of the k-nearest-neighbor classifier?
Answer: The error rate of the k-nearest-neighbor rule differs for different values of k. For k = 1 (the nearest-neighbor rule), the lower and upper bounds are the Bayes error rate $P^*$ and $P^*\left(2 - \frac{c}{c-1} P^*\right)$, respectively (asymptotically, for an unlimited number of samples). As k increases, the upper bound gradually approaches the lower bound, the Bayes error rate $P^*$. As k tends to infinity, the upper and lower bounds coincide, $P = P^*$, and the k-nearest-neighbor rule approaches the optimal Bayesian decision.
The Bayes rate is $P^*$, so the lower bound on $P$ is $P^*$ itself; the upper bound is about twice the Bayes rate.

18. Can you demonstrate that a statistics-based classifier usually cannot lead to a classification accuracy of 100%?

19. What is representation-based classification? Please present the characteristics of representation-based classification.

20. A simple representation-based classification method is presented as follows. This method seeks to represent the test sample as a linear combination of all training samples and uses the representation result to classify the test sample:
$y = b_1 \tilde{x}_1 + \dots + b_M \tilde{x}_M$, (1)
where $\tilde{x}_i$ ($i = 1, 2, \dots, M$) denote all the training samples and $b_i$ ($i = 1, 2, \dots, M$) are the coefficients. We rewrite Eq. (1) as
$y = \tilde{X} B$, (2)
where $B = [b_1, \dots, b_M]^T$ and $\tilde{X} = [\tilde{x}_1, \dots, \tilde{x}_M]$. If $\tilde{X}^T \tilde{X}$ is not singular, we can solve $B$ using $B = (\tilde{X}^T \tilde{X})^{-1} \tilde{X}^T y$; otherwise, we can solve it using
$B = (\tilde{X}^T \tilde{X} + \mu I)^{-1} \tilde{X}^T y$, (3)
where $\mu$ is a positive constant and $I$ is the identity matrix. After we obtain $B$, we refer to $\tilde{X} B$ as the representation result of our method. We can convert the representation result into a two-dimensional image having the same size as the original sample image.
We classify the test sample using the sum of the contributions that the training samples from each class make to representing it. For example, if all the training samples from the $r$-th ($r \in C$) class are $\tilde{x}_s, \dots, \tilde{x}_t$, then the sum of the contributions of the $r$-th class to representing the test sample will be
$g_r = b_s \tilde{x}_s + \dots + b_t \tilde{x}_t$. (4)
We calculate the deviation of $g_r$ from $y$ using
$D_r = \| y - g_r \|^2$, $r \in C$. (5)
We can also convert $g_r$ into a two-dimensional matrix having the same size as the original sample image. If we do so, we refer to the matrix as the two-dimensional image corresponding to the contribution of the $r$-th class. The smaller the deviation $D_r$, the greater the contribution of the $r$-th class to representing the test sample. In other words, if $D_q = \min_{r \in C} D_r$, the test sample will be classified into the $q$-th class.
From the above presentation, we know that the representation-based classification method is a novel method, totally different from previous classifiers! It performs very well in image-based classification, such as face recognition and palmprint recognition. We should understand its nature and advantages.

21. Please describe the difference between linear and nonlinear discriminant functions. What potential advantage does a nonlinear discriminant function have in comparison with a linear discriminant function?
Answer:
I. Simply put, the graph of a linear discriminant function is a straight line or a (hyper)plane, whereas the graph of a nonlinear discriminant function is a curve or a curved surface rather than a line or plane.
II. In practice, many pattern recognition problems are not linearly separable, and nonlinear classifiers should be designed for them. For example, when the distributions of two classes are multimodal and interleaved, a simple linear discriminant function often causes large classification errors.
The figure above is just an aid for the question!
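The equivalence claimed in Question 15 can be checked numerically. The sketch below (NumPy; synthetic Gaussian data, hypothetical function names) negates the second-class augmented samples, sets $b_i = n/n_1$ or $n/n_2$, solves the MSE problem, and compares the resulting weight vector (without the threshold $a_0$) with the Fisher direction $S_W^{-1}(m_1 - m_2)$.

```python
import numpy as np

def fisher_direction(X1, X2):
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)  # within-class scatter matrix S_W
    return np.linalg.solve(Sw, m1 - m2)                      # Fisher direction S_W^{-1}(m1 - m2)

def mse_weights(X1, X2):
    n1, n2 = len(X1), len(X2)
    n = n1 + n2
    Y1 = np.hstack([np.ones((n1, 1)), X1])                   # augmented class-1 samples
    Y2 = -np.hstack([np.ones((n2, 1)), X2])                  # augmented class-2 samples, negated
    Y = np.vstack([Y1, Y2])
    b = np.concatenate([np.full(n1, n / n1), np.full(n2, n / n2)])
    a, *_ = np.linalg.lstsq(Y, b, rcond=None)                # least-squares solution of Ya = b
    return a[1:]                                             # weight vector w (drop threshold a_0)

rng = np.random.default_rng(2)
X1 = rng.normal([0.0, 0.0], [1.0, 2.0], size=(30, 2))
X2 = rng.normal([2.0, 1.0], [1.0, 2.0], size=(40, 2))
w_mse, w_fisher = mse_weights(X1, X2), fisher_direction(X1, X2)
cos = w_mse @ w_fisher / (np.linalg.norm(w_mse) * np.linalg.norm(w_fisher))
print(round(float(cos), 6))
```

A cosine of 1 means the two weight vectors point in the same direction; they differ only by a positive scale factor.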
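Question 20's procedure, Eqs. (3) to (5), can be sketched in NumPy as below. Assumptions: the training samples are stored as the columns of `X` (playing the role of $\tilde{X}$), the regularized form (3) is always used with a small hypothetical value `mu=0.01` (which also covers the singular case), and the toy data are made up for illustration.

```python
import numpy as np

def rbc_classify(y, X, labels, mu=0.01):
    # y: test sample (d,); X: (d, M) training samples as columns; labels: (M,) class of each column
    M = X.shape[1]
    B = np.linalg.solve(X.T @ X + mu * np.eye(M), X.T @ y)  # Eq. (3): B = (X^T X + mu I)^{-1} X^T y
    deviations = {}
    for r in np.unique(labels):
        in_r = labels == r
        g_r = X[:, in_r] @ B[in_r]                           # Eq. (4): contribution of class r
        deviations[r] = float(np.sum((y - g_r) ** 2))        # Eq. (5): D_r = ||y - g_r||^2
    q = min(deviations, key=deviations.get)                  # class with the smallest deviation
    return q, deviations

# Toy data: 4 training samples in R^3 (so X^T X is 4x4 and singular), two classes
X = np.array([[1.0, 0.9, 0.0, 0.0],
              [0.0, 0.1, 1.0, 0.9],
              [0.0, 0.0, 0.0, 0.2]])
labels = np.array([0, 0, 1, 1])
y = np.array([0.1, 1.0, 0.1])
q, deviations = rbc_classify(y, X, labels)
print(q, deviations)
```

Because $\tilde{X}^T \tilde{X}$ is singular here, the unregularized formula would fail; the $\mu I$ term keeps the linear system solvable.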
22. What is the naive Bayes rule?
Answer: Naive Bayes classification is a very simple classification algorithm. Its basic idea is this: for an item to be classified, compute the probability of each class given that this item occurs, and assign the item to the class with the largest probability. An everyday analogy: if you see a Black person on the street and are asked where he comes from, you would most likely guess Africa, because among Black people the proportion of Africans is the highest. Of course he could also be from America or Asia, but without any other available information we choose the class with the largest conditional probability. The "naive" part is the assumption that the features are conditionally independent given the class, so that $P(x \mid \omega_j) = \prod_k P(x_k \mid \omega_j)$.

23. What is the difference between supervised and unsupervised learning methods? Please show two examples of supervised and unsupervised learning methods.

24. In some special real-world classification applications, Bayesian decision theory might perform badly. What are possible reasons?

25. Suppose that we are applying a linear discriminant function to a nonlinearly separable problem. What means can we adopt to obtain an optimal solution?

26. Please present the possible generalization capability of a method in the sample space.

27. Apply the model $Ya = b$ to perform classification.

28. How can the binary minimum squared error procedure be extended to a multiclass minimum squared error procedure?
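A minimal categorical naive Bayes classifier for Question 22, assuming discrete features, Laplace (add-one) smoothing, and that every feature value seen at prediction time also occurs in the training data; the tiny data set and the function names are made up for illustration.

```python
import math
from collections import Counter

def nb_train(samples, labels, alpha=1.0):
    # samples: list of tuples of discrete feature values; labels: list of class labels
    n_features = len(samples[0])
    class_counts = Counter(labels)
    values = [sorted({s[f] for s in samples}) for f in range(n_features)]  # observed values per feature
    cond = {}  # cond[(c, f, v)] = P(feature f = v | class c), Laplace-smoothed
    for c, n_c in class_counts.items():
        rows = [s for s, lab in zip(samples, labels) if lab == c]
        for f in range(n_features):
            counts = Counter(r[f] for r in rows)
            for v in values[f]:
                cond[(c, f, v)] = (counts[v] + alpha) / (n_c + alpha * len(values[f]))
    priors = {c: n_c / len(labels) for c, n_c in class_counts.items()}
    return priors, cond

def nb_predict(x, priors, cond):
    # Naive assumption: features are conditionally independent given the class, so
    # log P(c | x) = log P(c) + sum_f log P(x_f | c) + constant
    scores = {c: math.log(p) + sum(math.log(cond[(c, f, v)]) for f, v in enumerate(x))
              for c, p in priors.items()}
    return max(scores, key=scores.get)

samples = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
           ("rainy", "cool"), ("sunny", "cool"), ("rainy", "hot")]
labels = ["no", "no", "yes", "yes", "yes", "no"]
priors, cond = nb_train(samples, labels)
print(nb_predict(("rainy", "cool"), priors, cond))  # yes
```

Working with log-probabilities avoids numerical underflow when many feature probabilities are multiplied.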
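Questions 13 and 28 are left unanswered in the source. One common construction, shown here only as a hedged sketch, fits one MSE discriminant per class by solving $YA = T$ with one-hot targets (column $i$ of $T$ is 1 for samples of class $i$ and 0 otherwise) and assigns $x$ to the class with the largest $g_i(x)$. The function names and the three-point toy data are hypothetical.

```python
import numpy as np

def mse_multiclass_train(X, y, c):
    # X: (n, d) samples, y: (n,) integer labels 0..c-1
    Y = np.hstack([np.ones((len(X), 1)), X])   # augmented samples
    T = np.eye(c)[y]                            # one-hot target matrix, shape (n, c)
    A, *_ = np.linalg.lstsq(Y, T, rcond=None)   # column i = MSE weight vector a_i of class i
    return A

def mse_multiclass_predict(A, X):
    Y = np.hstack([np.ones((len(X), 1)), X])
    return np.argmax(Y @ A, axis=1)             # class with the largest g_i(x) = a_i^T y

X = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
y = np.array([0, 1, 2])
A = mse_multiclass_train(X, y, c=3)
print(mse_multiclass_predict(A, np.array([[4.5, 0.2]])))  # [1]
```

When three or more class means lie roughly on a line, this indicator-regression approach can mask the middle class, so it is a starting point rather than a guaranteed solution.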