
Kohonen Self-Organizing Maps (SOM)
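This method is a clustering technique based on artificial neural networks. It uses a neighborhood function to reduce the dimensionality of the data, usually to two dimensions. The resulting low-dimensional map can then be visualized to reveal the structure of high-dimensional data. The method was proposed by the Finnish professor Teuvo Kohonen, so it is also called the Kohonen map.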
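The idea can be made concrete with a minimal sketch of online SOM training in plain Python. The sketch makes several assumptions of its own:

- a rectangular grid;
- Euclidean distance;
- a Gaussian neighborhood function;
- a learning rate and radius that decay linearly;
- prototypes initialized uniformly within the range of each variable.

These are common textbook choices. They are not necessarily what SOMbrero's `letremy` settings in the report below do. The names `train_som` and `best_matching_unit` are hypothetical.

```python
import math
import random


def best_matching_unit(prototypes, x):
    """Index of the prototype closest to x in squared Euclidean distance."""
    return min(range(len(prototypes)),
               key=lambda k: sum((w - v) ** 2 for w, v in zip(prototypes[k], x)))


def train_som(data, n_rows, n_cols, n_iter=500, lr0=0.5, seed=0):
    """Online SOM: each step draws one observation and updates all prototypes."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(col) for col in zip(*data)]
    hi = [max(col) for col in zip(*data)]
    # Grid position (column, row) of each unit.
    coords = [(c, r) for c in range(n_cols) for r in range(n_rows)]
    # Assumed initialisation: uniform within the observed range of each variable.
    prototypes = [[rng.uniform(lo[i], hi[i]) for i in range(dim)] for _ in coords]
    radius0 = max(n_rows, n_cols) / 2
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks, floored at 0.5
        x = rng.choice(data)
        bc, br = coords[best_matching_unit(prototypes, x)]
        for k, (c, r) in enumerate(coords):
            grid_d2 = (c - bc) ** 2 + (r - br) ** 2
            h = math.exp(-grid_d2 / (2 * radius ** 2))  # Gaussian neighbourhood weight
            w = prototypes[k]
            for i in range(dim):
                w[i] += lr * h * (x[i] - w[i])  # pull unit k towards x
    return prototypes, coords


data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
protos, coords = train_som(data, n_rows=2, n_cols=2)
print(best_matching_unit(protos, [0.0, 0.0]), best_matching_unit(protos, [5.0, 5.0]))
```

In this sketch, neighboring units on the grid receive part of every update. This is what makes nearby cells end up with similar prototypes and gives the map its topology.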

Clustering models: method overview

R packages and references used by this method:
Packages: stats, base, SOMbrero, wordcloud, knitr, shiny
References (by package):
  1. R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL: http://www.R-project.org/
  2. Villa-Vialaneix N., Bendhaiba L., Olteanu M. (2013). SOMbrero: SOM Bound to Realize Euclidean and Relational Outputs. R package version 0.4-1.
  3. Fellows I. (2013). wordcloud: Word Clouds. R package version 2.4. URL: http://CRAN.R-project.org/package=wordcloud
  4. Xie Y. (2013). knitr: A general-purpose package for dynamic report generation in R. R package version 1.5.
  5. Xie Y. (2013). Dynamic Documents with R and knitr. Chapman and Hall/CRC. ISBN 978-1482203530
  6. RStudio, Inc. (2013). shiny: Web Application Framework for R. R package version 0.8.0. URL: http://CRAN.R-project.org/package=shiny
Example F-6:

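The iris data were first used by the British statistician R. A. Fisher (1890–1962) for discriminant analysis, a technique of multivariate analysis, so they are often called Fisher's iris data. The measurements were collected by the American botanist E. S. Anderson (1897–1969), so the data are also known as Anderson's iris data. The data cover 150 flowers, 50 from each of three iris species: setosa, versicolor and virginica. For each flower, four features are recorded: the length and width of the sepal and the length and width of the petal.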

Table: Iris data
No.  Sepal length  Sepal width  Petal length  Petal width  Species
1    5.1           3.5          1.4           0.2          setosa
2    4.9           3.0          1.4           0.2          setosa
3    4.7           3.2          1.3           0.2          setosa
...
150  5.9           3.0          5.1           1.8          virginica

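Q1: The data record the length and width of each flower's sepal and petal, together with its species. A botanist wants to know whether the number of species could still be recovered if only the floral measurements (sepal and petal length and width) had been recorded.
Statistical method: The question asks how many species can be identified from the sepal and petal measurements, and that number is not known in advance. A method that can form groups on its own is therefore appropriate. Such methods belong to cluster analysis. The methods listed below can all be applied to the iris data, so users can compare the results that different methods give for the same goal.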

Clustering models / Cluster analysis:
  • K-Means Clustering
  • Fuzzy Clustering (including the C-Means method)
  • Hierarchical Clustering
  • Kohonen Self-Organizing Map (SOM)
Kohonen Self-Organizing Map: Analysis Results
  • Method: Kohonen Self-Organizing Map
  • Data set: Example F-6
  • Variables: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
  • Computation time: 3.403 s

  • Sample descriptive statistics (I)
    Variable      Count  Mean    Median  Minimum  Maximum  Std. dev.
    Sepal.Length  150    5.8433  5.8     4.3      7.9      0.8281
    Sepal.Width   150    3.0573  3       2        4.4      0.4359
    Petal.Length  150    3.758   4.35    1        6.9      1.7653
    Petal.Width   150    1.1993  1.3     0.1      2.5      0.7622
    (I) Missing values are excluded from all descriptive statistics.
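The summary table above can be reproduced with Python's standard `statistics` module. The sketch below assumes two things:

- missing values are simply dropped, as the footnote describes;
- the standard deviation uses the n − 1 denominator, as R's `sd()` does.

The function name `describe` is hypothetical, and the input is just the four sepal lengths shown in the excerpt table plus one artificial missing value.

```python
import statistics


def describe(values):
    """Count, mean, median, minimum, maximum and sample standard deviation,
    skipping missing values (encoded here as None)."""
    xs = [v for v in values if v is not None]
    return {
        "count": len(xs),
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "min": min(xs),
        "max": max(xs),
        "sd": statistics.stdev(xs),  # n - 1 denominator
    }


# Sepal lengths of rows 1, 2, 3 and 150 of the excerpt, plus one missing value
summary = describe([5.1, 4.9, 4.7, 5.9, None])
print(summary)
```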

  • Model information:
    SOM mode                                  online
    SOM type                                  numeric
    SOM grid                                  5 × 5 (x dimension 5, y dimension 5)
    Maximum number of iterations              500
    Distance measure                          letremy
    Intermediate backups                      0
    Prototype initialization                  random
    Pre-processing type                       unitvar
    Neighbourhood type                        letremy
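The `unitvar` pre-processing is assumed here to mean that each variable is centered to mean 0 and scaled to unit variance before training, in the way R's `scale()` does. Without this step, variables with large spreads, such as petal length, would dominate the distances.

The minimal sketch below uses the hypothetical name `scale_unitvar` and assumes no variable is constant.

```python
import statistics


def scale_unitvar(rows):
    """Centre every column to mean 0 and scale it to unit sample variance.
    Assumes no column is constant (its standard deviation would be 0)."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    scaled = [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]
    return scaled, means, sds


# Rows 1, 2, 3 and 150 of the iris excerpt (sepal length/width, petal length/width)
rows = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2], [4.7, 3.2, 1.3, 0.2], [5.9, 3.0, 5.1, 1.8]]
scaled, means, sds = scale_unitvar(rows)
for col in zip(*scaled):
    print(round(statistics.mean(col), 6), round(statistics.stdev(col), 6))
```

The returned means and standard deviations are kept so that prototypes can be mapped back to the original units for display.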

  • Analysis of variance (ANOVA) table (I)
    Variable      F-value   p-value  Significance
    Sepal.Length   76.141   0        ***
    Sepal.Width    78.609   0        ***
    Petal.Length  242.540   0        ***
    Petal.Width   140.687   0        ***
    (I) A significant variable is an important variable for forming the clusters.
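Each row of this table is presumably a one-way ANOVA of one variable, with the groups formed by the map as the factor. The report does not say whether those groups are the grid cells or the final clusters.

A minimal sketch of the F statistic follows. The name `anova_f` is hypothetical. The p-value, which would come from the F distribution with (k − 1, n − k) degrees of freedom, is not computed because the standard library has no F distribution.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for one variable.
    groups: one list of values per group (e.g. per cluster)."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    # Mean squares use k - 1 and n - k degrees of freedom.
    return (ss_between / (k - 1)) / (ss_within / (n - k))


print(anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))  # → 13.5
```

A large F means that the group means of a variable differ far more than the values vary within each group. Read this way, Petal.Length separates the groups most strongly in this analysis.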

  • SOM quality indices:
    Topographic error     0.0733
    Quantization error    0.2086
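The two quality indices can be sketched using their common textbook definitions:

- Quantization error: the mean squared distance from each observation to its best-matching prototype. Some implementations use the unsquared distance instead.
- Topographic error: the share of observations whose best and second-best matching units are not neighbors on the grid.

The sketch assumes that cells touching diagonally count as neighbors. The function names are hypothetical, and it is an assumption that the report computes the indices exactly this way. If the topographic error is a plain proportion, 0.0733 would correspond to 11 of the 150 observations.

```python
def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantization_error(data, prototypes):
    """Mean squared distance from each observation to its best-matching prototype."""
    return sum(min(_sq_dist(x, w) for w in prototypes) for x in data) / len(data)


def topographic_error(data, prototypes, coords):
    """Share of observations whose best and second-best units are not grid
    neighbours (assumed: neighbours are cells that touch, diagonals included)."""
    errors = 0
    for x in data:
        first, second = sorted(range(len(prototypes)),
                               key=lambda k: _sq_dist(x, prototypes[k]))[:2]
        (c1, r1), (c2, r2) = coords[first], coords[second]
        if max(abs(c1 - c2), abs(r1 - r2)) > 1:
            errors += 1
    return errors / len(data)


# Toy 1 x 3 grid (column, row): units 0 and 2 are not neighbours.
coords = [(0, 0), (1, 0), (2, 0)]
prototypes = [[0.0], [5.0], [1.0]]
data = [[0.4], [4.9]]
print(quantization_error(data, prototypes), topographic_error(data, prototypes, coords))
```

In the toy example, the first observation's two nearest units sit two cells apart, so half of the observations count as topographic errors.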

  • Distance matrix between the prototypes of the initial clusters (one cluster per grid cell):
             1      2      3      4      5      6      7      8      9     10     11     12     13     14     15     16     17     18     19     20     21     22     23     24     25
      1 0.0000 0.3883 0.7516 1.8935 2.9704 0.1981 1.1128 1.4463 2.8774 3.3794 2.2544 3.4882 3.2941 3.4526 3.9724 5.3514 4.4244 3.9057 4.0350 4.2472 5.5012 4.7773 4.0927 3.9350 4.3278
      2 0.3883 0.0000 0.3749 1.8086 2.9496 0.5846 1.2613 1.4325 2.9195 3.4074 2.4109 3.6084 3.3883 3.5301 4.0497 5.5040 4.5524 4.0226 4.1256 4.3308 5.6457 4.9165 4.2294 4.0491 4.4126
      3 0.7516 0.3749 0.0000 1.6885 2.8616 0.9450 1.3956 1.4005 2.8911 3.3578 2.5085 3.6507 3.4052 3.5264 4.0416 5.5650 4.5960 4.0560 4.1303 4.3279 5.6961 4.9660 4.2797 4.0784 4.4130
      4 1.8935 1.8086 1.6885 0.0000 1.1801 1.9541 1.1272 0.5165 1.2968 1.7143 1.4103 2.1899 1.8909 1.9613 2.4465 4.0829 3.0969 2.5483 2.5572 2.7336 4.1877 3.4688 2.8080 2.5597 2.8220
      5 2.9704 2.9496 2.8616 1.1801 0.0000 2.9885 1.9720 1.5291 0.4801 0.6199 1.4066 1.3537 1.0213 0.9785 1.3706 3.0772 2.1142 1.5915 1.5035 1.6426 3.1536 2.4706 1.8803 1.5897 1.7354
      6 0.1981 0.5846 0.9450 1.9541 2.9885 0.0000 1.0700 1.4815 2.8648 3.3724 2.1830 3.4288 3.2502 3.4183 3.9367 5.2715 4.3577 3.8478 3.9913 4.2061 5.4266 4.7065 4.0253 3.8799 4.2846
      7 1.1128 1.2613 1.3956 1.1272 1.9720 1.0700 0.0000 0.6263 1.8081 2.3249 1.1550 2.3762 2.1848 2.3552 2.8754 4.2542 3.3157 2.7956 2.9297 3.1420 4.4045 3.6775 2.9925 2.8304 3.2192
      8 1.4463 1.4325 1.4005 0.5165 1.5291 1.4815 0.6263 0.0000 1.4956 1.9799 1.2422 2.2715 2.0103 2.1264 2.6425 4.1866 3.2149 2.6650 2.7300 2.9279 4.3100 3.5815 2.8978 2.6851 3.0172
      9 2.8774 2.9195 2.8911 1.2968 0.4801 2.8648 1.8081 1.4956 0.0000 0.5302 1.0331 0.9804 0.6423 0.6669 1.1587 2.8035 1.8377 1.2738 1.2627 1.4476 2.8980 2.1868 1.5405 1.2759 1.5514
     10 3.3794 3.4074 3.3578 1.7143 0.6199 3.3724 2.3249 1.9799 0.5302 0.0000 1.5107 0.9863 0.6864 0.4995 0.7882 2.5229 1.6170 1.0897 0.9462 1.0882 2.5745 1.9051 1.3410 1.0439 1.2184
     11 2.2544 2.4109 2.5085 1.4103 1.4066 2.1830 1.1550 1.2422 1.0331 1.5107 0.0000 1.2555 1.1375 1.3671 1.8539 3.1081 2.1754 1.6827 1.8690 2.0824 3.2722 2.5465 1.8730 1.7369 2.1410
     12 3.4882 3.6084 3.6507 2.1899 1.3537 3.4288 2.3762 2.2715 0.9804 0.9863 1.2555 0.0000 0.3525 0.5430 0.7549 1.9319 0.9513 0.4485 0.6869 0.8790 2.0856 1.3598 0.7208 0.5624 0.9112
     13 3.2941 3.3883 3.4052 1.8909 1.0213 3.2502 2.1848 2.0103 0.6423 0.6864 1.1375 0.3525 0.0000 0.2950 0.7410 2.2060 1.2301 0.6693 0.7589 0.9651 2.3295 1.6088 0.9570 0.7206 1.0457
     14 3.4526 3.5301 3.5264 1.9613 0.9785 3.4183 2.3552 2.1264 0.6669 0.4995 1.3671 0.5430 0.2950 0.0000 0.5373 2.1632 1.2322 0.6615 0.6142 0.8161 2.2494 1.5507 0.9312 0.6533 0.9468
     15 3.9724 4.0497 4.0416 2.4465 1.3706 3.9367 2.8754 2.6425 1.1587 0.7882 1.8539 0.7549 0.7410 0.5373 0.0000 1.7440 0.9097 0.4974 0.1964 0.3277 1.7921 1.1370 0.6821 0.4099 0.5127
     16 5.3514 5.5040 5.5650 4.0829 3.0772 5.2715 4.2542 4.1866 2.8035 2.5229 3.1081 1.9319 2.2060 2.1632 1.7440 0.0000 1.0231 1.5383 1.5885 1.4709 0.3193 0.6390 1.3051 1.5377 1.3872
     17 4.4244 4.5524 4.5960 3.0969 2.1142 4.3577 3.3157 3.2149 1.8377 1.6170 2.1754 0.9513 1.2301 1.2322 0.9097 1.0231 0.0000 0.5913 0.7264 0.6758 1.2025 0.5454 0.5325 0.6664 0.5524
     18 3.9057 4.0226 4.0560 2.5483 1.5915 3.8478 2.7956 2.6650 1.2738 1.0897 1.6827 0.4485 0.6693 0.6615 0.4974 1.5383 0.5913 0.0000 0.3468 0.5040 1.6643 0.9427 0.3480 0.1918 0.5464
     19 4.0350 4.1256 4.1303 2.5572 1.5035 3.9913 2.9297 2.7300 1.2627 0.9462 1.8690 0.6869 0.7589 0.6142 0.1964 1.5885 0.7264 0.3468 0.0000 0.2278 1.6594 0.9880 0.5370 0.2962 0.3926
     20 4.2472 4.3308 4.3279 2.7336 1.6426 4.2061 3.1420 2.9279 1.4476 1.0882 2.0824 0.8790 0.9651 0.8161 0.3277 1.4709 0.6758 0.5040 0.2278 0.0000 1.5264 0.9027 0.6304 0.4656 0.2286
     21 5.5012 5.6457 5.6961 4.1877 3.1536 5.4266 4.4045 4.3100 2.8980 2.5745 3.2722 2.0856 2.3295 2.2494 1.7921 0.3193 1.2025 1.6643 1.6594 1.5264 0.0000 0.7342 1.4176 1.6313 1.4718
     22 4.7773 4.9165 4.9660 3.4688 2.4706 4.7065 3.6775 3.5815 2.1868 1.9051 2.5465 1.3598 1.6088 1.5507 1.1370 0.6390 0.5454 0.9427 0.9880 0.9027 0.7342 0.0000 0.6921 0.9131 0.8586
     23 4.0927 4.2294 4.2797 2.8080 1.8803 4.0253 2.9925 2.8978 1.5405 1.3410 1.8730 0.7208 0.9570 0.9312 0.6821 1.3051 0.5325 0.3480 0.5370 0.6304 1.4176 0.6921 0.0000 0.3030 0.6786
     24 3.9350 4.0491 4.0784 2.5597 1.5897 3.8799 2.8304 2.6851 1.2759 1.0439 1.7369 0.5624 0.7206 0.6533 0.4099 1.5377 0.6664 0.1918 0.2962 0.4656 1.6313 0.9131 0.3030 0.0000 0.5625
     25 4.3278 4.4126 4.4130 2.8220 1.7354 4.2846 3.2192 3.0172 1.5514 1.2184 2.1410 0.9112 1.0457 0.9468 0.5127 1.3872 0.5524 0.5464 0.3926 0.2286 1.4718 0.8586 0.6786 0.5625 0.0000
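Spot checks suggest the matrix holds ordinary Euclidean distances between prototype vectors in the original measurement units. The prototypes printed for cells 1, 2, 3, 21 and 22 in the prototype table of this report reproduce the entries 0.3883, 0.7516 and 0.7342.

The sketch below recomputes those entries. That the report uses exactly this formula is an inference from these checks, not something the report states. The helper names are hypothetical.

```python
import math

# Prototype vectors (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
# for five grid cells, copied from the prototype table in this report.
prototypes = {
    1: [5.2540, 3.6938, 1.4841, 0.2535],
    2: [4.9861, 3.4239, 1.4056, 0.2486],
    3: [4.7616, 3.1276, 1.4495, 0.2283],
    21: [7.2826, 2.9863, 6.1736, 2.1656],
    22: [6.9421, 3.0251, 5.5507, 1.9823],
}


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(protos):
    """Pairwise Euclidean distances, keyed by (cell, cell)."""
    return {(i, j): euclidean(protos[i], protos[j]) for i in protos for j in protos}


d = distance_matrix(prototypes)
print(round(d[1, 2], 4), round(d[1, 3], 4), round(d[21, 22], 4))  # → 0.3883 0.7516 0.7342
```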

  • Distribution of observations over the topological grid:
    Distribution table (excerpt)
    Observation  Grid cell
    1            2
    2            3
    3            3
    4            3
    5            1
    ...
    146          22
    147          20
    148          17
    149          17
    150          18
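Each observation is placed in the grid cell whose prototype is nearest to it, called its best-matching unit. This sketch assumes that "nearest" is measured after the `unitvar` scaling listed in the model information. On that basis it reproduces the cells reported for observations 1, 2, 3 and 5 from the ten prototypes printed in this report. It also uses the standard deviations from the descriptive-statistics table.

Observation 5 (5.0, 3.6, 1.4, 0.2, taken from the standard iris data rather than from the excerpt above) shows why the scaling matters:

- With scaling, its nearest prototype is cell 1, which matches the report.
- With unscaled distances, its nearest prototype would be cell 2.

Only ten of the 25 prototypes are available, so this is a spot check rather than a full reproduction. `bmu`, `PROTOTYPES` and `SDS` are hypothetical names.

```python
# Standard deviations from the descriptive-statistics table
# (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width).
SDS = [0.8281, 0.4359, 1.7653, 0.7622]

# The ten prototypes printed in the prototype table of this report (original units).
PROTOTYPES = {
    1: [5.2540, 3.6938, 1.4841, 0.2535],
    2: [4.9861, 3.4239, 1.4056, 0.2486],
    3: [4.7616, 3.1276, 1.4495, 0.2283],
    4: [5.0866, 2.5798, 2.8847, 0.8493],
    5: [5.3712, 2.3393, 3.9358, 1.2353],
    21: [7.2826, 2.9863, 6.1736, 2.1656],
    22: [6.9421, 3.0251, 5.5507, 1.9823],
    23: [6.6728, 3.0656, 4.9966, 1.6696],
    24: [6.4593, 2.8929, 4.8792, 1.6187],
    25: [6.2040, 2.7229, 5.2380, 1.9246],
}


def bmu(x, prototypes, sds=None):
    """Grid cell whose prototype is nearest to x.

    With sds, each difference is divided by the variable's standard deviation,
    which equals Euclidean distance after unit-variance scaling (centring cancels out).
    """
    scale = sds if sds is not None else [1.0] * len(x)

    def dist2(w):
        return sum(((a - b) / s) ** 2 for a, b, s in zip(x, w, scale))

    return min(prototypes, key=lambda k: dist2(prototypes[k]))


obs5 = [5.0, 3.6, 1.4, 0.2]  # observation 5 of the standard iris data
print(bmu(obs5, PROTOTYPES, SDS), bmu(obs5, PROTOTYPES))  # → 1 2
```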

    Figure (not reproduced): heat map (I)

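    (I) Grid cells are numbered column by column. Numbering runs up the first column from the bottom-left cell to the top-left cell, then up the second column from bottom to top, and so on.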
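Under this reading of the footnote, a cell number maps to a grid position as sketched below, with column 1 on the left and row 1 at the bottom. The footnote is brief, so this interpretation is an assumption. The function names are hypothetical.

```python
def cell_to_position(k, n_rows=5):
    """(column, row) of grid cell k, both counted from 1, with row 1 at the bottom.
    Cells are assumed to run up the first column, then up the second, and so on."""
    col, row = divmod(k - 1, n_rows)
    return col + 1, row + 1


def grid_layout(n_rows=5, n_cols=5):
    """Cell numbers as they would appear on screen, top row first."""
    return [[c * n_rows + r + 1 for c in range(n_cols)] for r in reversed(range(n_rows))]


for line in grid_layout():
    print(" ".join(f"{k:2d}" for k in line))
```

The printed layout has 5 10 15 20 25 along the top and 1 6 11 16 21 along the bottom. This fits the distance matrix: cell 1's two closest prototypes belong to cells 6 (0.1981) and 2 (0.3883), which are its direct neighbors in this layout.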

    Figures (not reproduced): line plot, bar plot, observation-number plot, radar plot, color plot, box plot


  • Prototype values of the input variables at each grid cell:
    Prototype table (excerpt)
    Grid cell  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
    1          5.2540        3.6938       1.4841        0.2535
    2          4.9861        3.4239       1.4056        0.2486
    3          4.7616        3.1276       1.4495        0.2283
    4          5.0866        2.5798       2.8847        0.8493
    5          5.3712        2.3393       3.9358        1.2353
    ...
    21         7.2826        2.9863       6.1736        2.1656
    22         6.9421        3.0251       5.5507        1.9823
    23         6.6728        3.0656       4.9966        1.6696
    24         6.4593        2.8929       4.8792        1.6187
    25         6.2040        2.7229       5.2380        1.9246

    Figures (not reproduced): line plot, bar plot, radar plot, color plot, 3D plot


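  • Distances between grid-cell prototypes:
    Figures (not reproduced): smoothed distance plot, polygon distance plot, U-matrix, multidimensional scaling (MDS) plot, grid plot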
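A U-matrix summarizes, for every grid cell, how far its prototype lies from the prototypes of its neighboring cells. High values mark boundaries between groups of similar cells.

The sketch below makes two assumptions: it uses 4-connected neighbors, and it numbers cells column by column as in the heat-map footnote. Under those assumptions, cell 1's neighbors would be cells 2 and 6, at distances 0.3883 and 0.1981 in the matrix above. The report's exact neighbor definition is not stated, and `u_matrix` is a hypothetical name.

```python
def u_matrix(dist, n_rows, n_cols):
    """Mean prototype distance from each grid cell to its 4-connected neighbours.

    Cells are indexed from 0 and numbered up each column (row 0 at the bottom);
    dist is a full, symmetric distance matrix.
    """
    values = []
    for k in range(n_rows * n_cols):
        col, row = divmod(k, n_rows)
        neighbours = [
            c * n_rows + r
            for c, r in ((col - 1, row), (col + 1, row), (col, row - 1), (col, row + 1))
            if 0 <= c < n_cols and 0 <= r < n_rows
        ]
        values.append(sum(dist[k][j] for j in neighbours) / len(neighbours))
    return values


# Toy 2 x 2 grid: cells 0, 1 form the first column, cells 2, 3 the second.
toy = [[0, 1, 2, 3],
       [1, 0, 4, 5],
       [2, 4, 0, 6],
       [3, 5, 6, 0]]
print(u_matrix(toy, n_rows=2, n_cols=2))  # → [1.5, 3.0, 4.0, 5.5]
```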


  • Clustering settings:
    Clustering method                                      ward
    Initial number of clusters (= number of grid cells)    25
    Final number of clusters                               3
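The settings above describe a second clustering step. Each of the 25 grid cells starts as its own cluster, and the cells are then merged with Ward's method until 3 clusters remain.

The sketch below shows generic Ward agglomeration applied to a list of vectors, using the Lance–Williams update on squared Euclidean distances. The report does not say how it measures distances between prototypes (scaled or unscaled, squared or not), so treat this as an illustration of the technique rather than a reproduction of the report. `ward_clusters` is a hypothetical name.

```python
def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward's criterion.

    Starts with every point as its own cluster and repeatedly merges the pair
    with the smallest Ward distance, updated with the Lance-Williams formula
    on squared Euclidean distances. Returns one label (1, 2, ...) per point.
    """
    n = len(points)
    d = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    while len(clusters) > n_clusters:
        ids = list(clusters)
        i, j = min(((a, b) for pos, a in enumerate(ids) for b in ids[pos + 1:]),
                   key=lambda pair: d[pair[0]][pair[1]])
        ni, nj = len(clusters[i]), len(clusters[j])
        for k in ids:
            if k in (i, j):
                continue
            nk = len(clusters[k])
            # Lance-Williams update for Ward's method
            d[i][k] = d[k][i] = ((ni + nk) * d[i][k] + (nj + nk) * d[j][k]
                                 - nk * d[i][j]) / (ni + nj + nk)
        clusters[i].extend(clusters.pop(j))  # merged cluster keeps id i
    labels = [0] * n
    for label, members in enumerate(clusters.values(), start=1):
        for m in members:
            labels[m] = label
    return labels


points = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0], [21, 0]]
print(ward_clusters(points, 3))  # → [1, 1, 2, 2, 3, 3]
```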

  • Number of grid cells per cluster:
    Cluster       1   2   3
    Grid cells    8  13   4

  • Cluster assignment of grid cells:
    Grid cell   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
    Cluster     1  1  1  1  2  1  1  1  2  2  1  2  2  2  2  3  3  2  2  2
    Grid cell  21 22 23 24 25
    Cluster     3  3  2  2  2
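Every observation is mapped to exactly one grid cell, so its final cluster can be found by looking up that cell in the table above. The sketch below uses only values printed in this report: the cell-to-cluster table and the observations shown in the distribution table. It also checks the cluster sizes against the cluster-size table. The variable names are hypothetical.

```python
from collections import Counter

# Grid cell -> cluster, copied from the cell clustering table above.
CELL_CLUSTER = dict(zip(range(1, 26), [1, 1, 1, 1, 2, 1, 1, 1, 2, 2, 1, 2, 2,
                                       2, 2, 3, 3, 2, 2, 2, 3, 3, 2, 2, 2]))
# Observation -> grid cell, for the rows shown in the distribution table.
OBS_CELL = {1: 2, 2: 3, 3: 3, 4: 3, 5: 1, 146: 22, 147: 20, 148: 17, 149: 17, 150: 18}

# Each observation inherits the cluster of the grid cell it was mapped to.
obs_cluster = {obs: CELL_CLUSTER[cell] for obs, cell in OBS_CELL.items()}
# Number of grid cells per cluster; should match the cluster-size table.
cells_per_cluster = Counter(CELL_CLUSTER.values())

print(obs_cluster)
print(dict(sorted(cells_per_cluster.items())))  # → {1: 8, 2: 13, 3: 4}
```

In this excerpt, observations 1 to 5 (all setosa) fall in cluster 1. Observations 146 to 150 (all virginica) are split between clusters 2 and 3.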

  • Grid-cell clustering plots:
    Figures (not reproduced): heat map, line plot, bar plot, box plot, radar plot, grid plot, color plot, polygon plot, multidimensional scaling (MDS) plot, dendrogram, 3D dendrogram
