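Bayes' Theorem
Bayes' theorem simply converts "the conditional probability of event Y given that event X has occurred" into "the conditional probability of event X given that event Y has occurred".
In other words, Bayes' theorem computes the inverse probability.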
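
A small worked sketch of that reversal, using P(X|Y) = P(Y|X) P(X) / P(Y). The events and all numbers are made up for illustration, and the function name `bayes_posterior` is hypothetical:

```python
def bayes_posterior(p_y_given_x, p_x, p_y):
    """Return P(X | Y) from P(Y | X), P(X) and P(Y) using Bayes' theorem."""
    if p_y <= 0:
        raise ValueError("P(Y) must be positive")
    return p_y_given_x * p_x / p_y

# Illustrative (made-up) numbers:
# X = "email is spam", Y = "email contains the word 'free'".
p_x = 0.2                 # P(X)
p_y_given_x = 0.5         # P(Y | X)
p_y_given_not_x = 0.125   # P(Y | not X)

# The law of total probability gives P(Y).
p_y = p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)

# Reverse the conditioning: from P(Y | X) to P(X | Y).
p_x_given_y = bayes_posterior(p_y_given_x, p_x, p_y)
print(round(p_x_given_y, 3))
```

With these assumed numbers, knowing P(Y|X) is enough to recover P(X|Y) once the prior P(X) and the evidence P(Y) are available.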

Naive Bayes
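A probability-based prediction algorithm, applied to classification problems.
It computes the probability that each record belongs to each label, then assigns the record to the label with the highest probability.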
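
The "score every label, pick the most probable" step can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes word-count features, a multinomial-style likelihood, and Laplace (add-one) smoothing, none of which are specified in these notes. The function names and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(samples):
    """samples: list of (list_of_words, label). Returns the counts needed for prediction."""
    label_counts = Counter()            # how many records carry each label
    word_counts = defaultdict(Counter)  # per-label word frequencies
    vocabulary = set()
    for words, label in samples:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary

def predict(words, label_counts, word_counts, vocabulary):
    """Return the label with the highest log-posterior score."""
    total = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior P(label)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in words:
            # Add-one smoothing (an assumption here) avoids log(0) for unseen words.
            score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

# Toy, made-up training data.
samples = [
    (["free", "prize", "now"], "spam"),
    (["free", "offer"], "spam"),
    (["meeting", "tomorrow"], "ham"),
    (["project", "meeting", "notes"], "ham"),
]
model = train_naive_bayes(samples)
label = predict(["free", "prize"], *model)
print(label)
```

Log probabilities are summed instead of multiplying raw probabilities so that long documents do not underflow to zero.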
Extracting Features From Text Data
There are two approaches to convert unstructured data to a structured form.
1. Bag of Words Model (BoW)
The model can be represented as a table that lists each word alongside its frequency.
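
A minimal sketch of such a word/frequency table, using only the standard library. The tokenization rule (lowercase, letters and apostrophes only) and the sample sentence are assumptions made for illustration:

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

bow = bag_of_words("The cat sat on the mat. The mat was warm.")

# Print the table: one row per word, most frequent first.
for word, freq in bow.most_common():
    print(f"{word}\t{freq}")
```

Word order is discarded; only the counts survive, which is what makes the representation a "bag".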

2. Vector Space Model
a. Term Document Matrix

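- Each word is treated as a feature, and its frequency count acts as the "strength" of that feature/word.
- Most NLP libraries ship with a built-in list of common stop words.
  Stop words are words that are common enough across the whole language that they can usually be removed safely and are not considered important.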
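
The points above can be sketched as a tiny term-document matrix, with terms as rows and documents as columns. The stop word list here is a short hand-picked set for illustration only (real NLP libraries provide much longer ones), and the function names and sample documents are hypothetical:

```python
import re

# Illustrative stop word list; an assumption, not taken from any library.
STOP_WORDS = {"the", "a", "is", "on", "and", "of"}

def tokenize(text):
    """Lowercase, split into word tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]

def term_document_matrix(documents):
    """Return (terms, matrix) where matrix[i][j] is the count of terms[i] in documents[j]."""
    tokenized = [tokenize(doc) for doc in documents]
    terms = sorted({t for tokens in tokenized for t in tokens})
    matrix = [[tokens.count(term) for tokens in tokenized] for term in terms]
    return terms, matrix

documents = ["The cat sat on the mat", "The dog and the cat", "A mat is a mat"]
terms, matrix = term_document_matrix(documents)

for term, row in zip(terms, matrix):
    print(f"{term}\t{row}")
```

Each row is one feature (a word), and each entry is that word's frequency in the corresponding document.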
b. Computing Terms’ Weights
There are various approaches for determining the terms’ weights. Simple and frequently used approaches include:
- Binary weights
- Term Frequency (TF)