Counting the occurrences of each element in a list with Python's Counter

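In Python, the built-in function len() returns the total number of elements in a list or tuple, and the count() method returns the number of occurrences of a given element.

In addition, the Counter class in the collections module of the Python standard library can be used to get elements in order of their number of occurrences.

This section covers the following:

Counting the total number of elements: len()
Counting occurrences of each element: count()
Using collections.Counter
Getting elements in order of frequency: most_common()
Counting the number of unique elements (distinct kinds)
Counting elements that satisfy a condition

As concrete examples, the following are also explained with sample code:

Counting the occurrences of a word in a string
Counting the occurrences of a character in a string

The examples use lists, but tuples can be processed in the same way.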

Counting the total number of elements: len()

To count the total number of elements in a list or tuple, use the built-in function len().

l = ['a', 'a', 'a', 'a', 'b', 'c', 'c']

print(len(l))
# 7
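
The same call works on a tuple. A minimal sketch, assuming a made-up tuple `t` that does not appear in the original examples:

```python
# Hypothetical tuple used only for illustration.
t = ('a', 'a', 'b')

# len() returns the total number of elements, duplicates included.
total = len(t)
print(total)
# 3
```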

Counting occurrences of each element: the count() method

To count how many times a given element occurs, use the count() method of a list, tuple, or other sequence.

Common Sequence Operations — Built-in Types — Python 3.10.0 Documentation

If a value that does not exist as an element is passed as the argument, 0 is returned.

l = ['a', 'a', 'a', 'a', 'b', 'c', 'c']

print(l.count('a'))
# 4

print(l.count('b'))
# 1

print(l.count('c'))
# 2

print(l.count('d'))
# 0
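
To see what counting every element with count() alone might look like, here is a rough sketch. The dictionary name `counts` is an assumption introduced for illustration. Each count() call scans the whole list again, so this approach is reasonable for small inputs but repeats work on large ones.

```python
l = ['a', 'a', 'a', 'a', 'b', 'c', 'c']

# Call count() once per distinct element.
# set(l) supplies each distinct value exactly once.
counts = {x: l.count(x) for x in set(l)}

print(counts['a'], counts['b'], counts['c'])
# 4 1 2
```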

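If you want the number of occurrences of every element at once, collections.Counter, described next, is useful.

How to use collections.Counter

The collections module of the Python standard library provides the Counter class.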

collections – Counter — Container datatypes — Python 3.10.0 Documentation

Counter is a subclass of the dictionary type dict. It holds the elements as keys and their numbers of occurrences as values.

import collections

l = ['a', 'a', 'a', 'a', 'b', 'c', 'c']

c = collections.Counter(l)
print(c)
# Counter({'a': 4, 'c': 2, 'b': 1})

print(type(c))
# <class 'collections.Counter'>

print(issubclass(type(c), dict))
# True

Specifying an element as the key returns its count. If a value that does not exist as an element is specified, 0 is returned.

print(c['a'])
# 4

print(c['b'])
# 1

print(c['c'])
# 2

print(c['d'])
# 0

You can also use dictionary methods such as keys(), values(), and items().

print(c.keys())
# dict_keys(['a', 'b', 'c'])

print(c.values())
# dict_values([4, 1, 2])

print(c.items())
# dict_items([('a', 4), ('b', 1), ('c', 2)])

These methods return view objects of types such as dict_keys. They can be used as they are in a for statement. To convert one to a list, use list().
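
Both uses can be sketched as follows. The variable names `pairs` and `keys` are assumptions for illustration, and the output order relies on dictionaries preserving insertion order (Python 3.7 and later).

```python
import collections

c = collections.Counter(['a', 'a', 'a', 'a', 'b', 'c', 'c'])

# A view object can be iterated directly in a for statement.
pairs = []
for key, count in c.items():
    pairs.append(f'{key}: {count}')
print(pairs)
# ['a: 4', 'b: 1', 'c: 2']

# list() turns a view into an ordinary list.
keys = list(c.keys())
print(keys)
# ['a', 'b', 'c']
```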

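Getting elements in order of frequency: the most_common() method

Counter has a most_common() method, which returns a list of (element, count) tuples sorted by number of occurrences, from most to least frequent.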

print(c.most_common())
# [('a', 4), ('c', 2), ('b', 1)]

Individual entries can be obtained by index: [0] gives the most frequent element and [-1] the least frequent. To get only the element or only the count, specify a further index.

print(c.most_common()[0])
# ('a', 4)

print(c.most_common()[-1])
# ('b', 1)

print(c.most_common()[0][0])
# a

print(c.most_common()[0][1])
# 4

To sort in ascending order of the number of occurrences (least frequent first), use a slice with the step set to -1.

print(c.most_common()[::-1])
# [('b', 1), ('c', 2), ('a', 4)]

If the argument n is given to most_common(), only the n most frequent elements are returned. If it is omitted, all elements are returned.

print(c.most_common(2))
# [('a', 4), ('c', 2)]
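
When several elements share the same count, the documented behavior on Python 3.7 and later is that they are returned in the order first encountered. A small sketch with made-up data (the list and the name `c_tie` are assumptions for illustration):

```python
import collections

# Hypothetical data: 'x' and 'y' both occur twice, and 'x' is seen first.
c_tie = collections.Counter(['x', 'y', 'y', 'x', 'z'])

# Only one entry is requested, so the tie is resolved in favor of 'x'.
top = c_tie.most_common(1)
print(top)
# [('x', 2)]
```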

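If you want separate sequences of elements and counts, sorted by number of occurrences, rather than (element, count) tuples, you can split them as follows.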

values, counts = zip(*c.most_common())

print(values)
# ('a', 'c', 'b')

print(counts)
# (4, 2, 1)

The built-in function zip() transposes the two-dimensional data (here, a list of tuples), and the result is unpacked into two variables.
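
The transposition step on its own can be sketched like this, using a literal list named `pairs` (an assumption for illustration) in place of the most_common() result:

```python
pairs = [('a', 4), ('c', 2), ('b', 1)]

# zip(*pairs) is the same as zip(('a', 4), ('c', 2), ('b', 1)):
# the first items are grouped together, and so are the second items.
transposed = list(zip(*pairs))
print(transposed)
# [('a', 'c', 'b'), (4, 2, 1)]
```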

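Counting the number of unique elements (distinct kinds)

To count how many unique elements (how many distinct kinds) a list or tuple contains, use Counter, described above, or set().

The number of entries in a Counter object equals the number of unique elements in the original list, and it can be obtained with len().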

l = ['a', 'a', 'a', 'a', 'b', 'c', 'c']
c = collections.Counter(l)

print(len(c))
# 3

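You can also use set(), the constructor of the set type, which is simpler if you do not need a Counter object.

A set is a data type with no duplicate elements. Passing a list to set() discards duplicate values and returns a set object containing only the unique values. Its number of elements is obtained with len(). Because a set is unordered, the order shown when it is printed may vary.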

print(set(l))
# {'a', 'c', 'b'}

print(len(set(l)))
# 3
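
If the unique elements are also needed in their original order, one option not covered above is dict.fromkeys(), since dictionaries keep insertion order on Python 3.7 and later. A minimal sketch, with `unique_in_order` as an assumed name:

```python
l = ['a', 'a', 'a', 'a', 'b', 'c', 'c']

# dict keys are unique and keep the order of first appearance.
unique_in_order = list(dict.fromkeys(l))
print(unique_in_order)
# ['a', 'b', 'c']

print(len(unique_in_order))
# 3
```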

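Counting elements that satisfy a condition

To count the elements of a list or tuple that satisfy a particular condition, use a list comprehension or a generator expression.

For example, count the elements with negative values in the following list of numbers.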

l = list(range(-5, 6))
print(l)
# [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]

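Applying a conditional expression to each element in a list comprehension produces a list of Boolean values (True, False). The Boolean type bool is a subclass of the integer type int, with True treated as 1 and False as 0. Therefore, the number of True values (the number of elements that satisfy the condition) can be counted by taking the sum with sum().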

print([i < 0 for i in l])
# [True, True, True, True, True, False, False, False, False, False, False]

print(sum([i < 0 for i in l]))
# 5

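Replacing the [] of a list comprehension with () gives a generator expression. A list comprehension builds a list of all the processed elements, whereas a generator expression processes the elements one at a time and is therefore more memory-efficient.

When a generator expression is the only argument, its () can be omitted, so it can be written as in the second of the two calls that follow.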

print(sum((i < 0 for i in l)))
# 5

print(sum(i < 0 for i in l))
# 5
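
If the same pattern is needed in several places, it could be wrapped in a small helper. The function `count_if` and its parameters are hypothetical names introduced here, not part of the standard library.

```python
def count_if(iterable, predicate):
    """Return how many items make predicate(item) true."""
    # bool() normalizes any truthy/falsy result to True/False (1/0) before summing.
    return sum(bool(predicate(x)) for x in iterable)


l = list(range(-5, 6))

negatives = count_if(l, lambda i: i < 0)
print(negatives)
# 5
```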

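To count the number of False values (the elements that do not satisfy the condition), use not. Note that comparison operators such as < have higher precedence than not (they are evaluated first), so the parentheses () around (i < 0) in the following example are not strictly necessary.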

print([not (i < 0) for i in l])
# [False, False, False, False, False, True, True, True, True, True, True]

print(sum(not (i < 0) for i in l))
# 6

Of course, the condition itself can be changed.

print(sum(i >= 0 for i in l))
# 6

Some other examples are shown below.

An example that counts the odd elements in a list of numbers.

print([i % 2 == 1 for i in l])
# [True, False, True, False, True, False, True, False, True, False, True]

print(sum(i % 2 == 1 for i in l))
# 6

An example with a condition on a list of strings.

l = ['apple', 'orange', 'banana']

print([s.endswith('e') for s in l])
# [True, True, False]

print(sum(s.endswith('e') for s in l))
# 2

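To set a condition on the number of occurrences, use Counter. items() yields (element, count) tuples, and the condition is applied to the count.

The following example extracts the elements that occur two or more times and totals their occurrences. In this example, there are four a's and two c's, six in total.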

l = ['a', 'a', 'a', 'a', 'b', 'c', 'c']
c = collections.Counter(l)

print(c.items())
# dict_items([('a', 4), ('b', 1), ('c', 2)])

print([i for i in l if c[i] >= 2])
# ['a', 'a', 'a', 'a', 'c', 'c']

print([i[1] for i in c.items() if i[1] >= 2])
# [4, 2]

print(sum(i[1] for i in c.items() if i[1] >= 2))
# 6

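The following example extracts the kinds of elements that occur two or more times and counts how many kinds there are. In this example, there are two kinds, a and c.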

print([i[0] for i in c.items() if i[1] >= 2])
# ['a', 'c']

print([i[1] >= 2 for i in c.items()])
# [True, False, True]

print(sum(i[1] >= 2 for i in c.items()))
# 2
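
The same filtering can also be written by unpacking each (element, count) tuple into named variables instead of indexing with i[0] and i[1]. This is only a stylistic alternative; the name `frequent` is an assumption for illustration.

```python
import collections

l = ['a', 'a', 'a', 'a', 'b', 'c', 'c']
c = collections.Counter(l)

# Keep only the elements that occur two or more times.
frequent = {elem: n for elem, n in c.items() if n >= 2}
print(frequent)
# {'a': 4, 'c': 2}

# Total occurrences of those elements, then the number of kinds.
print(sum(frequent.values()))
# 6
print(len(frequent))
# 2
```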

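Counting the occurrences of a word in a string

As a concrete example, let us count how many times a word occurs in a string.

First, remove the unneeded commas and periods by replacing them with empty strings using the replace() method. Then use the split() method to create a list of words separated by whitespace.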

s = 'government of the people, by the people, for the people.'

s_remove = s.replace(',', '').replace('.', '')

print(s_remove)
# government of the people by the people for the people

word_list = s_remove.split()

print(word_list)
# ['government', 'of', 'the', 'people', 'by', 'the', 'people', 'for', 'the', 'people']

Once you have a list, you can get the number of occurrences of each word, the number of distinct words, and, with most_common() of collections.Counter, the most frequent word.

print(word_list.count('people'))
# 3

print(len(set(word_list)))
# 6

c = collections.Counter(word_list)

print(c)
# Counter({'the': 3, 'people': 3, 'government': 1, 'of': 1, 'by': 1, 'for': 1})

print(c.most_common()[0][0])
# the
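
As a variation, the cleanup could be done with a regular expression instead of chained replace() calls, which also handles other punctuation and letter case. This is a sketch under the assumption that words consist only of the letters a to z; the capitalized sample sentence and the names `words` and `c_words` are made up for illustration.

```python
import collections
import re

s = 'Government of the people, by the people, for the people.'

# Lowercase first, then extract runs of letters; punctuation is skipped.
words = re.findall(r'[a-z]+', s.lower())

c_words = collections.Counter(words)
print(c_words.most_common(2))
# [('the', 3), ('people', 3)]
```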

The above is a very simple procedure, so for more complex natural language processing it is better to use a library such as NLTK.

Natural Language Toolkit — NLTK 3.6.5 documentation

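Also, Japanese text has no explicit separation between words, so split() cannot be used to divide it. A library such as Janome can be used for this, for example.

Counting the occurrences of a character in a string

Since strings are also a sequence type, they can be used with the count() method or passed as an argument to the collections.Counter() constructor.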

s = 'supercalifragilisticexpialidocious'

print(s.count('p'))
# 2

c = collections.Counter(s)

print(c)
# Counter({'i': 7, 's': 3, 'c': 3, 'a': 3, 'l': 3, 'u': 2, 'p': 2, 'e': 2, 'r': 2, 'o': 2, 'f': 1, 'g': 1, 't': 1, 'x': 1, 'd': 1})

An example that retrieves the five most frequent characters.

print(c.most_common(5))
# [('i', 7), ('s', 3), ('c', 3), ('a', 3), ('l', 3)]

values, counts = zip(*c.most_common(5))

print(values)
# ('i', 's', 'c', 'a', 'l')
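
Counter also accepts a generator expression, so characters can be filtered before counting. A minimal sketch that ignores case and non-letter characters; the sample string `text` and the name `letters` are assumptions for illustration.

```python
import collections

text = 'Hello, World'  # hypothetical sample string

# Lowercase the text and count alphabetic characters only.
letters = collections.Counter(ch for ch in text.lower() if ch.isalpha())

print(letters['l'])
# 3
print(letters['o'])
# 2
print(letters[','])
# 0
```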