Quickly Counting and Sorting Word Use

21. April 2012, 08:29

This is a quick entry showing how to count how many times each word appears in a file, sort the words by that count, and print them in ascending order of popularity. We use the C++11 standard here for a quick lambda function:

#include <map>
#include <vector>
#include <string>
#include <iostream>
#include <algorithm>
using namespace std;
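int main() {
  // Count each whitespace-separated word read from standard input.
  std::map<string, int> m;
  std::string word;
  while (cin >> word) {
    m[word]++;
  }
  // A map is ordered by key, so copy the pairs into a vector to sort by count.
  std::vector<std::pair<string, int>> v(m.begin(), m.end());
  std::sort(v.begin(), v.end(),
      [&](std::pair<string, int> a, std::pair<string, int> b) {
      return a.second < b.second;});
  // Print one "word<TAB>count" line per word, least used first.
  for (auto I = v.begin(); I != v.end(); ++I) {
    std::cout<<I->first<<'\t'<<I->second<<'\n';
  }
  return 0;
}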

If you don’t like lambda functions, or if your compiler doesn’t support them, then instead of this:

      [&](std::pair<string, int> a, std::pair<string, int> b) {
      return a.second < b.second;});

Define a function outside of main:

bool compUsage(std::pair<string, int> a, std::pair<string, int> b) {
  return a.second < b.second;
}

And use that function instead of the lambda function:

  std::sort(v.begin(), v.end(), compUsage);

The program reads the file from standard input. If you use Linux or a Mac you can do this:
cat file | ./program
or
./program < file

If you have Windows I don’t know how to do this! Sorry!
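
One way to avoid depending on shell redirection at all is to let the program open the file itself. The following is only a sketch under that assumption: countWords and its path parameter are hypothetical names that do not appear in the program above. An empty path means "read from standard input", so the original behaviour is still available.

```cpp
#include <fstream>
#include <iostream>
#include <map>
#include <string>

// Hypothetical helper: adds word counts to `m`, reading from the file at
// `path` when one is given and from standard input when `path` is empty.
// Returns false if the file could not be opened.
bool countWords(const std::string& path, std::map<std::string, int>& m) {
  std::ifstream file;
  std::istream* in = &std::cin;  // default: standard input
  if (!path.empty()) {
    file.open(path.c_str());
    if (!file) {
      return false;
    }
    in = &file;
  }
  std::string word;
  while (*in >> word) {
    m[word]++;
  }
  return true;
}
```

In main this could be called as countWords(argc > 1 ? argv[1] : "", m), after changing the signature to int main(int argc, char** argv). The program could then be run as ./program file, with no redirection needed.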

Ben Firner
