FutureComputing4AI/KiloGrams

KiloGram algorithm for finding the top-k most frequent n-grams for large values of n quickly with fixed memory.

Score: 38 / 100 (Emerging)

This tool helps cybersecurity researchers and data scientists efficiently extract the most frequent n-grams from large collections of files, such as malware samples or software executables. You supply the paths to the folders containing your files (e.g., 'goodware' and 'malware'), and it outputs a dataset in a machine-learning-ready format such as libsvm. It is aimed at people building file-classification models, particularly in cybersecurity.
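As a rough illustration of how the most frequent n-grams can be found without holding every distinct n-gram in memory, here is a simplified two-pass sketch in Python. It is not the repository's Java implementation. The function name `top_k_ngrams`, the `num_buckets` parameter, and the use of CRC32 as the hash are all assumptions made for this example. The first pass counts hashed n-grams in a fixed-size table. The second pass counts exactly only those n-grams that fall into the heaviest buckets.

```python
from collections import Counter
import heapq
import zlib


def top_k_ngrams(blobs, n, k, num_buckets=1 << 16):
    """Approximate the k most frequent byte n-grams across `blobs`.

    Hypothetical helper for illustration only; not the KiloGrams API.
    """
    # Pass 1: count hashed n-grams in a fixed-size table, so memory use
    # here does not grow with the number of distinct n-grams.
    buckets = [0] * num_buckets
    for data in blobs:
        for i in range(len(data) - n + 1):
            buckets[zlib.crc32(data[i:i + n]) % num_buckets] += 1

    # Keep only the k heaviest buckets as candidates.
    hot = set(heapq.nlargest(k, range(num_buckets), key=buckets.__getitem__))

    # Pass 2: count exactly, but only n-grams that hash into a hot bucket.
    counts = Counter()
    for data in blobs:
        for i in range(len(data) - n + 1):
            gram = data[i:i + n]
            if zlib.crc32(gram) % num_buckets in hot:
                counts[gram] += 1
    return counts.most_common(k)


blobs = [b"abcabcabc", b"xxabcxx"]
print(top_k_ngrams(blobs, n=3, k=1))
```

In this sketch the second pass is only loosely bounded, since several distinct n-grams can share a hot bucket. A strictly fixed-memory variant would replace the `Counter` with a bounded frequent-items counter.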

No commits in the last 6 months.

Use this if you need to generate features for machine learning models by identifying the most frequent n-grams, for very large n, across extensive file datasets, especially for malware analysis or binary classification tasks.
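To make the feature-generation step concrete, the sketch below turns one file into a line of libsvm text (`label index:value ...`, with feature indices in ascending order). It assumes binary presence features and a label of 1 for malware and 0 for goodware. Whether the tool itself emits presence flags or counts is not stated here, and `to_libsvm_line` and `vocab` are hypothetical names used only for this example.

```python
def to_libsvm_line(label, data, vocab):
    """Format one file as a libsvm line of binary n-gram presence features.

    `vocab` maps each selected n-gram (bytes) to a 1-based feature index.
    Hypothetical helper for illustration only.
    """
    # libsvm expects feature indices in ascending order.
    present = sorted(idx for gram, idx in vocab.items() if gram in data)
    return " ".join([str(label)] + [f"{idx}:1" for idx in present])


vocab = {b"abc": 1, b"xyz": 2, b"cab": 3}
print(to_libsvm_line(1, b"abcabc", vocab))  # assumed label: 1 = malware
print(to_libsvm_line(0, b"zzz", vocab))     # assumed label: 0 = goodware
```

Only features that are present are written, which keeps the file sparse when most n-grams are absent from most files.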

Not ideal if you require a user-friendly application with a graphical interface or if you need ongoing support and warranty for production systems.

malware-analysis cybersecurity-research feature-engineering binary-classification data-preparation
Stale (6m), No Package, No Dependents
Maintenance 2 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 15 / 25

Stars: 9
Forks: 4
Language: Java
License: Apache-2.0
Last pushed: Oct 08, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/FutureComputing4AI/KiloGrams"

Open to everyone: 100 requests/day with no key, or 1,000/day with a free key.