Glossary

Quantization

Quantization stores a model's weights at lower numeric precision (for example, 4-bit instead of 16-bit), which makes the model much smaller and faster to run at the cost of a small quality loss. The widely cited Q4_K_M sweet spot is roughly 75% smaller than the 16-bit original for about 3% quality loss, which is what lets a 27B model fit on a consumer GPU or a 32 GB Mac. A rule of thumb: a 4-bit model needs about half its parameter count, in billions, as gigabytes of RAM, so a 27B model needs roughly 13.5 GB for its weights.
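The rule of thumb can be checked with a little arithmetic. The sketch below is illustrative only: the function name `weight_memory_gb` is hypothetical, it counts weights alone (no context cache or runtime overhead), and it assumes exactly 4 bits per weight, whereas real 4-bit formats store some extra per-block metadata and come out somewhat larger.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# A 27B model at 16-bit versus an idealized 4-bit quantization.
fp16_gb = weight_memory_gb(27, 16)
q4_gb = weight_memory_gb(27, 4)

# Fraction of the 16-bit size saved by going to 4 bits.
reduction = 1 - q4_gb / fp16_gb

print(f"16-bit: {fp16_gb:.1f} GB")
print(f"4-bit:  {q4_gb:.1f} GB")
print(f"Saved:  {reduction:.0%}")
```

Under these assumptions the 4-bit weights take a quarter of the 16-bit size, which is where the "roughly 75% smaller" figure and the "half the parameter count in gigabytes" rule both come from. Leave headroom beyond this estimate for the context cache and the runtime itself.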

