...

Increase Max Num Files per Tablet

table.file.max

 


Info

This is not usually a problem. Only make this change if you can verify that your tables are doing merging minor compactions, either from the logs or from the names of the rfiles in HDFS. To count the rfiles by type (the first letter of each file name), run:

Code Block
hadoop fs -ls /accumulo/tables/<tableid>/* | grep '\.rf$' | awk '{n=split($0,a,/\//);f=substr(a[n],1,1);print f}' | sort | uniq -c
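The one-letter prefixes counted above are easier to read with labels attached. The following is a minimal sketch, not part of the original page: the function names `rfile_kind` and `summarize_rfiles` are hypothetical, and the prefix-to-type mapping is an assumption based on Accumulo 1.x file naming, so check it against your version.

```shell
#!/bin/sh
# Label a one-letter rfile prefix.
# ASSUMPTION: mapping follows Accumulo 1.x file naming; verify for your version.
rfile_kind() {
  case "$1" in
    F) echo "flush (minor compaction)" ;;
    M) echo "merging minor compaction" ;;
    C) echo "major compaction (subset of files)" ;;
    A) echo "full major compaction" ;;
    I) echo "bulk import" ;;
    *) echo "unknown" ;;
  esac
}

# Read rfile paths (or `hadoop fs -ls` lines) on stdin and print
# "<count> <prefix> <label>" for each prefix found.
summarize_rfiles() {
  awk -F/ '/\.rf$/ { print substr($NF, 1, 1) }' | sort | uniq -c |
  while read -r count prefix; do
    echo "$count $prefix $(rfile_kind "$prefix")"
  done
}
```

It could be used as `hadoop fs -ls '/accumulo/tables/<tableid>/*' | summarize_rfiles`. A non-zero count on the `M` line is the signal this page is concerned with.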

 


One way to count the files written by merging minor compactions, using the metadata table:

Code Block
tableid="1td"
accumulo shell -u <user> -p <pass> -e "scan -np -t accumulo.metadata -b ${tableid} -e ${tableid}\xff -c file" | awk '{print $2}' | awk -F '/' '{print $8}' | grep 'M' | wc -l
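Since `table.file.max` is a per-tablet limit, it may also help to see how many files each tablet currently has. The sketch below is an illustration only: `files_per_tablet` is a hypothetical helper, and it assumes each scan output line has the form `<row> file:<path> [] <value>` with no spaces in the row.

```shell
#!/bin/sh
# Count rfiles per tablet from metadata scan output on stdin.
# ASSUMPTION: field 1 is the tablet row and field 2 starts with "file:".
# Prints "<count> <row>" with the largest counts first.
files_per_tablet() {
  awk '$2 ~ /^file:/ { n[$1]++ } END { for (t in n) print n[t], t }' | sort -rn
}
```

Feeding it the output of the `scan ... -c file` command above (before the `awk` stages) would show which tablets are closest to the limit.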

Increase the maximum number of files per tablet so that merging minor compactions do not happen. A merging minor compaction takes the smallest rfile on disk and merges it with the in-memory map being minor compacted. Because each flush then also has to rewrite an existing file, this can slow ingest considerably. However, having fewer files on disk means there is less to merge at scan time. Adjusting the compaction ratio to force more major compactions (which combine small on-disk rfiles) can keep the file count down without hurting ingest too much.
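As a sketch, both settings can be applied to a single table from the command line. The user `root`, the table name `mytable`, and the ratio value 2 are placeholders, not recommendations from this page. `table.compaction.major.ratio` defaults to 3, and lowering it makes major compactions trigger more readily.

```shell
# Placeholders (assumptions): change these for your environment.
user="root"
table="mytable"

# -p is omitted so the shell prompts for the password.
# Raise the per-tablet file limit.
accumulo shell -u "$user" -e "config -t $table -s table.file.max=30"
# Lower the major compaction ratio (example value only).
accumulo shell -u "$user" -e "config -t $table -s table.compaction.major.ratio=2"
```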

...

Example Config

Medium Cluster

Servers: 80
Cores: 12 physical, 24 virtual
NIC: 2 - 10G bonded
Disks:
Java Args: -Xmx30G

Config:

Code Block
config -s table.cache.block.enable=true
config -s tserver.cache.data.size=12G

config -s tserver.compaction.minor.concurrent.max=9
config -s tserver.compaction.major.concurrent.max=8
config -s tserver.recovery.concurrent.max=2

config -s tserver.wal.blocksize=2G
config -s tserver.walog.max.size=2G
config -s tserver.mutation.queue.max=2M
config -s tserver.memory.maps.max=4G

config -s table.compaction.minor.logs.threshold=6
config -s table.file.max=30
 
config -s tserver.readahead.concurrent.max=32
config -s tserver.recovery.concurrent.max=4
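When a block of `config -s` lines sets the same property more than once, the later line overrides the earlier one, which is easy to miss in a long list. A small check along these lines can flag repeats before the block is pasted into the shell. `dup_props` is a hypothetical helper, and it assumes one `config -s key=value` command per line.

```shell
#!/bin/sh
# Print every property that is set more than once in a list of
# "config -s key=value" lines read from stdin.
dup_props() {
  awk '$1 == "config" && $2 == "-s" { split($3, kv, "="); seen[kv[1]]++ }
       END { for (k in seen) if (seen[k] > 1) print k }' | sort
}
```

For example, `dup_props < medium-cluster.conf` (file name hypothetical) prints nothing when every property appears only once.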

Small Cluster, Large Servers

Servers: 4
Cores: 24 physical, 48 virtual
NIC: 1 - 1G
Disks: 6 x 1TB, 2.5 inch, unknown RPM
Memory: 256G
Java Args: -Xmx80G

Config:

Code Block
config -s table.cache.block.enable=true
config -s tserver.cache.data.size=40G
 
config -s tserver.compaction.minor.concurrent.max=50
config -s tserver.compaction.major.concurrent.max=8
config -s tserver.recovery.concurrent.max=4
 
config -s tserver.wal.blocksize=2G
config -s tserver.walog.max.size=4G
config -s tserver.mutation.queue.max=2M
config -s tserver.memory.maps.max=12G
 
config -s table.compaction.minor.logs.threshold=10
config -s table.file.max=30
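One quick sanity check on configurations like these is that the data cache and the in-memory map fit inside the Java heap. The sketch below assumes the in-memory map lives on the heap (that is, native maps are not in use) and ignores the index cache and other heap users, so treat a pass as necessary rather than sufficient. `to_mb` and `fits_heap` are hypothetical helpers.

```shell
#!/bin/sh
# Convert a size such as 12G or 512M to megabytes (integer G and M suffixes only).
to_mb() {
  case "$1" in
    *G) echo $(( ${1%G} * 1024 )) ;;
    *M) echo "${1%M}" ;;
    *)  echo "unsupported size: $1" >&2; return 1 ;;
  esac
}

# Succeed when cache + in-memory map is smaller than the heap.
# usage: fits_heap <heap> <tserver.cache.data.size> <tserver.memory.maps.max>
# ASSUMPTION: both consumers are on-heap; other heap users are ignored.
fits_heap() {
  heap=$(to_mb "$1") && cache=$(to_mb "$2") && maps=$(to_mb "$3") || return 2
  [ $(( cache + maps )) -lt "$heap" ]
}
```

With the values on this page, `fits_heap 80G 40G 12G` and `fits_heap 30G 12G 4G` both succeed.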