performance - Are queries against Azure Table Storage indexed when using a partial RowKey?


I understand from the MS PDC presentations that the PartitionKey is used to load balance the table across multiple servers, but nobody seems to give any advice on whether the PartitionKey is also used as an index within a single server.

Likewise, everyone will tell you that specifying both the PartitionKey and the RowKey gets you great performance, but nobody seems to tell you whether the RowKey is being used to improve performance within a PartitionKey.

Here are some sample queries to help me frame the questions. Assume the entire table contains 100,000,000 rows.

  1. PartitionKey="123" and OtherField="def"
  2. PartitionKey="123" and RowKey >= "aaa" and RowKey < "aac"
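For concreteness, the two queries can be written as OData filter strings, which is the filter syntax the Table service accepts. This is only a sketch that builds the strings; the helper names are made up for illustration, and `OtherField` is the hypothetical non-key property from the list above.

```python
def odata_quote(value: str) -> str:
    """Quote a string literal for an OData filter, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def query1_filter(partition_key: str, other_value: str) -> str:
    # Query 1: one partition, filtered on a non-key property (OtherField is hypothetical).
    return (
        f"PartitionKey eq {odata_quote(partition_key)} "
        f"and OtherField eq {odata_quote(other_value)}"
    )


def query2_filter(partition_key: str, low: str, high: str) -> str:
    # Query 2: one partition, half-open RowKey range [low, high).
    return (
        f"PartitionKey eq {odata_quote(partition_key)} "
        f"and RowKey ge {odata_quote(low)} and RowKey lt {odata_quote(high)}"
    )


print(query1_filter("123", "def"))
print(query2_filter("123", "aaa", "aac"))
```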

Here are the questions:

  • If I have only 10 rows in each partition, will Query 1 be fast?
  • If I have 1,000,000 rows in each partition, will Query 2 be fast?

In ATS, the PartitionKey is used as a distribution lookup, not as an index. At the level you are working with ATS, consider a PartitionKey and a "server"/node to share a 1:1 relationship. (Behind the scenes this isn't true, but concepts such as optimizing PartitionKeys that happen to reside on the same physical/virtual node are abstracted several levels away from what a consumer of Azure has to deal with. Those details are purely internal to the overall Azure infrastructure, and in the case of ATS it is best to assume it is as optimal as it can be ... aka "don't worry about it".)

In the context of a DBMS vs ATS, the RowKey is the closest thing to an "index", in that it assists in finding data within a single node. To directly answer one of your questions: the RowKey is the index within the PartitionKey.

Stepping outside the box a bit, however, the PartitionKey can give you perf gains closer to how you think of a traditional index, but only because of the distributed nature of how your data is spread across ATS nodes. You should optimize your layout first for the PartitionKey, then for the RowKey. (In other words, if you have only one keyable value, make it the PartitionKey.)

As a general rule, queries are going to perform in this order, from most efficient to least efficient:

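1. PartitionKey = X and RowKey = Y (and OtherProp = Z)

Because the lookup goes to the right node and then uses the indexed property (the RowKey) within the partition.

2. PartitionKey = X (and OtherProp = Z)

Because you get to the proper node, but then do the ATS equivalent of a full table scan within that partition.

3. OtherProp = Z

Because you have to do a partition scan for every partition, which in effect is a table scan.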
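The ordering above can be illustrated with a toy in-memory model. This is a sketch under loud assumptions: `ToyTable` is hypothetical, it treats each partition as a list sorted by RowKey, and it says nothing about how Azure actually stores data. It only counts how many rows each access path has to read.

```python
from bisect import bisect_left


class ToyTable:
    """Toy model only: each partition keeps its rows sorted by RowKey."""

    def __init__(self):
        # PartitionKey -> (sorted list of RowKeys, parallel list of entities)
        self.partitions = {}

    def insert(self, pk, rk, props):
        keys, rows = self.partitions.setdefault(pk, ([], []))
        i = bisect_left(keys, rk)
        keys.insert(i, rk)
        rows.insert(i, props)

    def point_lookup(self, pk, rk):
        """Case 1: PartitionKey and RowKey known. Returns (entity, rows read)."""
        keys, rows = self.partitions.get(pk, ([], []))
        i = bisect_left(keys, rk)  # binary search on the sorted RowKeys
        if i < len(keys) and keys[i] == rk:
            return rows[i], 1
        return None, 0

    def partition_scan(self, pk, pred):
        """Case 2: only PartitionKey known. Every row in the partition is read."""
        _, rows = self.partitions.get(pk, ([], []))
        return [r for r in rows if pred(r)], len(rows)

    def table_scan(self, pred):
        """Case 3: no key known. Every row in every partition is read."""
        hits, examined = [], 0
        for _, rows in self.partitions.values():
            examined += len(rows)
            hits.extend(r for r in rows if pred(r))
        return hits, examined


table = ToyTable()
for p in range(3):
    for r in range(1000):
        table.insert(str(p), f"{r:04d}", {"other": r % 10})


def is_z(row):
    return row["other"] == 7


print(table.point_lookup("1", "0007")[1])   # 1 row read
print(table.partition_scan("1", is_z)[1])   # 1000 rows read
print(table.table_scan(is_z)[1])            # 3000 rows read
```

In this model the cost of case 2 grows with the size of the partition and the cost of case 3 grows with the size of the whole table, which is the reasoning behind the ranking.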


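With that, on to your direct questions:

  1. I don't feel this can really be answered. It is subjective (i.e. "what is fast?"). It will always be slower than Query 2, but at 10 rows that "slowness" is probably a matter of milliseconds, if even that.

  2. (Similar theme) It will be faster than Query 1. Anytime you can do Query 2, you should.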
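Why a RowKey range such as Query 2 can stay cheap in a large partition can be sketched with a sorted list and a binary search. This assumes keys are compared ordinally, character by character; the helper names are hypothetical, and `prefix_upper_bound` assumes the last character of the prefix is not the largest possible code point.

```python
from bisect import bisect_left


def prefix_upper_bound(prefix: str) -> str:
    """Smallest string that sorts after every string starting with `prefix`."""
    if not prefix:
        raise ValueError("prefix must be non-empty")
    # Bump the last character by one code point: "aab" -> "aac".
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def range_scan(sorted_keys, low, high):
    """Return keys k with low <= k < high, touching only the matching slice."""
    start = bisect_left(sorted_keys, low)
    stop = bisect_left(sorted_keys, high)
    return sorted_keys[start:stop]


row_keys = sorted(["aa", "aaa", "aaab", "aab", "aabz", "aac", "aad", "b"])

# Query 2 from the question: RowKey >= "aaa" and RowKey < "aac"
print(range_scan(row_keys, "aaa", "aac"))   # ['aaa', 'aaab', 'aab', 'aabz']

# A "partial RowKey" (prefix) match is the same thing with a computed upper bound.
print(prefix_upper_bound("aab"))            # aac
```

Only the rows inside the range are read, so the work depends on the size of the result rather than on the size of the partition.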

So, with that explanation and those answers to your questions, the real answer comes down to how you architect your usage of ATS.

Based on your data set (both current and expected growth), you need to determine the proper scheme so that you can get to the partition and row in the fastest way possible. Knowing how the lookup occurs, you can make logical decisions about which path is going to get you there fast enough: more partitions with fewer rows, versus fewer partitions with more rows, etc.
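As a rough back-of-the-envelope sketch of that trade-off, using the figures from the question (100,000,000 rows, and either 10 or 1,000,000 rows per partition): the function below is hypothetical and assumes evenly sized partitions, and that a filter on a non-key property has to read every row of the partition it targets.

```python
def layout(total_rows: int, rows_per_partition: int) -> dict:
    """Estimate partition count and Query 1 scan size for an even layout."""
    partitions = -(-total_rows // rows_per_partition)  # ceiling division
    return {
        "partitions": partitions,
        # Query 1 filters on a non-key property, so it reads the whole partition.
        "rows_read_by_query1": rows_per_partition,
    }


TOTAL_ROWS = 100_000_000

print(layout(TOTAL_ROWS, 10))
print(layout(TOTAL_ROWS, 1_000_000))
```

With 10 rows per partition there are 10,000,000 partitions and Query 1 reads 10 rows; with 1,000,000 rows per partition there are 100 partitions and Query 1 reads 1,000,000 rows, which is where a RowKey range like Query 2 starts to matter.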

