How and which column is chosen as primary index in teradata

How and which column is chosen as primary index in teradata when it is not defined while creating a table?

asked Jul 11, 2018 at 11:39 Rajat Sharma Rajat Sharma 11 1 1 silver badge 2 2 bronze badges

It's complicated This is the most likely scenario: If there is no PRIMARY INDEX, PRIMARY KEY, or UNIQUE constraint defined in a CREATE TABLE statement, and the PrimaryIndexDefault parameter is set to either D or P, then Teradata Database defines the first index-eligible column defined for the table to be its primary index (So probably the first column, but it depends).

Commented Jul 11, 2018 at 13:39 In other words, always always always define a PI. Commented Jul 11, 2018 at 14:06

Is there any criteria of choosing any column as primary index or just first column is chosen.? what is the meaning of index eligilble column , pls elaborate

Commented Jul 11, 2018 at 14:10

The primary index determines how the data is distributed. If your first column happens to be a field where most of the rows have the same value, then your data will be distributed very unevenly across the AMPs. This could have severe performance impact. So it is always better to explicitly define the primary index as @Andrew said

Commented Jul 11, 2018 at 15:39

Data access and data distribution are the key drivers for selecting a primary index. Tables that share the same primary index can be joined together and the data access is “AMP-local”. Tables that do no share the same primary index require redistribution step(s) to satisfy the join condition. Some times it is desirable and possible for the users to provide the entirety of the primary index in their WHERE condition, this allows single-AMP access to the data. In the end, the answer is “it depends”.

Commented Jul 23, 2018 at 16:06

1 Answer 1

If you don't define an index, Teradata will implicit take the first row as Primary Index. Beside this you can either choose a or many columns as Primary Index or define the table by NO PRIMARY INDEX. Primary Index will define the Distribution Key of the data across the AMPS. If NO PRIMARY INDEX is defined it will be RoundRobin.

Choosing the PI is part of Physical Design and there is no answer to rule them all. There is a dedicated document in the Documentation covering this topic ("Database Design"). You have to think of:

1) distribution of the data (prevent high skew)

2) possible access and joins

ad 1) should be clear

ad 2) due to data are distributed across PI, a GROUP BY different from the PI or a JOIN with Join-Fields other then PI (at least PI have to be part of it) will result in data redistribution of your spool. - which is bad for the performance of the query.

If you would like to test different PI with your data, you can do it by SQL with following SQL (e.g. myTable with PI of column_1 and column_2):

SELECT HASHAMP (HASHBUCKET (HASHROW (column_1,column_2))) as targetAMP ,COUNT (*) as CountRecords FROM myTable GROUP BY targetAMP;