Postgresql parallel bulk INSERT with workers doesn't parallelize

My scenario:

  • 10 workers
  • Database has max_connections set to 100
  • Every worker has its own DB connection (max. 10 connections)
  • Every worker starts a transaction (BEGIN; ... COMMIT;)
  • Every worker inserts data into the same table with a bulk insert inside the transaction
  • Data to insert: e.g. 1 million rows
  • Every worker handles 1000 rows (batches of size 1000)

The query of every worker:

BEGIN;
INSERT INTO "test_tbl" ("id",...) VALUES (...),(...),...[1000 entries]... RETURNING id;
COMMIT;

Table test_tbl has only the constraint PRIMARY KEY (id) with index CREATE UNIQUE INDEX formulas_pkey ON formulas USING btree (id)

Problem

After many hours of analyzing, it seems that each worker waits until another worker has finished its insert. Why can the workers not insert new data into the same table at the same time?

UPDATE

I have removed all constraints and all indices (primary keys, foreign keys, etc.) but still the same problem. No parallelization.

Added note:

  • Data to insert: e.g. 1 million rows
  • Every worker handles 1000 rows (batches of size 1000)
Answer 1

The fact that there is a primary key means that the database has to check the values of the corresponding column(s) to be UNIQUE and NOT NULL. A second transaction that begins to insert data cannot do so until the first one has finished inserting (otherwise, there could be non-unique values).
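This waiting can be reproduced by hand in two psql sessions when both try to insert the same key. A minimal sketch (the table pk_demo, its single column and the value 1 are made up for illustration): the second session blocks on the uniqueness check until the first session ends its transaction.

-- Session 1:
CREATE TABLE pk_demo (id bigint PRIMARY KEY);   -- hypothetical table for the demo
BEGIN;
INSERT INTO pk_demo (id) VALUES (1);            -- row inserted, not yet committed

-- Session 2 (run while session 1's transaction is still open):
BEGIN;
INSERT INTO pk_demo (id) VALUES (1);            -- blocks here: it has to wait to know
                                                -- whether session 1's id = 1 commits
-- If session 1 then issues COMMIT, session 2 fails with a duplicate key error;
-- if session 1 issues ROLLBACK, session 2's insert proceeds.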

If you don't do the bulk insert in 1 transaction per worker (but, let's say, in batches of 100 inserts), it will work much faster. You will need more calls between client and database (you will have n calls with 100 rows of data each, instead of 1 very big call with n*100 rows); but the database will be able to commit much earlier.
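As a rough sketch of that shape of workload in plain SQL (the table batch_demo, its columns and the values are made up for illustration; in practice each batch would hold e.g. 100 of the real rows), each small batch is its own transaction, so its rows are committed, and therefore settled for the other workers' uniqueness checks, much sooner:

-- Hypothetical demo table:
CREATE TABLE IF NOT EXISTS batch_demo (id bigint PRIMARY KEY, payload text);

-- Batch 1 of a worker, committed immediately:
BEGIN;
INSERT INTO batch_demo (id, payload) VALUES (1, 'a'), (2, 'b'), (3, 'c') RETURNING id;
COMMIT;

-- Batch 2 of the same worker:
BEGIN;
INSERT INTO batch_demo (id, payload) VALUES (4, 'd'), (5, 'e'), (6, 'f') RETURNING id;
COMMIT;

-- ... and so on until the worker's share of the data is inserted.

The same pattern applies with a client-side driver: send a batch, commit, send the next batch.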

In PostgreSQL:

reading never blocks writing and writing never blocks reading

... but transaction 1 writing can (and often will) block transaction 2 also writing.

In case you cannot do batch inserts, you can try deferring the PRIMARY KEY constraint to the end of the transaction. This is done by defining your PRIMARY KEY constraint DEFERRABLE INITIALLY DEFERRED (which is not the default for PostgreSQL, although it is the SQL standard). See the documentation for "create table":

DEFERRABLE
NOT DEFERRABLE

This controls whether the constraint can be deferred. A constraint that is not deferrable will be checked immediately after every command. Checking of constraints that are deferrable can be postponed until the end of the transaction (using the SET CONSTRAINTS command). NOT DEFERRABLE is the default. Currently, only UNIQUE, PRIMARY KEY, EXCLUDE, and REFERENCES (foreign key) constraints accept this clause.
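A sketch of how that could look for a table shaped like the one in the question (the column type bigint and the constraint name test_tbl_pkey below are assumptions):

-- Declare the primary key as deferrable when creating the table:
CREATE TABLE test_tbl (
    id bigint,
    -- ... other columns ...
    CONSTRAINT test_tbl_pkey PRIMARY KEY (id) DEFERRABLE INITIALLY DEFERRED
);

-- Or convert an existing primary key (the constraint name is a guess; check it
-- with \d test_tbl in psql before running this):
ALTER TABLE test_tbl
    DROP CONSTRAINT test_tbl_pkey,
    ADD CONSTRAINT test_tbl_pkey PRIMARY KEY (id) DEFERRABLE INITIALLY DEFERRED;

-- With INITIALLY DEFERRED the uniqueness check happens at COMMIT time. A
-- constraint that is only DEFERRABLE can be deferred per transaction instead:
BEGIN;
SET CONSTRAINTS test_tbl_pkey DEFERRED;
INSERT INTO test_tbl (id) VALUES (1), (2), (3);   -- checked at COMMIT
COMMIT;

Note that deferring only moves the check to COMMIT time; duplicate values will still make the commit fail.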


