You are here:

Ingres/10 tables query 2

Advertisement


Question
Hello.
I've seen that for queries with less than nine tables involved in, the QEP begins with this sentence:
QUERY PLAN 1,1, no timeout, of main query
Then, I increase the number of tables involved, and when the query reaches nine tables, the QEP shows this:
QUERY PLAN 13,1, timed out, of main query
I can turn off the optimizer timeout with 'set joinop notimeout', but when I run 'set joinop timeout', the optimizer comes back to that situation of the nine tables I told.

Is there one configuration parameter which rules it? Or what?

Thanks.
-
------------------------
Followup To
Question -
Bon jour, Jean-Pierre,
Sorry, I don't understand the 'correct indices' part. Could you explain it a little more or, perhaps, add an example?

Another question. I've see that the optimizer timeout 'disappear' as I change the number of tables involved in the query (when I get off tables). What can be the cause of this?

Thanks.   
-------------------------
Followup To
Question -
Have you any idea about how to improve the performance of a ten tables based query (with k-joins)?
Answer -
Hi Miguel,

Your question is in FoxPro folder, and I'am an Ingres expert. Hope this question is for Ingres ... :-)

You can use QEP (Query Execution Plan). Here is how you can turn it on :
sql your_db
set qep
\g
set optimizeonly
\g
your bif select here
\g

Performance can be improved by :
--------------------------------
- correct indices
- appropriates statistic data for the DBMS (taken by optimizedb command)
- reorganisation ad'hoc

Correct indices :
-----------------
When you join 2 or more tables, take care of the correct where clause. Columns are to be joined from the left to the right (ie if your key is col1, col2 and col3 the correct where clause will involve col1 and col2 but not col2 and col3 [miss col1 in that case)

Appropriate statistics :
------------------------
A good optimizedb command is :
optimizedb -zh -zk mydb
-zh to see histograms
-zk to ask optimizedb to take stats on key columns only

Reorganisation ad'hoc :
-----------------------
On regular basis, take care to pass a MODIFY on each table and perhaps a CREATE INDEX (because sometimes, depend on create index option, modify delete indices).

A good regular administrative task on Ingres can be :
1/ modify and create index (to eliminate hole in data pages)
2/ optimizedb to take correct stats
3/ sysmod to reoganize system data tables (destructured by optimizedb)

Hope this help,
Jean-Pierre Zuate
Answer -
Correct indices ...

When you access data through a SGBD/R, you "only" do join between columns. If you want your query to speed up, indices must exist when you run the query.

A short example ...

Suppose two tables, customer and bill.
Customer table look like this :
customer_number
customer_name,
customer_address

Bill table look like this :
bill_number
customer_number,
quantity_sum,
amount_sum

If, from the bill table, you need to know what are all the bill for one customer the query look like this :
select *
from bill
where customer_number = 'the requested number'

Correct indice mean if you don't have an object (an index in fact) well designed, you query can be very slow. In my example, the correct index is an index on the table bill for the column customer_number. In SQL something like :
CREATE INDEX bill_idx on bill (customer_number) with structure = hash

Performance depend on the pertinence of the columns involved in the index, the structure, and by all the parameter linked to the structure.

In Ingres, you have some differents structures, depend on the usage :
- heap : no sort here, no key. Fast for massive insert
- btree : good for partial key query. Very good if table have many update an few window for modify
- isam : same service as btree, but need to be reorganised (more) regulary
- hash : good for exact query (where each column of the key have an unique value in the query)

Now, my turn to have a question : what do you call "optimizer timeout" and how do you see it ?

JPz :-)

Answer
OK, I see :-)

In Ingres the optimizer look the amount of time it spend to build the QEP vs the lowest estimated execution time for the already estimated QEP (for one query).

So : QUERY PLAN 1,1, no timeout, of main query
mean it spend less time to explore all the solutions (build distinct QEP and choose the best one) than the time for the query itself.

and : QUERY PLAN 13,1, timed out, of main query
mean it spend one second more the lower time it found and then make the less bad choice.

the parameter your are looking for is opf_timeout_abort (CBF / Server Parameters / Configure). Let see cbf's help about it :
The opf_timeout_abort resource specifies the time in
millesconds after which the optimizer will stop
considering query plans and use the best plan found to
that point. If no plan has been found yet, an error is
generated.         
If 0 (the default) is specified, this timeout is disabled.  

or
opf_joinop_timeout
The opf_joinop_timeout resource specifies the time in
millesconds after which the optimizer will stop
considering new query plans and use the best plan found to
that point.
If 0 is specified, this timeout will occur when the
optimizer finds that the best plan found so far will take
less time to execute than the amount of time already spent
evaluating plans (the Ingres default).

But I think it's a little bit dangerous to turn this parameter. In normal situation many (near all in fact) query can run in a good performance range without modifying opf_timeout_factor or opf_joinop_timeout.

Some parameters and/or some informations are only accessible via traces points (see http://www.well.com/user/ideamen/trace.html, link given by http://www.naiua.org. TAKE CARE : traces points are not supported by CA but they often work). For example, have a look at trace point qe90

Example :
set trace point qe90;
your query

The output will be something like QEP (with nodes) and for each node, estimed time, dio, cpu and real time, dio and cpu.
An example of a node of the qe90 output :
----------
ORIG
at 1
et 100474

ad 1344
ed 306

ac 3529
ec 200948
-----------
where
at is actual tuples
et is etimated tuples
ad is actual dio
ed is estimated dio
ac is actual cpu time
ec is estimated cpu time

It could help you to determine if estimations from optimizer are good or not (in some case optimizer seems to be crazy). Then you can decide to modify parameters like opf_timeout_factor.

In cbf you can find, depend on your version - 2.6 in my case - and see cbf help, a variety of parameters that impact optimizer behavior :
opf_active_limit
opf_maxmemf
opf_memory
opf_old_cnf
opf_old_jcard
opf_timeout_factor

Perhaps it spend to much time because it ran out of opf memory. opf_old_* are parameters to switch to 6.4 qep algorithm.

Last tips for this reply : use REPEATED SELECT. The consequence is the QEP is stored and re used for other sessions that use the same query. Very usefull for "the" report your users launch so often a day ...

Optimization is a very big subject ;-)

Hope this help
JP

Ingres

All Answers


Answers by Expert:


Ask Experts

Volunteer


Jean-Pierre Zuate

Expertise

Any questions about database Ingres (owned by Computer Associates then Ingres Corp) : - Ingres 6.4 - OpenINGRES from 1.0 to 2.0 - IngresII from 2.0 to 2.6 - Ingres R3, Ingres 2006 (Open Source version) - Ingres 9.x, Ingres 10.x - All tool around Ingres : ABF, Report Writer, Replicator, OpenROAD (3.5 to 2006), Ingres/NET Ingres/STAR, ...

Experience

16 years of computing experience as :
- AS400 programmer
- AIX / Ingres administrator and developer (OpenROAD and korn shell)
- Ingres DataBase Administrator
- Ingres expert - Data modelisation - ETL - Reporting - Many of Computer Associates sofwares - ITIL / CMDB / Change Management

Organizations
http://lafageconseil.fr

Education/Credentials
Computing bachelor (1989)

Past/Present Clients
More than 100 customers during my 6 years of CA has an Ingres consultant, all around France and Africa (Togo, Cameroon, Morrocco, ...)

©2012 About.com, a part of The New York Times Company. All rights reserved.