AllExperts > Oracle 
Search      
Oracle
Volunteer
Answers to thousands of questions
 Home · More Oracle Questions · Answer Library  · Encyclopedia ·
More Oracle Answers
Question Library

Ask a question about Oracle
Volunteer
Experts of the Month
Expert Login

Awards

About Us
Tell friends
Link to Us
Disclaimer

 
 
 
 
About Edson
Expertise
I can answer questions about Sql (including Sql tuning), pl/Sql and a bit of oracle administration

Experience
I work as a consultant in Brazil and I face problems of all kind in oracle every day development.

Education/Credentials
INFNET

 
   

You are here:  Experts > Computing/Technology > Oracle > Oracle > Csv data into oracle

Oracle - Csv data into oracle


Expert: Edson - 10/22/2009

Question
I want to load data from csv file into oracle database

can u tell me step by step process for this

Answer
SQL Loader
SQL*Loader (sqlldr ) is the utility to use for high performance data loads.  The data can be loaded from any text file and inserted into the database.
During processing, SQL*Loader writes messages to the log file, bad rows to the bad file, and discarded rows to the discard file.
The Control File
The SQL*Loader control file contains information that describes how the data will be loaded.  It contains the table name, column datatypes, field delimiters, etc.  It simply provides the guts for all SQL*Loader processing.  

Manually creating control files is an error-prone process.  The following SQL script (controlfile.sql) can be used to generate an accurate control file for a given table.  The script accepts a table name and a date format (to be used for date columns), and generates a valid control file to use with SQL*Loader for that table.    

 controlfile.sql

set echo off ver off feed off pages 0
accept tname prompt 'Enter Name of Table: '
accept dformat prompt 'Enter Format to Use for Date Columns: '

spool &tname..ctl

select 'LOAD DATA'|| chr (10) ||
      'INFILE ''' || lower (table_name) || '.dat''' || chr (10) ||
      'INTO TABLE '|| table_name || chr (10)||
      'FIELDS TERMINATED BY '','''||chr (10)||
      'TRAILING NULLCOLS' || chr (10) || '('
from   user_tables
where  table_name = upper ('&tname');

select decode (rownum, 1, '   ', ' , ') ||
      rpad (column_name, 33, ' ')      ||
      decode (data_type,
          'VARCHAR2', 'CHAR NULLIF ('||column_name||'=BLANKS)',
          'FLOAT',    'DECIMAL EXTERNAL NULLIF('||column_name||'=BLANKS)',
          'NUMBER',   decode (data_precision, 0,
                      'INTEGER EXTERNAL NULLIF ('||column_name||
                      '=BLANKS)', decode (data_scale, 0,
                      'INTEGER EXTERNAL NULLIF ('||
                      column_name||'=BLANKS)',
                      'DECIMAL EXTERNAL NULLIF ('||
                      column_name||'=BLANKS)')),
          'DATE',     'DATE "&dformat" NULLIF ('||column_name||'=BLANKS)', null)
from   user_tab_columns
where  table_name = upper ('&tname')
order  by column_id;

select ')'  
from dual;
spool off

Once executed and given a table name and date format, controlfile.sql will generate a control file with the following contents:  

 controlfile.par

LOAD DATA
INFILE 'table_with_one_million_rows.dat'
INTO TABLE TABLE_WITH_ONE_MILLION_ROWS
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(
  COL1                     DECIMAL EXTERNAL NULLIF (COL1=BLANKS)
, COL2                     DECIMAL EXTERNAL NULLIF (COL2=BLANKS)
, COL3                     CHAR NULLIF (COL3=BLANKS)
, COL4                     CHAR NULLIF (COL4=BLANKS)
, COL5                     CHAR NULLIF (COL5=BLANKS)
, COL6                     DATE "MM-DD-YYYY" NULLIF (COL6=BLANKS)
)

The control file can also specify that records are in fixed format.  A file is in fixed record format when all records in a datafile are the same length.  The control file specifies the specific starting and ending byte location of each field.  This format is harder to create and less flexible but can yield performance benefits.  A control file specifying a fixed format for the same table could look like the following:

LOAD DATA
INFILE 'table_with_one_million_rows.dat'
INTO TABLE TABLE_WITH_ONE_MILLION_ROWS
(
  COL1  POSITION(1:4)     INTEGER EXTERNAL
, COL2  POSITION(6:9)     INTEGER EXTERNAL
, COL3  POSITION(11:46)   CHAR
, col4  position(48:83)   CHAR
, col5  position(85:120)  CHAR
, COL6  POSITION(122:130) DATE "MMDDYYYY"
)
The Log File
The log file contains information about the SQL*loader execution.   It should be viewed after each SQL*Loader job is complete.  Especially interesting is the summary information at the bottom of the log, including CPU time and elapsed time.  The data below is a sample of the contents of the log file.




*****************************************************************************************
SQL*Loader: Release 9.2.0.1.0 - Production on Mon Mar 10 23:39:04 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

*****************************************************************************************
Control File:   sqlload.par
Data File:      sqlload.txt
 Bad File:     sqlload.bad
 Discard File:  none specified

(Allow all discards)

Number to load: ALL
Number to skip: 0
Errors allowed: 50
Continuation:    none specified
Path used:      Direct

Table TABLE_WITH_ONE_MILLION_ROWS, loaded from every logical record.
Insert option in effect for this table: INSERT
TRAILING NULLCOLS option in effect

Table TABLE_WITH_ONE_MILLION_ROWS, loaded from every logical record.
Insert option in effect for this table: INSERT
TRAILING NULLCOLS option in effect

  Column Name                  Position   Len  Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------
COL1                                FIRST     *   ,       CHARACTER
   NULL if COL1 = BLANKS
COL2                                 NEXT     *   ,       CHARACTER
   NULL if COL2 = BLANKS
COL3                                 NEXT     *   ,       CHARACTER
   NULL if COL3 = BLANKS
COL4                                 NEXT     *   ,       CHARACTER
   NULL if COL4 = BLANKS
COL5                                 NEXT     *   ,       CHARACTER
   NULL if COL5 = BLANKS
COL6                                 NEXT     *   ,       DATE MMDDYYYY
   NULL if COL6 = BLANKS


Table TABLE_WITH_ONE_MILLION_ROWS:
 1000000 Rows successfully loaded.
 0 Rows not loaded due to data errors.
 0 Rows not loaded because all WHEN clauses were failed.
 0 Rows not loaded because all fields were null.

 Date cache:
  Max Size:      1000
  Entries :         1
  Hits    :    999999
  Misses  :         0

Bind array size not used in direct path.
Column array  rows :    5000
Stream buffer bytes:  512000
Read   buffer bytes: 1048576

Total logical records skipped:          0
Total logical records read:       1000000
Total logical records rejected:         0
Total logical records discarded:        0
Total stream buffers loaded by SQL*Loader main thread:      255
Total stream buffers loaded by SQL*Loader load thread:      128

Run began on Mon Mar 10 23:39:04 2003
Run ended on Mon Mar 10 23:39:51 2003

Elapsed time was:     00:00:47.55
CPU time was:         00:00:12.85
SQL*Loader Options
SQL*Loader provides the following options, which can be specified either on the command line or within a parameter file:   

•     bad – A file that is created when at least one record from the input file is rejected.  The rejected data records are placed in this file.  A record could be rejected for many reasons, including a non-unique key or a required column being null.  
•     bindsize –  [256000] The size of the bind array in bytes.  
•     columnarrayrows – [5000] Specifies the number of rows to allocate for direct path column arrays.
•     control – The name of the control file.  This file specifies the format of the data to be loaded.  
•     data – The name of the file that contains the data to load.  
•     direct – [FALSE] Specifies whether or not to use a direct path load or conventional.   
•     discard – The name of the file that contains the discarded rows.  Discarded rows are those that fail the WHEN clause condition when selectively loading records.
•     discardmax – [ALL] The maximum number of discards to allow.
•     errors – [50] The number of errors to allow on the load.  
•     external_table – [NOT_USED] Determines whether or not any data will be loaded using external tables.  The other valid options include GENERATE_ONLY and EXECUTE.     
•     file – Used only with parallel loads, this parameter specifies the file to allocate extents from.
•     load – [ALL] The number of logical records to load.   
•     log – The name of the file used by SQL*Loader to log results.  
•     multithreading – The default is TRUE on multiple CPU systems and FALSE on single CPU systems.  
•     parfile – [Y] The name of the file that contains the parameter options for SQL*Loader.  
•     parallel – [FALSE] Specifies a filename that contains index creation statements.
•     readsize – The size of the buffer used by SQL*Loader when reading data from the input file.  This value should match that of bindsize.  
•     resumable – [N] Enables and disables resumable space allocation.  When “Y”, the parameters resumable_name and resumable_timeout are utilized.  
•     resumable_name – User defined string that helps identify a resumable statement that has been suspended.  This parameter is ignored unless resumable = Y.
•     resumable_timeout – [7200 seconds] The time period in which an error must be fixed.  This parameter is ignored unless resumable = Y.
•     rows – [64] The number of rows to load before a commit is issued (conventional path only).  For direct path loads, rows are the number of rows to read from the data file before saving the data in the datafiles.  
•     silent – Suppress errors during data load.  A value of ALL will suppress all load messages.  Other options include DISCARDS, ERRORS, FEEDBACK, HEADER, and PARTITIONS.
•     skip – [0] Allows the skipping of the specified number of logical records.  
•     skip_unusable_indexes – [FALSE] Determines whether SQL*Loader skips the building of indexes that are in an unusable state.
•     skip_index_maintenance – [FALSE] Stops index maintenance for direct path loads only.  
•     streamsize – [256000] Specifies the size of direct path streams in bytes.   
•     userid – The Oracle username and password.
To check which options are available in any release of SQL*Loader use this command:

sqlldr help=y
Maximizing SQL*Loader Performance  
SQL*Loader is flexible and offers many options that should be considered to maximize the speed of data loads.  These include:

1.   Use Direct Path Loads - The conventional path loader essentially loads the data by using standard insert statements.  The direct path loader (direct=true) loads directly into the Oracle data files and creates blocks in Oracle database block format.  The fact that SQL is not being issued makes the entire process much less taxing on the database.  There are certain cases, however, in which direct path loads cannot be used (clustered tables).  To prepare the database for direct path loads, the script $ORACLE_HOME/rdbms/admin/catldr.sql.sql must be executed.
2.   Disable Indexes and Constraints.  For conventional data loads only, the disabling of indexes and constraints can greatly enhance the performance of SQL*Loader.   
3.   Use a Larger Bind Array.  For conventional data loads only, larger bind arrays limit the number of calls to the database and increase performance.  The size of the bind array is specified using the bindsize parameter.  The bind array's size is equivalent to the number of rows it contains (rows=) times the maximum length of each row.
4.   Use ROWS=n to Commit Less Frequently.  For conventional data loads only, the rows parameter specifies the number of rows per commit.  Issuing fewer commits will enhance performance.  
5.   Use Parallel Loads.  Available with direct path data loads only, this option allows multiple SQL*Loader jobs to execute concurrently.
$ sqlldr control=first.ctl  parallel=true direct=true
$ sqlldr control=second.ctl parallel=true direct=true

6.   Use Fixed Width Data.  Fixed width data format saves Oracle some processing when parsing the data.  The savings can be tremendous, depending on the type of data and number of rows.  
7.   Disable Archiving During Load.  While this may not be feasible in certain environments, disabling database archiving can increase performance considerably.
8.   Use unrecoverable.  The unrecoverable option (unrecoverable load data) disables the writing of the data to the redo logs.  This option is available for direct path loads only.
Using the table table_with_one_million_rows, the following benchmark tests were performed with the various SQL*Loader options. The table was truncated after each test.  

SQL*Loader Option    Elapsed Time (Seconds)   Time Reduction
direct=false
rows=64   135   -
direct=false
bindsize=512000
rows=10000   92   32%
direct=false
bindsize=512000
rows=10000
database in noarchivelog mode   85   37%
direct=true   47   65%
direct=true
unrecoverable   41    70%
direct=true
unrecoverable
fixed width data   41   70%

Table 4.3 – Results indicate conventional path loads take longest.

The results above indicate that conventional path loads take the longest.  However, the bindsize and rows parameters can aid the performance under these loads.  The test involving the conventional load didn’t come close to the performance of the direct path load with the unrecoverable option specified.  

It is also worth noting that the fastest import time achieved for this table (earlier) was 67 seconds, compared to 41 for SQL*Loader direct path – a 39% reduction in execution time.  This proves that SQL*Loader can load the same data faster than import.

These tests did not compensate for indexes.  All database load operations will execute faster when indexes are disabled.  

For example:

sqlldr pan_leitura@pand control='D:\LOADER\control_PSMOEDA.ctl' data='d:\PSMOEDA00_20090805.CSV' ROWS=30000 direct = true
sqlldr pan_leitura@pand control='D:\LOADER\control_CCMOEDA.ctl' data='d:\CCMOEDA00_20090805.CSV' ROWS=30000 direct = true
sqlldr pan_leitura@pand control='D:\LOADER\control_ORMOEDA.ctl' data='d:\ORMOEDA00_20090805.CSV' ROWS=30000 direct = true

Add to this Answer   Ask a Question


 
User Agreement | Privacy Policy | Kids' Privacy Policy | Help
Copyright  © 2008 About, Inc. AllExperts, AllExperts.com, and About.com are registered trademarks of About, Inc. All rights reserved.