Data and Technology.

Recently I needed to load a single table with a few hundred million rows from a transaction system into Greenplum/PostgreSQL from MySQL. The MySQL schema didn't have many tables, but this one table was large, around 5G in size including data and index.

I ended up testing the two techniques below.

Technique 1: Using mysqldump and PostgreSQL inserts.

In the beginning I thought it would be pretty straightforward: with mysqldump I should be able to use the postgres load utility,

psql -h HOST -U USER -f FILENAME

Another minor quirk was that the transaction system was using Clustrix, a vendor-specific version of MySQL. Its dump creates a file that is not fully compatible with a direct load into PostgreSQL; dumping even with the --compatible=postgresql option didn't help much.

One of the major issues while loading a huge file with the psql utility is the "Out of memory" error, even with a reasonably small file, say 1G:

ERROR:  out of memory
DETAIL:  Cannot enlarge string buffer containing 0 bytes by 1...

As a first step I removed all MySQL comments and anything other than the data in the INSERT INTO statements. Example lines removed are below.

-- MySQL dump 10.1...  Distrib 5.1.42, for krobix-linux-gnu (x...
-- Host: localhost    Database: supply_production
-- Server version    5...
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;

I retained only the lines between the INSERT INTO line and the ENABLE KEYS line, and used a script to perform the filtering (a sketch of such a filter is shown further below). This gave me all the data I needed in only a few hundred lines, with each line as long as 1MB! These are long lines with thousands and thousands of records. At certain intervals Clustrix inserted a new "INSERT INTO .." statement; I removed these extra INSERT commands and split the records onto separate lines with a simple Perl one-liner:

perl -pi -e 's#\)\,\(#\)\,\n\(#g' FILENAME

With the continued "Out of memory" error you are somewhat misled into believing that Greenplum is slurping all the data into memory before loading, which in the first place shouldn't be the case: with an INSERT INTO .. VALUES ( .. ) statement there is no need to do so.

The next option was to find the actual error by splitting the file into smaller files, adding an INSERT INTO statement at the beginning of each file and removing the trailing "," at the end of its last line. After trying 10 million, 1 million and 0.5 million lines per file, Greenplum started throwing appropriate errors, like a non-existent table (this was because the search path was not set for PostgreSQL), a missing ",", etc.

Split command used:

split --lines=500000 FILENAME

I then added "INSERT INTO .." to each of these files, and instead of seeking to the end of each file and removing the extra ",", I appended a dummy row which I could delete later from the uploaded table:

for fn in `ls x*`
do
  echo "Working on $fn"
  echo "INSERT INTO schema.TABLENAME VALUES " > "${fn}_r_l"
  cat "$fn" >> "${fn}_r_l"
  echo "( DUMMY_ROW_VALUES )" >> "${fn}_r_l"
done

This created, for each split file, a corresponding file with the "_r_l" suffix (ready_to_load).
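For reference, here is a minimal sketch of the kind of filtering script mentioned above. The awk approach and the file names (FILENAME, data_only.sql) are my own illustration under the stated assumptions, not necessarily what was actually used; the point is simply to strip the dump down to the raw INSERT data before any further processing.

# Keep only the data: print from the "INSERT INTO" line up to (but not
# including) the "ENABLE KEYS" line; all other dump lines are dropped.
awk '/^INSERT INTO/ {keep=1} /ENABLE KEYS/ {keep=0} keep' FILENAME > data_only.sql

The same effect could be achieved with sed or Perl.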
Then I loaded the table:

for fn in `ls x*_r_l`
do
  echo "Loading $fn"
  psql -h HOST -U USER -d DATABASE -f "$fn"
done

Systems and utilities used:
- Greenplum DB - Greenplum Database 4.x
- PostgreSQL - PostgreSQL 8.2.14
- MySQL - 5.0.45-clustrix-v...
- Perl - 5.8.8 multithreaded
- All running on Linux x86, ...G memory

More than 4G of data was loaded in less than three hours. Still substantial, but it was a one-time load and that was acceptable.

Technique 2: Using mysqldump and Greenplum gpload.

Greenplum's bulk loading utility (gpload) is an excellent tool for loading large data sets. After dumping the data, cleaning it and formatting it into a few files of 1G each, you can use gpload as below. In the control file below, replace all placeholders with the respective values. With a dynamically created control file (and no hard-coded values) the technique can be used for daily bulk loads as well.

VERSION: 1.0.0.1
DATABASE:
USER:
HOST:
PORT:
GPLOAD:
   INPUT:
    - SOURCE:
         LOCAL_HOSTNAME:
           -
         PORT:
         FILE:
           -
    - FORMAT: text
    - DELIMITER: '|'
    - NULL_AS: 'NULL'
    - ERROR_LIMIT: 2
    - ERROR_TABLE: sandbox_log...
    - COLUMNS:
        - timestamp: text
        - priority: text
        ...
   PRELOAD:
    - TRUNCATE: false
   OUTPUT:
    - TABLE:
    - MODE: INSERT

This is a much faster and more efficient load than technique 1.
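As a usage illustration, once the control file above is filled in and saved, it is passed to gpload with -f; the file names (load_table.yml, the log file, the template and the __DATA_FILE__ token) are hypothetical, chosen here only to show the shape of a daily run as described above.

# One-off load using the YAML control file; -l writes a session log.
gpload -f load_table.yml -l load_table.log

# Sketch of a daily run with a dynamically generated control file:
# substitute the date-stamped data file into a template, then load.
sed "s#__DATA_FILE__#/data/dump_$(date +%F).txt#" load_table.yml.tpl > load_table.yml
gpload -f load_table.yml -l load_table_$(date +%F).log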