Bulk load your databases with… DataCleaner.

July 17th, 2014

In many software development projects there is a need to benchmark applications. Recently, as part of my master thesis project, I have been comparing the performance of distributed processing frameworks and storage engines. To do that, I needed to load databases with dummy data. The problem was that my source data was stored in a CSV file with millions of records…

Inserting such amounts of data in the traditional way (an SQL script with INSERT statements) takes a lot of time, and you need to remember several things to make it efficient (disabling autocommit, using COPY or LOAD DATA commands instead of INSERTs, etc.). Every database has its own recommended approach, not to mention that such a script first has to be generated from the CSV file somehow (which involves programming).
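
To illustrate the "programming involved" part, here is a minimal Python sketch (not taken from DataCleaner or any other tool) that turns CSV text into multi-row INSERT statements wrapped in one explicit transaction, which is one way of avoiding a commit per row. The function names and the default batch size of 1000 are arbitrary choices for illustration; the header names are assumed to be plain SQL identifiers, and every value is emitted as a quoted string literal. Database-specific tricks such as COPY or LOAD DATA are left aside.

```python
import csv
import io


def sql_literal(value: str) -> str:
    """Render a CSV field as an SQL string literal, doubling any single quotes."""
    return "'" + value.replace("'", "''") + "'"


def csv_to_insert_script(csv_text: str, table: str, batch_size: int = 1000) -> str:
    """Turn CSV text (first row = header) into multi-row INSERTs inside one transaction.

    Assumes the header names are plain SQL identifiers. Every value is emitted as a
    quoted literal; PostgreSQL coerces such literals to the column type (e.g. '1' to bigint).
    """
    reader = csv.reader(io.StringIO(csv_text))  # handles quoted fields containing commas
    columns = ", ".join(next(reader))
    lines = ["BEGIN;"]  # one explicit transaction instead of autocommitting every row
    batch = []

    def flush() -> None:
        # Emit the buffered rows as a single multi-row INSERT statement.
        if batch:
            values = ",\n  ".join(batch)
            lines.append(f"INSERT INTO {table} ({columns}) VALUES\n  {values};")
            batch.clear()

    for row in reader:
        if not row:
            continue  # skip blank lines
        batch.append("(" + ", ".join(sql_literal(v) for v in row) + ")")
        if len(batch) >= batch_size:
            flush()
    flush()
    lines.append("COMMIT;")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = (
        "id,given_name,family_name,full_address,email,phone\n"
        '1,Tomasz,Guzialek,"Fakevej 1, København, Denmark",tomasz@guzialek.info,012345678\n'
    )
    print(csv_to_insert_script(sample, "output"))
```

Even this small script has to care about quoting, batching and transactions, and it still has to be adapted for each database.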

A nice and generic solution to this problem is… DataCleaner. It was not obvious to me at first glance, as the name suggests: DataCleaner is a data quality and profiling tool, while we just want to load our database quickly and easily… However, because it uses Apache MetaModel as its data access framework, DataCleaner can read and write many kinds of data sources: all sorts of relational and NoSQL databases like MySQL, PostgreSQL or MongoDB, CSV files, XML files, Excel spreadsheets etc. MetaModel includes bulk insert optimizations for several databases, so the import process should be as fast as possible.

Let’s take my case as an example. We have a CSV file “customer.csv” with dummy customer data:

id,given_name,family_name,full_address,email,phone
1,Tomasz,Guzialek,"Fakevej 1, København, Denmark",tomasz@guzialek.info,012345678
2,Thomas,Guzialek,"Fake Street 1, Poznan, Poland",tomaszguzialek@apache.org,+48987654321

and we want to insert its content into a PostgreSQL table “output” with the following schema (the table already exists):

CREATE TABLE output
(
 id bigint,
 given_name character varying(255),
 family_name character varying(255),
 full_address character varying(255),
 email character varying(255),
 phone character varying(255),
 CONSTRAINT customers_pkey PRIMARY KEY (id)
)

Now it is time to register our data sources in DataCleaner. First, add the customer.csv file:


As you can see, the “schema” of the CSV file was detected automatically. Next, use the icon on the right side of the panel to register the PostgreSQL database in the same way. After these operations, both data sources should be visible in the list in the middle of the application window.

Make sure our customer file is selected and click Analyze. In the left pane, right-click the columns and add them to the source.

This is the place where you would normally design your data cleaning job. Don’t get scared, though: we just want to save the input to a different data source, so our job stays empty. We only need to define where to save it. From the “Write data” menu (circled on the previous screenshot), choose Insert Into Table and specify the details of the “output” table. The mapping from CSV file columns to database table columns was again detected automatically, as the names match.

Just click “Execute” and the content will be written to the database, taking into account all the available optimizations.
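
How DataCleaner matches columns internally is not described here, so the following is only a minimal Python sketch of the general idea of mapping columns by name, assuming a case-insensitive exact match is enough. The function name and the hard-coded list of table columns are hypothetical; a real tool would read the target columns from the database metadata.

```python
import csv
import io


def map_columns_by_name(source_columns, target_columns):
    """Pair each target column with the source column of the same name.

    Matching is case-insensitive and ignores surrounding whitespace.
    Returns (mapping, unmapped): mapping is {target: source}, unmapped lists
    the target columns that found no source column with a matching name.
    """
    by_name = {name.strip().lower(): name for name in source_columns}
    mapping, unmapped = {}, []
    for target in target_columns:
        source = by_name.get(target.strip().lower())
        if source is None:
            unmapped.append(target)  # would have to be mapped by hand
        else:
            mapping[target] = source
    return mapping, unmapped


if __name__ == "__main__":
    csv_text = "id,given_name,family_name,full_address,email,phone\n"
    header = next(csv.reader(io.StringIO(csv_text)))  # the CSV "schema" is its header row
    # Hypothetical: in practice these names would come from the database metadata.
    table_columns = ["id", "given_name", "family_name", "full_address", "email", "phone"]
    mapping, unmapped = map_columns_by_name(header, table_columns)
    print(mapping)
    print(unmapped)  # [] because every name matches
```

Any column left in `unmapped` is where a user would have to step in and choose the source column manually.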

Resources for learning Danish.

August 23rd, 2013

The increasing number of duties (half-time work and an internship, together with my volleyball team practices) forced me to quit the Danish course I had been taking at the university. I regret that I cannot continue, so I did a little research on tools that can support my self-directed learning. Of course, these tools are also very useful while attending an organized course.

DIKU canteen – the Danish way…

April 10th, 2013

The first thing that amazed me after I started my studies at the University of Copenhagen was… my department’s canteen. The way this place is organized day to day differs significantly from other canteens…

Leveling the distance – remote desktop with TeamViewer.

February 3rd, 2013

Being more advanced than the average user often means helping family and friends with various computer tasks. It is not that bad when you are in the same room: it takes just a minute and the job is done. If not…

How to get free equipment in Denmark?

January 14th, 2013

Most of you come to Denmark by plane, so your luggage is limited and you take only the most necessary things. Tableware and drying racks, not to mention furniture, are definitely out of the question. Fortunately, Denmark is a country where such equipment can be obtained easily, at no cost at all!

Danish bank account – Danske Bank vs Nordea.

January 1st, 2013

When staying in Denmark for a longer period, a Danish bank account is a necessity. There are two major players on the Danish market: Danske Bank and Nordea. Which one to choose as a foreigner is a widely asked question.

Choosing Danish mobile phone provider.

December 24th, 2012

EDIT 2013-10-23:
The Gratis50 product from Fullrate (previously M1) described in this article has not been available since 20th September 2013…

It is a good idea to get a Danish prepaid SIM card on the day of arrival to reduce costs, even if it is only a temporary solution until you find a more suitable one.

It is also hard to recommend one single solution, as consumers’ needs are very diverse. For instance, I am looking for a prepaid card, not a monthly subscription. I don’t talk much, I text more, and I don’t use mobile Internet at all (maybe that will change with time, but I am already too addicted to being online ;)).

Before I present my solution, I would like to recommend a comparison tool for mobile prepaid offers: TaleTidPriser. Competition in this market is fierce, so the current best offer changes regularly; check for yourself right before buying.

Also be aware that some prepaid offers require you to register the card with your CPR number. Getting a SIM card from a 7-Eleven kiosk is probably the best idea for your first phone in Denmark.

One of the offers that requires a CPR number is Gratis50 from the operator M1 (it is currently not listed on TaleTidPriser). It is limited to one starter pack per CPR number, but in return you get 50 minutes of calls and 50 text messages per month completely free. Above this amount, you simply pay the normal rate. The only catches are that calls are billed per minute (not per second) and that optional services which seem standard are paid separately: 9 DKK per month for caller ID, 9 DKK for voicemail, and 19 DKK for calling abroad and roaming. Still, the offer seems worthwhile.

Disclaimer: this article is NOT sponsored by M1 or any other company. I mention it because many foreigners are not aware of this offer.

Guide to transportation system in Capital Region of Denmark.

October 15th, 2012

The first challenge after arriving in Copenhagen is getting to grips with the public transportation network. It is well organized, but very expensive… let me just say that for the price of a monthly card for 5 zones, I could travel within Poznań for 9 months (with a student discount) ;). Let me outline the system for you.

Week of the year in Google Calendar.

October 12th, 2012

“Academic year 2012 starts in week 36 and block 1 lasts until week 44, because in week 45 the examination period starts. There is also a break in week 42…”. This is how the university described the organization of the academic year. Strange? Yes. In Poland we do not count weeks of the year but specify date ranges, e.g. 3 Aug 2012 – 5 Nov 2012. It is so exotic to me that I still have not become accustomed to it…

I needed a little help not to miss any deadlines and to organize my timetable efficiently. As a result, my Google Calendar now shows the week number every Monday. How do you turn it on?

On the left side of the interface, click the arrow next to Other Calendars, then choose Browse Interesting Calendars. Switch to the More tab and subscribe to the “Week numbers” calendar. From now on, your GCal should help you cope with the Danish way of counting weeks.
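
For anyone who needs to translate such week numbers into dates outside the calendar, here is a small Python sketch. It assumes that the university's week numbers follow the ISO 8601 convention (weeks start on Monday, and week 1 is the week containing the year's first Thursday), which I take to be the convention used in Denmark; the function names are made up for illustration.

```python
from datetime import date, timedelta
from typing import Tuple


def week_number(day: date) -> int:
    """ISO 8601 week number of a date (weeks start on Monday)."""
    return day.isocalendar()[1]


def week_span(year: int, week: int) -> Tuple[date, date]:
    """Monday and Sunday of the given ISO week (date.fromisocalendar needs Python 3.8+)."""
    monday = date.fromisocalendar(year, week, 1)
    return monday, monday + timedelta(days=6)


if __name__ == "__main__":
    print(week_number(date(2012, 9, 3)))  # 36
    start, _ = week_span(2012, 36)
    _, end = week_span(2012, 44)
    print(f"Block 1: {start} to {end}")  # Block 1: 2012-09-03 to 2012-11-04
```

Under this assumption, block 1 of the 2012 academic year (weeks 36 to 44) runs from Monday 3 September to Sunday 4 November 2012, and the examination period in week 45 starts on Monday 5 November.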

MyDenmarkTV – visiting the country without leaving home.

October 2nd, 2012

I have just started my stay in Copenhagen, so I am still absorbing the new culture, and I have been publishing the results of this adventure on the pages of this web log. Now I would like to share a resource that I used extensively before my arrival and found very useful.

MyDenmarkTV is a weekly show founded by expats who share the experience they gained during several years in Denmark. It is designed to provide practical information about everyday life and to familiarize newcomers with Danish customs.

At present the show is not active, but the archive of 100 episodes is plenty of valuable material to explore. Whether you are considering immigration or have already made the decision, MyDenmarkTV is a must-see.

  • Software Engineering student from Poland, currently living in Copenhagen. Also a semi-professional volleyball player.

  • View Tomasz Guziałek's profile on LinkedIn