Bulk load your databases with… DataCleaner.
Software development projects often call for benchmarking an application. Recently, as part of my master's thesis project, I have been comparing the performance of distributed processing frameworks and storage engines. To do that I needed to load databases with dummy data. The problem was that my source data was stored in a CSV file and had millions of records…
Inserting that much data the traditional way (an SQL script with INSERT statements) takes a lot of time, and you need to remember several things to make it efficient (disabling autocommit, using COPY or LOAD DATA commands instead of INSERTs, etc.). Every database has its own recommended way, not to mention that such a script first needs to be generated from the CSV file somehow (programming involved).
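To give an idea of the "programming involved", here is a minimal sketch of the manual route: rows are read from CSV and inserted in batches inside a single transaction instead of committing after every INSERT. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere; the table `people`, the sample data and the helper `bulk_insert` are made up for illustration, and a real PostgreSQL or MySQL load would rather go through COPY or LOAD DATA.

```python
import csv
import io
import sqlite3


def bulk_insert(conn, table, columns, rows, batch_size=1000):
    """Insert rows in batches inside one transaction; return the row count."""
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    total = 0
    batch = []
    with conn:  # one transaction: commit on success, roll back on error
        for row in rows:
            batch.append(row)
            if len(batch) >= batch_size:
                conn.executemany(sql, batch)
                total += len(batch)
                batch = []
        if batch:  # flush the last, possibly smaller, batch
            conn.executemany(sql, batch)
            total += len(batch)
    return total


# Hypothetical sample data standing in for a multi-million-row file.
SAMPLE = "id,name\n1,Alice\n2,Bob\n3,Carol\n"


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id TEXT PRIMARY KEY, name TEXT)")
    reader = csv.reader(io.StringIO(SAMPLE))
    columns = next(reader)  # the first CSV line holds the column names
    inserted = bulk_insert(conn, "people", columns, reader, batch_size=2)
    stored = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
    return inserted, stored
```

Even this small version has to deal with batching, transactions and the header line, and none of it is tuned for a particular database.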
A nice, generic solution to this problem is… DataCleaner. It was not obvious to me at first glance because, as the name indicates, DataCleaner is a data quality and profiling tool, and we just want to load our database quickly and easily… Because it uses Apache MetaModel as its data access framework, DataCleaner can read and write many data sources: all sorts of relational and NoSQL databases such as MySQL, PostgreSQL or MongoDB, CSV files, XML files, Excel spreadsheets, etc. Bulk insert optimizations for several databases are included in MetaModel, so the import process should be as fast as possible.
Let’s take my case as an example. We have a CSV file “customers.csv” with dummy customer data:
id,given_name,family_name,full_address,email,phone
1,Tomasz,Guzialek,"Fakevej 1, København, Denmark",tomasz@guzialek.info,012345678
2,Thomas,Guzialek,"Fake Street 1, Poznan, Poland",tomaszguzialek@apache.org,+48987654321
and we want to insert its content into a PostgreSQL table “output” with the following schema (the table already exists):
CREATE TABLE output (
    id bigint,
    given_name character varying(255),
    family_name character varying(255),
    full_address character varying(255),
    email character varying(255),
    phone character varying(255),
    CONSTRAINT customers_pkey PRIMARY KEY (id)
)
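For comparison, the hand-written route for this particular file could start roughly like the sketch below: it parses the sample with Python's csv module and builds a PostgreSQL COPY statement from the header line. The helper names (`read_header_and_rows`, `quote_ident`, `copy_statement`) are made up for illustration, and the statement is only built as a string here; actually sending it would require a database driver, which is left out.

```python
import csv
import io

# The two sample records from the customers file, header line included.
SAMPLE_CSV = '''id,given_name,family_name,full_address,email,phone
1,Tomasz,Guzialek,"Fakevej 1, København, Denmark",tomasz@guzialek.info,012345678
2,Thomas,Guzialek,"Fake Street 1, Poznan, Poland",tomaszguzialek@apache.org,+48987654321
'''


def read_header_and_rows(text):
    """Split CSV text into the header (column names) and the data rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return header, list(reader)


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def copy_statement(table, columns):
    """Build a PostgreSQL COPY ... FROM STDIN statement for CSV input."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return (
        f"COPY {quote_ident(table)} ({cols}) "
        "FROM STDIN WITH (FORMAT csv, HEADER true)"
    )


header, rows = read_header_and_rows(SAMPLE_CSV)
statement = copy_statement("output", header)
```

Note that the csv module keeps the quoted address as one field despite the commas in it, and that every value arrives as a string, so the leading zero of the first phone number survives.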
Now it is time to register our data sources in DataCleaner. First, add the customers.csv file:
As you can see, the “schema” of the CSV file was detected automatically. Next, use the icon on the right side of the panel to register the PostgreSQL database in the same way. After these operations, both data sources should be visible in the list in the middle of the application window.
Make sure our customers file is selected and click Analyze. In the left pane, choose the columns by right-clicking them and “adding to source”.
This is where you would normally design your data cleaning analysis job. Don’t be scared, though. We just want to save the input to a different data source, so our job is empty. Let’s just define where to save it. From the “Write data” menu (circled on the previous screenshot), choose Insert Into Table and specify the details of the “output” table. The mapping from CSV file columns to DB table columns was again detected automatically because the names match.
Just click “Execute” and the content is written to the database, taking into account all the available optimizations.
Resources for learning Danish.
A growing number of duties (part-time work and an internship, together with my volleyball team practices) forced me to quit the Danish course that I had been taking at the university. I regret that I cannot continue, so I did a little research into tools that can support self-study. Of course, these tools are also very useful while attending an organized course.
DIKU canteen – the Danish way…
The first thing that amazed me after I started my studies at the University of Copenhagen was… the canteen of my department. The daily organization of this place differs significantly from the others…
Leveling the distance – remote desktop with TeamViewer.
Being a more advanced computer user than average often means helping your family and friends with computer tasks. It is not that bad when you are in the same room: a minute and the goal is achieved. If not…
How to get free equipment in Denmark?
Most of you come to Denmark by plane, so luggage volume is limited and you take only the most necessary things. Tableware and drying racks, not to mention furniture, are definitely out of the question. Fortunately, Denmark is a country where such equipment can be obtained easily, at no cost at all!
Danish bank account – Danske Bank vs Nordea.
When staying in Denmark for a longer period, a Danish bank account is a necessity. There are two major players in the Danish market: Danske Bank and Nordea. Foreigners often ask which one to choose.
Choosing Danish mobile phone provider.
EDIT 2013-10-23:
The Gratis50 product from Fullrate (previously M1) described in this article has not been available since 20 September 2013…
It is a good idea to get a Danish prepaid SIM card on the day of arrival to reduce costs, even if it is only a temporary solution until you find a more suitable one.
It is also hard to recommend a single solution, because consumers’ needs are very diverse. For instance, I am looking for a prepaid plan, not a monthly subscription. I don’t talk much, I text more, and I don’t use mobile Internet at all (maybe that will change with time, but I am already too addicted to being online ;)).
Before I present my solution, I would like to recommend a comparison tool for mobile prepaid plans: TaleTidPriser. Competition in this market is very fierce, so the best current offer changes regularly; check for yourself right before purchase.
Also be aware that some prepaid offers require you to register with your CPR number. Getting a SIM card from a 7-Eleven kiosk is probably the best idea for a first phone in Denmark.
One of the offers that requires a CPR number is Gratis50 from the operator M1 (it is not currently listed in TaleTidPriser). It is limited to one starter pack per CPR number, but in return you get 50 minutes of talk and 50 text messages per month completely free. Above this amount, you just pay the normal rate. The only catches are that calls are billed per minute (not per second) and that optional services that seem standard are paid separately: 9 DKK per month for CallerID, 9 DKK for an answering machine and 19 DKK for calling abroad and roaming. However, the offer still seems to be a good deal.
Disclaimer: this article is NOT sponsored by M1 or any other company. I mention the offer because lots of foreigners are not aware of it.
Guide to transportation system in Capital Region of Denmark.
The first challenge after arriving in Copenhagen is handling the public transportation network. It is well organized, but very expensive… let me just tell you that for the price of a monthly card for 5 zones, I could travel within Poznań for 9 months (with a student discount) ;). Let me outline the system for you.
Week of the year in Google Calendar.
“Academic year 2012 starts in week 36 and block 1 lasts until week 44, because in week 45 the examination period starts. There is also a break in week 42…”. This is how the university described the organization of the academic year. Strange? Yes. In Poland we do not count weeks of the year but specify time spans, e.g. 3 Aug 2012 – 5 Nov 2012. It is so exotic to me that I still have not become accustomed to it…
I needed a little help so as not to miss any deadlines and to organize my timetable efficiently. As a result, my Google Calendar shows the week number every Monday. How to turn it on?
On the left side of the interface, click the arrow next to Other Calendars, then Browse Interesting Calendars. Switch to the More tab and subscribe to the “Week numbers” calendar. From now on, your GCal should help you deal with the Danish calendar.
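If you prefer to translate week numbers into dates yourself, here is a small sketch in Python. It assumes that the week numbers used by the university follow ISO 8601 (weeks start on Monday and week 1 is the week containing the first Thursday of the year), which is the usual convention in Denmark; the helper names `week_number` and `week_span` are made up for illustration, and `date.fromisocalendar` needs Python 3.8 or newer.

```python
from datetime import date, timedelta


def week_number(day):
    """Return the ISO 8601 week number of the given date."""
    return day.isocalendar()[1]


def week_span(year, week):
    """Return the (Monday, Sunday) dates of the given ISO week."""
    monday = date.fromisocalendar(year, week, 1)
    return monday, monday + timedelta(days=6)


# Week 36 of 2012, the start of the academic year mentioned above.
start_of_year = week_span(2012, 36)
```

Under that assumption, week 36 of 2012 runs from Monday 3 September to Sunday 9 September, and week 45 starts on 5 November.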
MyDenmarkTV – visiting the country without leaving home.
I have just started my stay in Copenhagen, so I am still absorbing the new culture. I have been publishing the results of this adventure on the pages of this web log. Now I would like to share a resource that I used extensively before my arrival and found very useful.
MyDenmarkTV is a weekly show founded by expats sharing the experience they gained during a few years spent in Denmark. It is designed to pass on practical information about everyday life and to familiarize newcomers with Danish customs.
At present the show is not active, but the archive of 100 episodes is ample and valuable material to explore. Whether you are considering immigration or have already made the decision, MyDenmarkTV is required viewing.