Monday, May 3, 2010

Using GAE Python to bulk load CSV data into the Java datastore

The official documentation is here: http://code.google.com/appengine/docs/python/tools/uploadingdata.html. This post covers more of the details for Java applications.

1. On any Windows machine, download and install Python 2.5.X, preferably 2.5.4, since it is the last stable version with a Windows installer. Avoid 2.6.X and 3.X.X because GAE doesn't officially support them.

2. Download the Google App Engine SDK for Python. The current version is 1.3.3. On Windows you may also use the GAE Launcher that comes with the installer.

3. Create a new project named uploaddata (or whatever you like) and add an app.yaml file:

application: XXX
version: 1
runtime: python
api_version: 1

handlers:
- url: /remote_api
  script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
  login: admin

Put the code above in the app.yaml file. Use the correct application name and version, and do not change the handler's script path.

4. Write a Python class that maps the datastore table (kind) to a model class. An example:

(Student.py)
from google.appengine.ext import db

class Student(db.Model):
    studentId = db.StringProperty()
    name = db.StringProperty()
    address = db.StringProperty()
    ......

5. Create a data loader file used by the handler. Here is another example:

(loader.py)
import datetime
from google.appengine.ext import db
from google.appengine.tools import bulkloader
import Student

class StudentLoader(bulkloader.Loader):
    def __init__(self):
        bulkloader.Loader.__init__(self, 'Student',
                                   [('studentId', str),
                                    ('name', str),
                                    ('address', str)])

loaders = [StudentLoader]

Pay attention to columns that may contain accented French characters or Asian-language text. For those columns, use a converter that produces proper unicode instead of str.
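As a rough sketch of the unicode conversion, a column's converter can decode the raw bytes instead of passing them through str. This assumes data.csv is saved as UTF-8 and that each field reaches the converter as a byte string; the helper name utf8_to_unicode and the STUDENT_COLUMNS list are made up for illustration and are not part of the SDK.

```python
# Hypothetical helper: decode a UTF-8 byte string read from the CSV file.
# Assumes the file is UTF-8; use a different codec name if it is not.
def utf8_to_unicode(value):
    return value.decode('utf-8')


# Same (property, converter) layout as the list in loader.py, with the two
# free-text columns decoded to unicode instead of passed through str.
STUDENT_COLUMNS = [
    ('studentId', str),
    ('name', utf8_to_unicode),
    ('address', utf8_to_unicode),
]
```

A list like this would take the place of the one passed to bulkloader.Loader.__init__ in loader.py.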

6. From the command line, run the following command to upload the data. For the example above, the command looks like:

appcfg.py upload_data --config_file=loader.py --filename=data.csv --kind=Student uploaddata
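Before uploading, it may be worth checking that each row of data.csv has its fields in the same order as the property list in loader.py. The following is only a local sanity-check sketch and does not use the bulkloader itself; the function name rows_to_dicts and the sample rows are invented for illustration.

```python
import csv

# Property names in the same order as the list in loader.py.
COLUMNS = ['studentId', 'name', 'address']


def rows_to_dicts(lines):
    """Pair each CSV field with the property at the same position."""
    result = []
    for fields in csv.reader(lines):
        if len(fields) != len(COLUMNS):
            raise ValueError('expected %d fields, got %d'
                             % (len(COLUMNS), len(fields)))
        result.append(dict(zip(COLUMNS, fields)))
    return result


# Hypothetical sample rows; a field containing a comma must be quoted.
sample = ['S001,Alice,"12 Main St, Apt 4"', 'S002,Bob,34 Oak Ave']
preview = rows_to_dicts(sample)
```

A row with a missing or extra field raises ValueError here, which is easier to diagnose than a failure partway through an upload.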
