Monday, March 22, 2010

Warning in JDO lazy fetch with App Engine

If you keep meeting the warning "org.datanucleus.store.appengine.MetaDataValidator warn: Meta-data warning for ****: The datastore does not support joins and therefore cannot honor requests to place related objects in the default fetch group. The field will be fetched lazily on first access. You can modify this warning by setting the datanucleus.appengine.ignorableMetaDataBehavior property in your config. A value of NONE will silence the warning. A value of ERROR will turn the warning into an exception.", please add the following line to your jdoconfig.xml:

<property name="datanucleus.appengine.ignorableMetaDataBehavior" value="NONE" />

The fix itself is quite obvious; what may confuse many people is where to put the property.
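To make the placement concrete, here is a minimal sketch based on the default jdoconfig.xml that the GAE/J SDK generates (your surrounding properties may differ); the new property sits inside the persistence-manager-factory element, next to the existing ones:

<?xml version="1.0" encoding="utf-8"?>
<jdoconfig xmlns="http://java.sun.com/xml/ns/jdo/jdoconfig">
    <persistence-manager-factory name="transactions-optional">
        <property name="javax.jdo.PersistenceManagerFactoryClass"
            value="org.datanucleus.store.appengine.jdo.DatastoreJDOPersistenceManagerFactory"/>
        <property name="javax.jdo.option.ConnectionURL" value="appengine"/>
        <property name="javax.jdo.option.NontransactionalRead" value="true"/>
        <property name="javax.jdo.option.NontransactionalWrite" value="true"/>
        <!-- The warning-silencing property goes here -->
        <property name="datanucleus.appengine.ignorableMetaDataBehavior" value="NONE"/>
    </persistence-manager-factory>
</jdoconfig>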

Wednesday, March 17, 2010

Using memcache in GAE/J

Memcache is actually not a brand-new concept; it has been used in many large-scale projects. The most famous memcache story is probably the whale on Twitter's front page. For the background knowledge, please Google it. In one sentence, memcache is a way to cache frequently used data, internally or externally, so as to reduce the cost of database queries and remote invocations. This is a very classical technique in dealing with all kinds of databases; however, since most of that code is written at a low level, most application developers are out of touch with it. Transplanting this idea to the application level with the help of memcache will definitely be a great boost for large-scale applications.

Google App Engine has supported memcache for a long time, which should count as one of its built-in advantages. Here I'd like to briefly introduce how to use memcache in GAE/J. Sorry for those who are interested in Python.

First, let's construct a scenario. Suppose there is a university system on GAE/J which stores information for around 10k students. Such a system includes all kinds of information for students to use, such as enrollment, a study blackboard, course selection and so on. As a result, there will be a huge demand on database queries. However, that demand always falls into two parts: the students who really like the system and log in every day, and the "lazy" students who seldom care about it. So, to improve the efficiency of the system, caching the frequently used information will be a good help. Here we assume there is a table called Student; no matter what kind of action happens, there is always a need to query the data in the Student table. Let's see how memcache can store the Student information.

First we create a JDO POJO class to store student information. As an example, the fields in the class are pretty simple.

[java]
import java.io.Serializable;
import java.util.UUID;

import javax.jdo.annotations.IdentityType;
import javax.jdo.annotations.Inheritance;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;

@PersistenceCapable(identityType = IdentityType.APPLICATION)
@Inheritance(customStrategy = "complete-table")
// Serializable so instances can be stored in memcache.
public class Student implements Serializable {

    @PrimaryKey
    @Persistent
    private String uuid;

    @Persistent
    private String name;

    @Persistent
    private String email;

    @Persistent
    private String address;

    public Student() {
        this.uuid = UUID.randomUUID().toString();
    }

    // getters and setters omitted for brevity
}
[/java]

Then we need to construct a cache class which will be in charge of the operations at the cache layer.

[java]
import java.util.Collections;
import java.util.logging.Logger;

import javax.cache.Cache;
import javax.cache.CacheException;
import javax.cache.CacheFactory;
import javax.cache.CacheManager;

public class QueryCache {

    private static final Logger log = Logger.getLogger(QueryCache.class.getName());

    private static QueryCache instance;

    private Cache cache;

    private QueryCache() {
        try {
            // Create a JCache instance backed by App Engine memcache.
            CacheFactory cacheFactory = CacheManager.getInstance().getCacheFactory();
            cache = cacheFactory.createCache(Collections.emptyMap());
        } catch (CacheException e) {
            log.severe("Error in creating the cache");
        }
    }

    public static synchronized QueryCache getInstance() {
        if (instance == null) {
            instance = new QueryCache();
        }
        return instance;
    }

    // Put one student's information into the cache, keyed by address.
    public void putInCache(String address, String student) {
        cache.put(address, student);
    }

    // Look up a student by address; null means a cache miss.
    public String findInCache(String address) {
        if (cache.containsKey(address)) {
            return (String) cache.get(address);
        } else {
            return null;
        }
    }
}
[/java]

Inside this class, we create the Cache instance following the singleton pattern; the cache itself behaves like a map. At the same time, we define two methods: one to put student information into the cache and another to get the information out of the cache.

Finally, we construct a servlet to query the student information.

[java]
import java.io.IOException;
import java.util.List;
import java.util.logging.Logger;

import javax.jdo.PersistenceManager;
import javax.jdo.Query;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class QueryServlet extends HttpServlet {

    private static final Logger log = Logger.getLogger(QueryServlet.class.getName());

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        log.info("Now start...");
        QueryCache cache = QueryCache.getInstance();
        // Try the cache first.
        String studentC = cache.findInCache("Address7694");
        if (studentC != null) {
            resp.getWriter().write("Found the item in cache!");
        } else {
            resp.getWriter().write("No hit in cache!");
            // Cache miss: fall back to a datastore query.
            PersistenceManager pm = PMF.get().getPersistenceManager();
            Query query = pm.newQuery(Student.class);
            query.setFilter("address == 'Address7694'");
            List students = (List) query.execute();
            if (students.iterator().hasNext()) {
                Student student = (Student) students.iterator().next();
                log.info("Found one: " + student.toString());
                resp.getWriter().write("Found one: " + student.toString());
                // Store the result so the next request hits the cache.
                cache.putInCache("Address7694", student.toString());
            } else {
                log.info("None found!");
                resp.getWriter().write("None Found!");
            }
        }
    }
}
[/java]

This is a very simple example to briefly show how to use memcache in GAE/J. In reality, however, there is a lot more to think about, such as where to apply memcache, how to set the expiration time of each cache entry, etc.
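On the expiration point specifically, GAE's JCache implementation accepts implementation-specific properties when the cache is created. Below is a minimal sketch, assuming the GCacheFactory constants shipped in the GAE SDK (com.google.appengine.api.memcache.jsr107cache) and an arbitrarily chosen one-hour lifetime:

[java]
import java.util.HashMap;
import java.util.Map;

import javax.cache.Cache;
import javax.cache.CacheException;
import javax.cache.CacheFactory;
import javax.cache.CacheManager;

import com.google.appengine.api.memcache.jsr107cache.GCacheFactory;

public class ExpiringCacheFactory {

    public static Cache createExpiringCache() throws CacheException {
        // Entries expire 3600 seconds (one hour) after they are put in the cache.
        Map props = new HashMap();
        props.put(GCacheFactory.EXPIRATION_DELTA, 3600);

        CacheFactory cacheFactory = CacheManager.getInstance().getCacheFactory();
        return cacheFactory.createCache(props);
    }
}
[/java]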

Sunday, March 14, 2010

Export a jar file in Eclipse project

If you would like to export your Eclipse project into a jar file, maybe the first thing that comes to mind is to build it with Ant. But here I'd like to recommend a very handy tool called FatJar, a helpful Eclipse plug-in for packaging your project.

It's really simple: just install the plug-in from http://kurucz-grafika.de/fatjar, then exporting as a fat jar will generate the jar file you need.

More details can be found in its tutorial; just Google it.

Tuesday, March 9, 2010

A piece of Java code to split large file

I have been working with Azure recently. Sometimes, when you try to upload large files into the Azure storage service, you cannot simply push them in one go: for files larger than 64MB, you have to split the file into small chunks and upload them as a block list. The concept is not hard to understand, but splitting large files may not be easy for junior developers.

Here is a simple piece of code to split large files into small pieces, which may be helpful if this is what you want to achieve:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.logging.Logger;

class FileSplit {

    private static final Logger log = Logger.getLogger(FileSplit.class.getName());

    private File f;
    private FileInputStream fis;
    private String path;
    private String fileName;
    int count;

    public FileSplit(File f) {
        this.f = f;
        fileName = f.getName();
        count = 0;
        path = f.getParent();
    }

    // Splits the file into numbered pieces; returns the number of pieces written.
    public int split() {
        try {
            log.info("Start to split files");
            fis = new FileInputStream(f);
            // 4,000,000-byte buffer, safely under Azure's 4MB per-block limit.
            byte buf[] = new byte[4 * 1000 * 1000];
            int num = 0;
            while ((num = fis.read(buf)) != -1) {
                if (createSplitFile(buf, 0, num) == -1) {
                    return 0;
                }
                count++;
                log.info("Finished one piece");
            }
            log.info("All finished");
        } catch (Exception e) {
            log.severe(e.getMessage());
        } finally {
            if (fis != null) {
                try {
                    fis.close();
                } catch (Exception e) {
                    log.severe(e.getMessage());
                }
            }
        }
        return count;
    }

    // Writes one piece to "<count>.tmppt" in the same directory as the source file.
    private int createSplitFile(byte buf[], int offset, int num) {
        FileOutputStream fosTemp = null;
        try {
            fosTemp = new FileOutputStream(path + "/" + count + ".tmppt");
            fosTemp.write(buf, offset, num);
            fosTemp.flush();
        } catch (Exception e) {
            return -1;
        } finally {
            try {
                if (fosTemp != null) {
                    fosTemp.close();
                }
            } catch (Exception e) {
                log.severe(e.getMessage());
            }
        }
        return 1;
    }
}
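For completeness, a quick usage sketch (the path here is hypothetical): each piece lands next to the original file as 0.tmppt, 1.tmppt, and so on, ready to be uploaded one by one.

import java.io.File;

public class FileSplitDemo {

    public static void main(String[] args) {
        // Hypothetical path: point this at the large file you want to split.
        File big = new File("/tmp/bigfile.dat");
        int pieces = new FileSplit(big).split();
        System.out.println("Wrote " + pieces + " pieces");
    }
}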

Monday, March 8, 2010

Clarify the differences between Amazon S3 and Amazon CloudFront

While cloud computing is the buzzword around the world, Amazon is without doubt one of the most important competitors in this field. Its products S3 and CloudFront both play a vital role in cloud storage and services.

However, there are always some misunderstandings about these two services. Here is a simple clarification which may be a little bit helpful.

Amazon S3 is a storage service, which means it is solely used to store your data in some cloud data center out there; you have no way to replicate this data to other places yourself. Amazon S3 provides you a URL which points to this specific piece of data, and each time you will be routed to the same endpoint to acquire it. For example, an object might live at a URL like http://mybucket.s3.amazonaws.com/photo.jpg (hypothetical bucket and key).

Amazon CloudFront is a CDN service, which replicates your data to different edge locations around the world. When you are in NY, the nearest node to download your data from will be in the US, not in Europe; when you are in London, CloudFront will route you to Dublin, which is far closer than the US. This effectively shortens the distance of the data transmission. Served through CloudFront, the same object would be fetched from a distribution domain such as http://d1234abcd.cloudfront.net/photo.jpg (hypothetical distribution).

From the above we can see that, most of the time, you will want to use both services at the same time: keep the data in S3 and serve it through CloudFront to improve your data's delivery efficiency.

Sunday, March 7, 2010

Using Google Code as a SVN repository

Although Google is branded as a search company, it has now come into every corner of the IT world. I just realized that without Google I would hardly be able to live comfortably any more: using Gmail to send and receive mail, Google Docs to create, edit and share documents, Google Calendar to arrange weekly and daily schedules, Google Reader to read the latest information and news, Google Maps to discover and explore neighborhoods and destinations, Google Buzz and Wave to socialize, and, more importantly for me, Google App Engine to earn money.

Recently I have been thinking about taking part in the open-source world, so the first thing that came to mind was to create a small project on Google Code, a space previously dominated by SourceForge. After some attempts, I have constructed a very small project with the name "restfulhttpclient". It is written in Java, with NetBeans 6.8 as the IDE; consequently, the checked-out code follows the NetBeans project structure.

It is not difficult to adopt Google Code as an SVN repository, since it provides the most basic functions to host code. Once you have an account and create a project there, you simply use your familiar SVN client to commit and check out your code.
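For example, with the standard command-line client (a sketch using Google Code's usual URL pattern; substitute your own project name and account):

# Anonymous read-only checkout:
svn checkout http://restfulhttpclient.googlecode.com/svn/trunk/ restfulhttpclient-read-only

# Committer checkout over https, using your generated googlecode.com password:
svn checkout https://restfulhttpclient.googlecode.com/svn/trunk/ restfulhttpclient --username yourname@gmail.com
svn commit -m "First import"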

However, I do have some questions here. The one that confused me most is why I should name my project in all lowercase characters. Some may argue that lowercase letters are easier to display in the address URI and for people to type in, but I believe a simple mapping and checking mechanism would not be that hard.

Another thing: for each project there are actually two different URIs mapping to it, one in the format ***.googlecode.com and the other code.google.com/p/***. Whenever you try to upload your file, it is better to choose the first one, since the latter will give you a 400 Bad Request response.

All right, if you have some interest, try downloading my project in a while (it is not finished yet):

http://code.google.com/p/restfulhttpclient



The reason why it is called a RESTful HTTP client is simply that I have to use those basic functions at work. When developing a project full of RESTful web services, invoking an HTTP request is the most common task, and a handy GUI tool is really nice if it covers the most usual functions. Even though a lot of similar products are out there, I could not find one that suits me: some of them have so many capabilities that they are extremely difficult to master, while others just omit certain functions that I have to use. In this project, users can send GET, POST, PUT and DELETE requests to the server, add specific headers, and supply basic authentication information. The response code and message will also be displayed on the panel once received by the client.