Archive for the ‘ Google App Engine ’ Category


Cache in AppEngine

Setting up caching correctly on Google App Engine looks like an easy task, but fully understanding the story behind the scenes turns out to be a little tricky, as some hidden points can only be discovered from various posts scattered around the cloud.

Here are the main points I have learned from my research:

1) For static files, caching is configured in the appengine-web.xml file, where you can also specify how long the files take to expire.
The catch is that if you don’t declare your static files there, they quite possibly will not be served properly. Google serves static files from a different source, so you have to explicitly tell App Engine “these are the static files I am using, and you need to serve them”. An alternative is to map those static files in your web.xml to a servlet or a filter, so that they are handled by the servlet/filter chain; however, this is not the recommended way to go.

2) You can use a filter to add a “Cache-Control” header to static file responses, but Google will not honor it. To be precise, Google will always return a private Cache-Control for your static files regardless of the setting you provide. Hence these static files will only be available in your private cache (the local browser), and will expire after the length of time you specified.

3) However, we can use the method mentioned above to add a “Cache-Control” header to dynamically generated pages, such as JSP files. This is particularly useful for pages that rarely change, such as the front page or some introduction pages. This time, Google will honor the setting in your Cache-Control header, with the right expiry time.

4) The way current browsers work is a little bit weird but understandable:
If you click a link on a page, the browser always first checks whether a local cached copy is available, and only goes to the server if there is none.
If you press F5/Refresh, the browser ignores the local cache and goes directly to the server. This time it sends the “If-Modified-Since” header so the server can check whether the cached content is still valid. If it is, a 304 is returned and you keep using your local copy; otherwise a new version is retrieved.

In short, if you click a link, it is possible that the client will not talk to the server at all, but if you refresh, the client will always talk to the server.

5) Both Chrome and Firefox have good tools for debugging HTTP requests: you can see how the cache is working and diagnose each specific request.
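To make point 4 a bit more concrete, here is a minimal Java sketch of the decision the browser makes. It is a toy model rather than a real HTTP client: the class name BrowserCacheModel, the Outcome enum and all parameter names are made up for illustration, and it assumes that freshness is governed only by a max-age lifetime.

```java
import java.time.Instant;

/** Toy model of the click vs. refresh behaviour; all names are hypothetical. */
class BrowserCacheModel {

    enum Outcome { SERVED_FROM_LOCAL_CACHE, NOT_MODIFIED_304, FULL_RESPONSE_200 }

    /**
     * @param refresh            true for F5/Refresh, false for a normal link click
     * @param storedAt           when the local copy was cached, or null if there is none
     * @param cachedLastModified Last-Modified value that came with the local copy
     * @param maxAgeSeconds      lifetime granted by the server (Cache-Control: max-age)
     * @param serverLastModified when the resource last changed on the server
     * @param now                current time
     */
    static Outcome fetch(boolean refresh, Instant storedAt, Instant cachedLastModified,
                         long maxAgeSeconds, Instant serverLastModified, Instant now) {
        if (storedAt == null) {
            return Outcome.FULL_RESPONSE_200;        // nothing cached: plain request
        }
        boolean fresh = now.isBefore(storedAt.plusSeconds(maxAgeSeconds));
        if (!refresh && fresh) {
            return Outcome.SERVED_FROM_LOCAL_CACHE;  // the server is never contacted
        }
        // Revalidation: the browser sends If-Modified-Since with cachedLastModified,
        // and the server answers 304 if nothing changed after that moment.
        boolean unchanged = !serverLastModified.isAfter(cachedLastModified);
        return unchanged ? Outcome.NOT_MODIFIED_304 : Outcome.FULL_RESPONSE_200;
    }
}
```

A click on a link with a fresh copy never reaches the server, while a refresh always ends in either a 304 or a full 200 response, which is exactly the difference you can watch in the browser's network tools.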

Session Clean in Google App Engine

It is always sad that with such a good product, the documentation keeps lagging behind, sometimes by years. While a lot of information can be retrieved by using its mother, Google, formal documentation and some practical hints would definitely be more helpful than articles scattered across the sea.

Anyway, cleaning up sessions in the App Engine datastore is a basic requirement for any real product running on it. Unless you are not using sessions anywhere, if you don’t clean them up, your datastore will definitely blow up one day.

Once you have sessions hanging around, you can easily find them in the Datastore Viewer. They live in a kind called _ah_SESSION. (Yep, whenever you see _ah, it means Google is doing some magic in the background.) The viewer lists all current sessions in the app with their expiry dates and their values (most of the time, though, the value is not easy to display there).

Apparently, the most direct and easiest way is to do it by hand: tick all the boxes in front of the sessions, click the DELETE button, and yep, all sessions are cleared. Easy, right? But what if you have 100 sessions? Someone may argue that you can append &limit=SOME_MAGIC_MAXIMUM_NUMBER_TO_BE_DISPLAYED to the end of the link in the browser, refresh the page, and then clear them all at once. But as far as I know, the maximum number you can set is 200, which means that with 1000 entries you have to go through all 5 pages.

This manual process sounds boring and lacks the self-satisfaction a software developer looks for, don’t you think? As programmers, we always look down on those guys who have to use mouse and keyboard instead of their minds to delete stuff. All right, Google does provide a way to clean up sessions: it is a servlet, part of the Google Code infrastructure (and aware of Google’s BigTable). Basically, what you need to do is declare the servlet, SessionCleanupServlet, in your web.xml, and set up a cron job to call it periodically, based on your traffic.

This article explains well how to integrate the servlet and cron job into your app to delete the sessions. But you should be aware that each run of the servlet can remove at most 100 sessions. That means that if you have heavy traffic and a huge number of sessions, you have to run the servlet/cron job more frequently, while also making sure you don’t interrupt any current user’s activity.
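As a rough way to pick the cron frequency, the arithmetic can be sketched in Java as follows. The only number taken from the text is the 100 sessions per run; the class SessionCleanupPlanner and its methods are hypothetical helpers, and the sketch assumes expired sessions pile up at a steady hourly rate.

```java
/** Back-of-the-envelope sizing for the cleanup cron job; names are hypothetical. */
class SessionCleanupPlanner {

    /** Upper bound on sessions removed by one servlet run, as stated in the post. */
    static final int MAX_DELETED_PER_RUN = 100;

    /** Number of servlet runs needed to remove the given number of sessions. */
    static long runsNeeded(long expiredSessions) {
        if (expiredSessions < 0) {
            throw new IllegalArgumentException("negative session count");
        }
        // Integer ceiling of expiredSessions / MAX_DELETED_PER_RUN.
        return (expiredSessions + MAX_DELETED_PER_RUN - 1) / MAX_DELETED_PER_RUN;
    }

    /**
     * Longest whole-minute interval between runs that still keeps up with the
     * number of sessions expiring per hour. The result never drops below 1,
     * so a very high rate may still outpace a one-minute schedule.
     */
    static long maxIntervalMinutes(long expiredPerHour) {
        long runs = runsNeeded(expiredPerHour);
        return runs == 0 ? 60 : Math.max(1, 60 / runs);
    }
}
```

For instance, with about 1000 sessions expiring per hour, ten runs are needed, so the job should fire at least every 6 minutes.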

OK, it all seems to work well, except that some bugs may happen. One issue we have seen is that some very, very old sessions stay around, maybe a year old (from before we added the session cleanup mechanism), and no matter how many times the servlet runs, those sessions survive all the bullets. We have no explanation for this, and such sessions are trivial enough to be neglected until a 500 error comes up when a new version is released.

One bug we hit recently came after we changed the type of a property in a class from String to Date and updated some of the related methods accordingly. Those ghost sessions still had a String in their hearts, and when the app initialized, everything crashed because of this catch. Consequently, we think it may be worthwhile to clean the sessions and caches whenever a new version is released, and this cannot rely on the servlet provided by Google but needs our own code to delete all of them. This thread elegantly handles the problem by deleting all sessions and caches with custom code, so you can have a clean environment whenever you want.

So once you have integrated the two points above into your project, you should no longer need to worry about session issues, and hopefully Google will someday bring us a better built-in solution to handle all of this. For now, let’s just work harder.

Google App Engine: Found a jar file too large to upload

When deploying your application to the server, you may sometimes encounter this error if you have a really big JAR. This was not a common case before the release of 1.5.4; however, the latest appengine-tools-sdk-1.5.4.jar is about 9.7MB, which exceeds the limit a little bit. I’m not sure whether uploading with the Eclipse plugin works or not, but when you use the Maven plugin, an error will definitely tell you that the jar is too large to upload.

The fix for the standard upload is simple: add the jar-splitting parameter suggested by the error message (--enable_jar_splitting) to the command.

For Maven, what you need to modify is your pom.xml:

<plugin>
  <groupId>net.kindleit</groupId>
  <artifactId>maven-gae-plugin</artifactId>
  <version>${maven.gae.plugin.version}</version>
  <configuration>
    <splitJars>true</splitJars>
  </configuration>
  <dependencies>
    <dependency>
      <groupId>net.kindleit</groupId>
      <artifactId>gae-runtime</artifactId>
      <version>${gae.version}</version>
      <type>pom</type>
    </dependency>
  </dependencies>
</plugin>

The splitJars element inside the configuration block is the part you need to add; with it, the Maven plugin will use the split-jar pattern to upload the application.

Long IP Address

Most of the time, people think of an IP address as a String. Especially in Java, developers usually deal with IP addresses through either the URL or the String class. However, representing an IP address as a String has several disadvantages. First, a String usually takes more memory than the “same” value stored as an int or long, and Strings are difficult to compare with other IP addresses. More importantly, it is not easy to determine whether an IP address falls within the range between two other IP addresses.

Since IPv4 addresses are composed of four integers ranging from 0 to 255, it is easy to convert a String IP address to a numerical form that also uniquely represents the IP address. That’s how the Long IP address emerges.

A simple way to convert the String IP address A.B.C.D to a Long IP address would be:

256*256*256*A + 256*256*B + 256*C + D

Using Long IP addresses is helpful in certain scenarios; one is when you are dealing with IP-to-location mapping in Google App Engine. A very popular data set called GeoIP, created by MaxMind, is heavily used in a lot of different projects. However, when parsing IP addresses, their Java library first transforms the IP String into an InetAddress, then uses getAddress() to get its byte[], and finally computes the Long value. There is no problem when you use this library on other platforms. But on Google App Engine you get stuck, because InetAddress is on GAE’s blacklist, which means you cannot use this class. The workaround is to use the conversion above: you directly get the same Long value that the library computes along the way.

There might be other places where Long IP addresses are useful, especially when dealing with range queries over IP addresses.
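The conversion formula and the range check can be sketched in plain Java without touching InetAddress. This is only a minimal illustration: the class name LongIp and its method names are my own, not part of the GeoIP library.

```java
/** Converts dotted IPv4 strings to their Long form; names are hypothetical. */
class LongIp {

    /** Computes 256*256*256*A + 256*256*B + 256*C + D for the address A.B.C.D. */
    static long toLong(String ip) {
        String[] parts = ip.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);   // throws NumberFormatException on junk
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range in: " + ip);
            }
            value = value * 256 + octet;          // shift previous octets left by one byte
        }
        return value;
    }

    /** True if ip lies between low and high, both ends included. */
    static boolean inRange(String ip, String low, String high) {
        long v = toLong(ip);
        return toLong(low) <= v && v <= toLong(high);
    }
}
```

With the Long form, the range question becomes two numeric comparisons, which is also the shape a datastore range filter on a numeric property would take.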

Using GAE python to bulk load CSV data into Java datastore

The official document is here: http://code.google.com/appengine/docs/python/tools/uploadingdata.html. This post covers more details for Java applications.

1. On any Windows machine, download Python 2.5.x, preferably 2.5.4, since it is the last stable version with a Windows installer. Avoid 2.6.x and 3.x because GAE doesn’t officially support them.

2. Download the Google App Engine SDK for Python. The current version is 1.3.3. You may download the GAE launcher, which is only available on Windows.

3. Create a new project named uploaddata (or whatever you like) and add an app.yaml file:

application: XXX
version: 1
runtime: python
api_version: 1

handlers:
- url: /remote_api
  script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
  login: admin

Add the code above to the app.yaml file. Use the correct application name and version, and do not change the script of the handler.

4. Create a Python class that maps the datastore kind to a class. An example is:

(Student.py)
from google.appengine.ext import db

class Student(db.Model):
    studentId = db.StringProperty()
    name = db.StringProperty()
    address = db.StringProperty()
    # ……

5. Create a data loader file used by the handler. Here is another example:

(loader.py)
import datetime
from google.appengine.ext import db
from google.appengine.tools import bulkloader
import Student

class StudentLoader(bulkloader.Loader):
    def __init__(self):
        # Each tuple maps a CSV column to a property name and a converter.
        bulkloader.Loader.__init__(self, 'Student',
                                   [('studentId', str), ('name', str), ('address', str)])

loaders = [StudentLoader]

Pay attention to columns that may contain accented French characters or Asian-language characters: convert them to unicode properly instead of using str.

6. From the command line, use the following command to upload the data. With the previous example, a sample command would look like:

appcfg.py upload_data --config_file=loader.py --filename=data.csv --kind=Student uploaddata

App Engine SDK 1.3.3 released

Although the Eclipse plug-in has not been updated yet, I went ahead and downloaded the zip file. This is only a minor update with limited new features. One of the most important things worth paying attention to is the SQLite support for Python. Despite the official word, “Note that this feature does not add SQL support to the App Engine SDK or service”, it is a great advantage for developers to harness the benefits of SQLite. I just wonder when they will expand the support to Java. It seems the Java side is always a step behind the Python development team.

New Warning Message on Google App Engine

I just noticed a pretty new warning appearing in the App Engine log:

“This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.”

Previously, nothing highlighted the reason why a simple operation might sometimes take more CPU time than expected. Now, with this handy warning, things are much clearer and you don’t have to worry about an internal coding issue in your project.

However, it would be much better if Google could solve this issue. Currently, to avoid wasting time on such reloads, I have to run my cron job once per minute. Yep, when it ran every 2 minutes, App Engine complained every time, and the overall CPU cost was even higher than with the more frequent job. Ironic, ridiculous, huh?

I know some guys are working on this issue now; hopefully it can be fixed in the near future.

Warning in JDO lazy fetch with App Engine

If you keep seeing the warning “org.datanucleus.store.appengine.MetaDataValidator warn: Meta-data warning for ****: The datastore does not support joins and therefore cannot honor requests to place related objects in the default fetch group. The field will be fetched lazily on first access. You can modify this warning by setting the datanucleus.appengine.ignorableMetaDataBehavior property in your config. A value of NONE will silence the warning. A value of ERROR will turn the warning into an exception.”, please add the following line to your jdoconfig.xml:

<property name="datanucleus.appengine.ignorableMetaDataBehavior" value="NONE" />

Although the fix is quite obvious, where to put it might confuse many people: the property goes inside the persistence-manager-factory element, next to the other property entries.

Using memcache in GAE/J

Actually memcache is not a brand new concept; it has been used in many large-scale projects. The most famous memcache event is probably the Whale on Twitter’s front page. To understand the background of memcache, please Google it. In one sentence, memcache is a way to cache frequently used data, internal or external, to reduce the cost of database queries and remote invocations. This is a classic technique in all kinds of databases; however, since most of that code is written at a low level, most developers never touch it. Memcache brings the idea to the application level, which is definitely a great boost to large-scale applications.

Google App Engine has supported memcache for a long time, which is one of its inborn advantages. Here I’d like to briefly introduce how to use memcache in GAE/J. Sorry to those who are interested in Python.

First, let’s construct a scenario. There is a university system on GAE/J that stores information for around 10k students. The system offers all kinds of features for students, such as enrollment, a study blackboard, course selection and so on. As a result, there is a huge demand for database queries. However, that demand falls into two parts: some students really like the system and log in every day, while others, who can be considered “lazy”, seldom care about it. So, to improve the efficiency of the system, caching the frequently used information is a good help. Here we assume there is a table called Student; no matter what action happens, there is always a need to query the Student table. Let’s see how to store the Student information in memcache.

First we create a JDO POJO class to store Student information. As an example, the fields in the class are pretty simple.

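[java]
import java.io.Serializable;
import java.util.UUID;

import javax.jdo.annotations.IdentityType;
import javax.jdo.annotations.Inheritance;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;

@PersistenceCapable(identityType = IdentityType.APPLICATION)
@Inheritance(customStrategy = "complete-table")
public class Student implements Serializable {

    @PrimaryKey
    @Persistent
    private String uuid;

    @Persistent
    private String name;

    @Persistent
    private String email;

    @Persistent
    private String address;

    public Student() {
        // Assign a random UUID as the primary key.
        this.uuid = UUID.randomUUID().toString();
    }

    // setters & getters
}
[/java]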

Then we need to construct a cache class that is in charge of the operations in the cache layer.

[java]
import java.util.Collections;
import java.util.logging.Logger;
// Cache, CacheException, CacheFactory and CacheManager come from the JCache
// (JSR 107) API bundled with the App Engine SDK.

public class QueryCache {

    private static final Logger log = Logger.getLogger(QueryCache.class.getName());

    private static QueryCache instance;

    private Cache cache;

    private QueryCache() {
        try {
            CacheFactory cacheFactory = CacheManager.getInstance().getCacheFactory();
            cache = cacheFactory.createCache(Collections.emptyMap());
        } catch (CacheException e) {
            log.severe("Error in creating the cache");
        }
    }

    // Lazily creates the single shared instance.
    public static synchronized QueryCache getInstance() {
        if (instance == null) {
            instance = new QueryCache();
        }
        return instance;
    }

    public void putInCache(String address, String student) {
        cache.put(address, student);
    }

    // Returns the cached value for the address, or null on a cache miss.
    public String findInCache(String address) {
        if (cache.containsKey(address)) {
            return (String) cache.get(address);
        } else {
            return null;
        }
    }
}
[/java]

Inside this class, we create a new Cache instance using the Singleton pattern; the Cache behaves like a map. We also define two methods, one to put the student information into the cache and another to get it out of the cache.

Finally, we construct a servlet to query the student information.

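import java.io.IOException;
import java.util.List;
import java.util.logging.Logger;

import javax.jdo.PersistenceManager;
import javax.jdo.Query;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class QueryServlet extends HttpServlet {
    private static final Logger log = Logger.getLogger(QueryServlet.class.getName());

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        log.info("Now start……");
        QueryCache cache = QueryCache.getInstance();
        String studentC = cache.findInCache("Address7694");
        if (studentC != null) {
            resp.getWriter().write("Found the item in cache!");
        } else {
            resp.getWriter().write("No hit in cache!");
            // Cache miss: fall back to the datastore, then fill the cache.
            PersistenceManager pm = PMF.get().getPersistenceManager();
            try {
                Query query = pm.newQuery(Student.class);
                query.setFilter("address == 'Address7694'");
                List<Student> students = (List<Student>) query.execute();
                if (!students.isEmpty()) {
                    Student student = students.get(0);
                    log.info("Found one:" + student.toString());
                    resp.getWriter().write("Found one:" + student.toString());
                    cache.putInCache("Address7694", student.toString());
                } else {
                    log.info("None found!");
                    resp.getWriter().write("None Found!");
                }
            } finally {
                pm.close();
            }
        }
    }
}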

This is a very simple example that briefly shows how to use memcache in GAE/J. In reality, a lot more needs to be thought through, such as where to use memcache, how to set the expiry time of each cache entry, etc.
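To give a feel for the expiry question, here is a small Java sketch of the same look-in-cache-first flow with a time-to-live on each entry. It deliberately uses a plain in-memory map as a stand-in for memcache, so it runs anywhere; the class ExpiringCache and its method are hypothetical and are not the App Engine API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** In-memory stand-in for memcache, only to illustrate lookup with expiry. */
class ExpiringCache {

    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;

    ExpiringCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /**
     * Returns the cached value if it has not expired yet; otherwise calls the
     * loader (the slow path, e.g. a datastore query) and caches its result.
     */
    String get(String key, long nowMillis, Function<String, String> loader) {
        Entry entry = entries.get(key);
        if (entry != null && nowMillis < entry.expiresAtMillis) {
            return entry.value;                       // cache hit
        }
        String loaded = loader.apply(key);            // miss or expired entry
        if (loaded != null) {
            entries.put(key, new Entry(loaded, nowMillis + ttlMillis));
        } else {
            entries.remove(key);                      // nothing to cache
        }
        return loaded;
    }
}
```

Passing the current time in as a parameter keeps the sketch easy to test. A short lifetime suits data that changes often, while rarely changing records such as a student's address can live in the cache much longer.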

How to invoke a remote web service in GAE

Well, with GAE it is easy to develop applications on the cloud. But its various limitations sometimes make people feel really uncomfortable, when their usual behavior is considered illegal according to the Google Laws.

You are not allowed to use libraries that rely on restricted parts of the java.net package or that may spawn additional threads. But in order to invoke other RESTful web services, we need to do something that satisfies both Google’s requirements and ours.

In the official documentation, GAE recommends using the HttpURLConnection class, which is a little too simple and therefore brings various problems when dealing with complicated data types.

Here I will invoke a remote RESTful web service that accepts JSON. Here is the code:

try {
    // user and content come from the surrounding application code.
    URL url = new URL("http://localhost:8183/users/" + user + "/content");
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setDoOutput(true);
    connection.setRequestMethod("POST");
    // Tell the receiver which format the request body uses.
    connection.setRequestProperty("Content-Type", "application/json");
    String json = content.toJSON().toString();
    OutputStreamWriter writer = new OutputStreamWriter(connection.getOutputStream());
    writer.write(json);
    writer.close();
    if (connection.getResponseCode() == HttpURLConnection.HTTP_OK) {
        System.out.println("sssssss");
    } else {
        System.out.println(connection.getResponseMessage());
        System.out.println("ffffffff");
    }
} catch (Exception e) {
    e.printStackTrace();
}

Looks easy, huh? Yep, quite straightforward. Remember to set the Content-Type on the connection so the receiver can recognize the format you are sending. Note that the writer only accepts a String or char[], so hurry up and write your own serialization and parsing functions.
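One of those small helper functions could read the response body back as a String, so it can be handed to whatever JSON parser you use. The following Java sketch is only an illustration: the class ResponseReader is my own name, and it assumes the service answers in UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Reads a whole response body into a String; the name is hypothetical. */
class ResponseReader {

    /** Drains the stream, decoding it as UTF-8 (an assumption), and closes it. */
    static String readBody(InputStream in) throws IOException {
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                body.append(buffer, 0, read);
            }
        }
        return body.toString();
    }
}
```

In the snippet above it would be called as ResponseReader.readBody(connection.getInputStream()) once the response code has been checked.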
