Cache in AppEngine

Setting the cache correctly for Google App Engine looks like an easy task. Fully understanding what happens behind the scenes turns out to be a little tricky, though, because some hidden points can only be discovered from various posts scattered across the web.

Here are the key points I learned from my research:

1) For static files, caching is configured in the appengine-web.xml file, where you can specify how long the files take to expire.
The catch is that if you don't declare your static files there, your pages will quite possibly not render them correctly. Google serves static files from separate infrastructure, so you have to tell App Engine explicitly: "these are the types of static files I am using, and you need to serve them". An alternative is to map those static files in your web.xml through a servlet or a filter, so that requests for them are caught by the servlet/filter chain. However, this is not the recommended approach.

2) You can use a filter to add a "Cache-Control" header to static file responses, but Google will not honor it. To be precise, Google always returns a private Cache-Control header for your static files, regardless of what you set. As a result, these static files are only stored in the private cache (the local browser), and they expire after the length you specified.

3) However, you can use the same filter approach to add a "Cache-Control" header to dynamically generated pages, such as JSP files. This is particularly useful for pages that rarely change, such as the front page or introduction pages. In this case, Google honors your Cache-Control header and its expiry time.

4) The way current browsers behave is a little weird but understandable:
If you click a link on a page, the browser first checks whether a valid copy is available in its local cache. It only goes to the server if there isn't one.
If you press F5 to refresh the page, the browser does not simply reuse its local cache; it goes to the server. This time it sends the "If-Modified-Since" header so that the server can decide whether the cached content is still valid. If it is, the server returns 304 (Not Modified) and the browser keeps using its local copy. Otherwise, a new version is retrieved.

In short, if you click a link, the client may not talk to the server at all. If you refresh, the client always talks to the server.

5) Both Chrome and Firefox have good tools for debugging HTTP requests. They let you see how the cache is working and diagnose each individual request.
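The two server-side decisions behind points 3 and 4 can be sketched in plain Java, kept free of the servlet API so the sketch runs on its own.

- cacheControlFor picks the Cache-Control value that a filter for dynamic pages would set. The page paths and the one-day max-age are hypothetical.
- statusFor models the server side of a refresh. It compares the browser's If-Modified-Since date with the resource's last-modified time and answers 304 or 200.

All class and method names are made up for illustration. In a real javax.servlet.Filter you would call response.setHeader("Cache-Control", value) inside doFilter before passing the request down the chain.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Set;

/** Hypothetical sketch of the two cache decisions above, kept free of the servlet API. */
final class CacheSketch {
    // Hypothetical pages that rarely change (front page, introduction pages).
    private static final Set<String> RARELY_CHANGING = Set.of("/index.jsp", "/intro.jsp");
    private static final long ONE_DAY_SECONDS = 24L * 60 * 60;

    /** Point 3: the Cache-Control value a filter would set for a dynamically generated page. */
    static String cacheControlFor(String path) {
        if (RARELY_CHANGING.contains(path)) {
            // "public" allows shared caches, not only the browser, to keep a copy for one day.
            return "public, max-age=" + ONE_DAY_SECONDS;
        }
        // "no-cache": a stored copy must be revalidated with the server before each use.
        return "no-cache";
    }

    /** Point 4: 304 if the browser's copy (If-Modified-Since) is still current, otherwise 200. */
    static int statusFor(String ifModifiedSince, Instant lastModified) {
        if (ifModifiedSince == null) {
            return 200; // no cached copy to validate: send the full response
        }
        try {
            Instant cachedCopy = ZonedDateTime
                    .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
            // HTTP dates have one-second precision, so compare whole seconds.
            return lastModified.getEpochSecond() <= cachedCopy.getEpochSecond() ? 304 : 200;
        } catch (DateTimeParseException e) {
            return 200; // unparseable header: fall back to a normal response
        }
    }

    /** Formats an instant as an HTTP date, as used in Last-Modified and If-Modified-Since. */
    static String httpDate(Instant instant) {
        return DateTimeFormatter.RFC_1123_DATE_TIME.format(instant.atZone(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        Instant lastModified = Instant.parse("2011-10-01T10:00:00Z");
        System.out.println(cacheControlFor("/index.jsp")); // public, max-age=86400
        System.out.println(statusFor(httpDate(lastModified), lastModified)); // 304
        System.out.println(statusFor(httpDate(lastModified.minusSeconds(3600)), lastModified)); // 200
    }
}
```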

CloudFront Streaming Exception: NetStream.Play.StreamNotFound

While working with CloudFront streaming using an S3 bucket as the origin, I randomly got a weird exception: "NetStream.Play.StreamNotFound". It didn't happen on all files, only on certain ones. Initially I thought something was wrong with my encoding configuration, with some bitrate inconsistency causing the stream to fail. But even files encoded with some of the most popular online services still didn't play. The problem also only applied to the streaming protocol; over plain HTTP the same files still worked perfectly.

After struggling with it for more than a day, I eventually found that the culprit was the file extension.

The rule is: the file extension must not be in uppercase. Otherwise CloudFront won't recognize those files and will fail to stream them.

So, if you want to use CloudFront streaming, make sure your files are not named xxx.FLV or xxx.MP4. Name them xxx.flv or xxx.mp4 instead.
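If many files were already uploaded with uppercase extensions, the new object keys can be computed mechanically. The minimal sketch below (class and method names are hypothetical) lower-cases only the extension and leaves the rest of the key alone. Copying or renaming the objects in S3 itself would still be done with whatever S3 client or tool you use.

```java
import java.util.Locale;

/** Hypothetical helper: lower-cases only the file extension of an S3 object key. */
final class StreamingKeys {
    static String withLowercaseExtension(String key) {
        int slash = key.lastIndexOf('/');
        int dot = key.lastIndexOf('.');
        // No extension in the last path segment, a dot-file like ".hidden", or a trailing dot: leave as is.
        if (dot <= slash + 1 || dot == key.length() - 1) {
            return key;
        }
        return key.substring(0, dot) + key.substring(dot).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(withLowercaseExtension("videos/Intro.FLV")); // videos/Intro.flv
        System.out.println(withLowercaseExtension("clip.MP4"));         // clip.mp4
    }
}
```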

Change VM Allocation for IntelliJ IDEA

This method works for at least the version I'm currently using, 10.5.2. I'm not sure about earlier versions, but it should work in IDEA 9 too. I have also tried it on the EAP of IDEA 11 without any issues.

Basically, what you need to do is modify the memory settings in this file:

{IDEA_HOME}/bin/idea.vmoptions

Change the values to the sizes you want. In my case, I set -Xmx to 1536M and -XX:MaxPermSize to 500M, which improved the speed and responsiveness of IDEA quite a bit.
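If you change these values often, the edit can be scripted. The sketch below is a hypothetical helper that rewrites the -Xmx and -XX:MaxPermSize lines in the text of an idea.vmoptions file, appending them if they are missing. The sample file content in main is made up, and reading and writing the actual file is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: sets -Xmx and -XX:MaxPermSize in the text of an idea.vmoptions file. */
final class VmOptions {
    static String update(String content, String xmx, String maxPermSize) {
        List<String> lines = new ArrayList<>();
        boolean sawXmx = false;
        boolean sawPerm = false;
        for (String line : content.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("-Xmx")) {
                lines.add("-Xmx" + xmx);                     // replace the existing heap limit
                sawXmx = true;
            } else if (trimmed.startsWith("-XX:MaxPermSize=")) {
                lines.add("-XX:MaxPermSize=" + maxPermSize); // replace the existing PermGen limit
                sawPerm = true;
            } else if (!trimmed.isEmpty()) {
                lines.add(line);                             // keep other options; blank lines are dropped
            }
        }
        // Append the options if the file did not contain them yet.
        if (!sawXmx) {
            lines.add("-Xmx" + xmx);
        }
        if (!sawPerm) {
            lines.add("-XX:MaxPermSize=" + maxPermSize);
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        String before = "-Xms128m\n-Xmx512m\n-XX:MaxPermSize=250m\n-ea"; // made-up sample content
        System.out.println(update(before, "1536M", "500M"));
    }
}
```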

Session Cleanup in Google App Engine

It is always sad that such a good product has documentation that keeps lagging behind, sometimes by years. A lot of information can be found through its mother, Google. Still, a formal document and some practical hints would be more helpful than articles scattered across the web.

Anyway, cleaning up sessions in the App Engine datastore is a basic requirement for any real product running on it. Unless you don't use sessions anywhere, your datastore will eventually blow up if you never clean them.

Once you have sessions hanging around, you can easily spot them in the Datastore Viewer: they are stored in a kind called _ah_session. (Yep, whenever you see _ah, Google is doing some magic in the background.) The viewer lists all current sessions in the app, along with their expiry dates and sizes. Well, strictly it shows the value, but most of the time the value can't be displayed meaningfully there.

Obviously, the most direct and easiest way is by hand: tick the boxes next to the sessions, click the Delete button, and all those sessions are cleared. Easy, right? But what if you have 100 sessions? Someone may argue that you can append &limit=SOME_MAGIC_MAXIMUM_NUMBER_TO_BE_DISPLAYED to the end of the URL in the browser, refresh the page, and then clear them all at once. But as far as I know, the maximum number you can set is 200. That means if there are 1000 entries, you have to go through 5 pages.

This manual process is boring and hardly satisfying for a software developer, don't you think? As programmers, we tend to look down on people who use a mouse and keyboard instead of their minds to delete stuff. All right, Google does provide a way to clean up sessions. It is a servlet that comes with Google's own infrastructure (so it knows how sessions are stored in Google's BigTable). Basically, all you need to do is declare the servlet, SessionCleanupServlet, in your web.xml. Then set up a cron job that calls it periodically, at a frequency based on your traffic.

This article explains well how to integrate the servlet and the cron job into your app to delete sessions. Be aware that each run of the servlet removes at most 100 sessions. If you have heavy traffic and a huge number of sessions, you have to run the servlet/cron job more frequently. At the same time, make sure it doesn't interrupt any current user's activities.
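The 100-per-run limit turns into simple arithmetic. If roughly E sessions expire between two cron runs, each run must remove at least E of them, or the backlog keeps growing.

The sketch below is a hypothetical in-memory model of such a capped cleanup. It does not use the App Engine datastore API or the real SessionCleanupServlet; it only shows how a capped run behaves and how many runs it takes to drain a backlog.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory model (not the datastore API) of a cleanup job capped at N deletions per run. */
final class SessionCleanupModel {
    // session id -> expiry time in epoch milliseconds
    private final Map<String, Long> sessions = new LinkedHashMap<>();

    void put(String id, long expiresAtMillis) {
        sessions.put(id, expiresAtMillis);
    }

    int size() {
        return sessions.size();
    }

    /** One cron-triggered run: deletes at most batchLimit expired sessions and returns how many it deleted. */
    int runOnce(long nowMillis, int batchLimit) {
        int deleted = 0;
        Iterator<Map.Entry<String, Long>> it = sessions.entrySet().iterator();
        while (it.hasNext() && deleted < batchLimit) {
            if (it.next().getValue() < nowMillis) {
                it.remove();
                deleted++;
            }
        }
        return deleted;
    }

    /** Runs needed to drain a backlog of expired sessions when each run deletes at most batchLimit. */
    static int runsNeeded(int expired, int batchLimit) {
        return (expired + batchLimit - 1) / batchLimit; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(runsNeeded(1000, 100)); // 10
    }
}
```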

OK, everything seems to work well, apart from some bugs. One issue we detected is that some very, very old sessions were still around, maybe a year old (from before we added the session cleanup mechanism). No matter how many times the servlet ran, those sessions survived every bullet. We have no explanation for this, and they seemed trivial enough to neglect, until a 500 error came up when a new version was released.

The bug we hit recently came after we changed the type of a property in one of our classes from String to Date, and updated some related methods accordingly. Those ghost sessions still had Strings in their hearts, so when the app initialized, everything crashed. Consequently, we think it is worthwhile to clean up sessions and the cache whenever a new version is released. This cannot rely on the servlet provided by Google; it needs our own code to delete all of them. This thread elegantly handles the problem by deleting all sessions and caches with custom code, so you can have a clean environment whenever you want.
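The "clean on every release" idea can be modeled as a version check at startup. The sketch below is a hypothetical in-memory model. It is not the code from the thread mentioned above, nor the datastore or memcache API.

The first time a new version string is seen, all sessions and cached entries are dropped, whatever their expiry. In a real deployment the last-cleaned version would have to live in storage shared by all instances, because every instance runs its startup code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model: drop every session and cache entry once per newly deployed version. */
final class ReleaseCleanup {
    final Map<String, Object> sessions = new HashMap<>();
    final Map<String, Object> cache = new HashMap<>();
    // Assumption: in a real app this marker is stored somewhere shared by all instances.
    private String lastCleanedVersion;

    /** Call at startup; returns true if this version had not been cleaned yet and a wipe happened. */
    boolean onStartup(String currentVersion) {
        if (currentVersion.equals(lastCleanedVersion)) {
            return false; // already cleaned for this release
        }
        sessions.clear(); // removes "ghost" sessions too, whatever their expiry date or stored types
        cache.clear();
        lastCleanedVersion = currentVersion;
        return true;
    }
}
```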

Once you have integrated the two points above into your projects, you should no longer need to worry about session issues. Hopefully Google will someday give us a better built-in solution for all of this. For now, let's just work harder.

Google App Engine: Found a jar file too large to upload

When deploying your application to the server, you may hit this issue if you have a really big JAR. It was not common before the release of 1.5.4. However, the latest appengine-tools-sdk-1.5.4.jar is about 9.7MB, which exceeds the limit a little. I'm not sure whether uploading with the Eclipse plugin works. With the Maven plugin, though, an error definitely tells you that the jar is too large to upload.

For the standard upload, the fix is simple: append the parameter suggested in the error message to the end of the upload command.

For Maven, what you need to modify is your pom.xml:

<plugin>
  <groupId>net.kindleit</groupId>
  <artifactId>maven-gae-plugin</artifactId>
  <version>${maven.gae.plugin.version}</version>
  <configuration>
    <splitJars>true</splitJars>
  </configuration>
  <dependencies>
    <dependency>
      <groupId>net.kindleit</groupId>
      <artifactId>gae-runtime</artifactId>
      <version>${gae.version}</version>
      <type>pom</type>
    </dependency>
  </dependencies>
</plugin>

The line you need to add is <splitJars>true</splitJars>. With it, the Maven plugin splits large jars when uploading the application.
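Conceptually, jar splitting repacks the entries (classes and resources) of one oversized jar into several smaller jars, each below the upload limit. The sketch below is a hypothetical illustration of that idea using a greedy pass over the entries. It is not the code the App Engine SDK or the Maven plugin actually runs, and the limit is a parameter rather than App Engine's real value.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of jar splitting: pack jar entries into parts that each stay under a size limit. */
final class JarSplitSketch {
    static final class Entry {
        final String name;
        final long size; // uncompressed size in bytes, for simplicity
        Entry(String name, long size) {
            this.name = name;
            this.size = size;
        }
    }

    /** Fills each part in entry order and starts a new part when the next entry would exceed the limit. */
    static List<List<Entry>> split(List<Entry> entries, long limitBytes) {
        List<List<Entry>> parts = new ArrayList<>();
        List<Entry> current = new ArrayList<>();
        long currentSize = 0;
        for (Entry e : entries) {
            if (e.size > limitBytes) {
                throw new IllegalArgumentException(e.name + " alone exceeds the limit");
            }
            if (currentSize + e.size > limitBytes && !current.isEmpty()) {
                parts.add(current); // close the full part
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(e);
            currentSize += e.size;
        }
        if (!current.isEmpty()) {
            parts.add(current);
        }
        return parts;
    }
}
```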

SQL: Turn multiple rows into one row multiple columns

We have a table that looks like this:

Name   Question   Value
Alan   1          5
Alan   2          4
Alan   3          6
Jim    1          4
Jim    2          3
Jim    4          5

We would like to select a result that looks like this:

Name   Q1   Q2   Q3     Q4
Alan   5    4    6      null
Jim    4    3    null   5

This problem seems very easy, but it cost me a lot of time to figure out how to proceed. Finally, after some googling, I found a nice post introducing a neat way to handle this kind of problem with the MAX(DECODE()) functions.

Basically, you first need the decode() function, which acts like if-then-else. For example, decode(name, 'Alan', 'True', 'False') means: if name is 'Alan', return 'True'; otherwise return 'False'.

In our problem, we use decode() to separate the answers to different questions: decode(question, 1, value, NULL) as Q1, decode(question, 2, value, NULL) as Q2, and so on. This way, each column picks out only the value of its own question.

Once we can get the answer to each question, we need to group the answers by name. That's why max() is needed here, as the aggregate for the GROUP BY clause. Within each name, every Qn column contains at most one non-null value, and aggregate functions ignore nulls. So max(), min(), or any other suitable aggregate collapses the column to that value:

select name as NAME,
       max(decode(question, 1, value, NULL)) as Q1,
       max(decode(question, 2, value, NULL)) as Q2,
       max(decode(question, 3, value, NULL)) as Q3,
       max(decode(question, 4, value, NULL)) as Q4
from your_table
group by name;
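To see why MAX(DECODE(...)) works, here is the same pivot done in memory in Java, as a sketch. The class and method names are hypothetical; each step mirrors one part of the query:

- Each row's value is routed to the column for its question (the DECODE step).
- Rows are grouped by name (GROUP BY).
- Math.max merges the at most one non-null value per column, just as MAX ignores NULLs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory sketch of: select name, max(decode(question, i, value, NULL)) as Qi ... group by name */
final class PivotSketch {
    static final class Answer {
        final String name;
        final int question;
        final int value;
        Answer(String name, int question, int value) {
            this.name = name;
            this.question = question;
            this.value = value;
        }
    }

    /** Returns name -> [Q1..Qn]; a slot stays null when that person did not answer that question. */
    static Map<String, Integer[]> pivot(List<Answer> rows, int questionCount) {
        Map<String, Integer[]> result = new LinkedHashMap<>();
        for (Answer a : rows) {
            // GROUP BY name: one output row per distinct name.
            Integer[] columns = result.computeIfAbsent(a.name, k -> new Integer[questionCount]);
            // DECODE(question, i, value, NULL): the value lands only in its own question's column.
            if (a.question >= 1 && a.question <= questionCount) {
                Integer current = columns[a.question - 1];
                // MAX ignores NULLs, so a column with a single value simply keeps that value.
                columns[a.question - 1] = (current == null) ? a.value : Math.max(current, a.value);
            }
        }
        return result;
    }
}
```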

Cannot restart LDAP server on Ubuntu

I made some changes to the LDIF file yesterday, and today I could no longer start my LDAP server. I tried several things, but unfortunately all of them failed.

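The workaround is actually quite simple: remove the slapd.d/ folder under /etc/ldap and reinstall LDAP, and it works again. Keep in mind that slapd.d/ holds the server's configuration, so removing it discards the broken changes along with the rest of your configuration.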

IntelliJ IDEA 10.5 stuck in the loading process

IntelliJ IDEA is a really smart IDE that attracts more and more developers to its community. However, a lot of bugs are still hanging around and annoy people a lot.

One of these bugs is that when you start IDEA with a fairly large project, there is a high chance that loading gets stuck forever. This has been identified as bug IDEA-67401 and hasn't been resolved yet.

According to that issue page, there is a workaround for loading the project: disable the 'Tip of the Day' and 'Productivity Guide' prompts that appear when you start a new project or open IDEA.

IDEA may also get stuck before you can even get in to change these settings. If that is the case, be very quick when you open IDEA. As soon as the project-loading prompt pops up, cancel the loading, along with all subsequent loading processes. Once loading has started, you can no longer cancel it, and you have to kill the IDE and restart it.

Oracle: To convert a row into a column (CROSS-JOIN)

This afternoon, while doing some Jasper report work, I was puzzled by a SQL query that took quite a long time to execute. Jasper would easily time out, and after retrieving data for about 120 seconds, a blank screen was waiting for me.

Since it isn't appropriate to use the original data table as the example here, I made up a similar scenario that should explain the cause and the solution well enough.

Given a table A with five columns, Primary Key (pk), Statistics 1A (s1a), Statistics 1B (s1b), Statistics 2A (s2a) and Statistics 2B (s2b), we need to select the data into the form Primary Key (pk), Statistics A (s1a or s2a), Statistics B (s1b or s2b). Essentially, each source row is turned into two rows with two statistics columns. Some people prefer to call this a pivot query (strictly speaking, it is an unpivot).

Initially, I naively thought the easiest and most intuitive way was a union of two queries, that is:
select pk, s1a, s1b from A where *** union select pk, s2a, s2b from A where ***
When the data set is small and the query only touches a single table, this is fine. In my case, however, the data set was huge and there were eight union branches rather than two. More importantly, each branch joined six other tables. That's why Jasper could not retrieve what it needed in time.

In this link, Scott suggests a very smart way to handle this: a cross join. Basically, we cross join the existing table with a small set of type/criteria rows to return the expected result.

select
  pk,
  case
    when ite = 's1' then s1a
    when ite = 's2' then s2a
  end as sa,
  case
    when ite = 's1' then s1b
    when ite = 's2' then s2b
  end as sb
from
(
  select pivoter.ite,
         t.pk, t.s1a, t.s1b, t.s2a, t.s2b
  from
    A t
    cross join (
      select 's1' as ite from dual
      union all
      select 's2' as ite from dual
    ) pivoter
)

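With this query, the table (and, in the real case, all of its joins) is read only once. The cross join with the two-row pivot set then produces the two output rows per record, instead of repeating the whole query for every union branch.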
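The same cross-join unpivot, done in memory, makes the mechanics explicit. The outer loop reads each source row once, and the inner loop pairs it with every tag of the two-row pivot set, which is exactly what the cross join does. The two ternaries play the role of the CASE expressions. This is a sketch with hypothetical class names, using int statistics for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory sketch of the cross-join unpivot: each source row is paired with every tag in a tiny set. */
final class CrossJoinPivot {
    static final class Source {
        final int pk, s1a, s1b, s2a, s2b;
        Source(int pk, int s1a, int s1b, int s2a, int s2b) {
            this.pk = pk; this.s1a = s1a; this.s1b = s1b; this.s2a = s2a; this.s2b = s2b;
        }
    }

    static final class Output {
        final int pk;
        final String ite;
        final int sa, sb;
        Output(int pk, String ite, int sa, int sb) {
            this.pk = pk; this.ite = ite; this.sa = sa; this.sb = sb;
        }
    }

    // Plays the role of: select 's1' as ite from dual union all select 's2' as ite from dual
    static final List<String> PIVOTER = List.of("s1", "s2");

    static List<Output> pivot(List<Source> table) {
        List<Output> result = new ArrayList<>();
        for (Source row : table) {          // the source table is read only once...
            for (String ite : PIVOTER) {    // ...and every row is paired with each tag (the cross join)
                // The two CASE expressions: pick the statistics that belong to this tag.
                int sa = ite.equals("s1") ? row.s1a : row.s2a;
                int sb = ite.equals("s1") ? row.s1b : row.s2b;
                result.add(new Output(row.pk, ite, sa, sb));
            }
        }
        return result;
    }
}
```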

Miscellaneous Points for Oracle Join Query

As a software developer, you must have dealt with all sorts of join queries. I believe everyone has at some point struggled, or still struggles, with all the different terms: inner, outer, left, right, and so on. I spent roughly two hours on this this morning, so it's a good chance to write down the points I easily forget or get wrong, in case I need to refresh my memory later.

1) What is the difference between the JOIN ... ON ... syntax and the little (+) sign?

In the old days Oracle only supported (+), which is Oracle's own invention. Later, after ANSI standardized the join syntax, Oracle adopted it as well (from Oracle 9i), so both styles are supported. As a result, don't be surprised to see both styles in your project, written by different developers.

2) Is there any difference between INNER JOIN and JOIN?

No. The INNER keyword is optional; JOIN on its own is an inner join.

3) Why do we need outer joins?

Generally, an inner join returns only rows that have a match in both table A and table B. In reality, however, we sometimes also need rows to be returned even when there is no match. That's where the outer join shows its strength. There are plenty of tutorials about outer joins. The point I want to make here is that the nulls only ever appear on the joined (destination) table's side, never on the starting table's side. In A LEFT OUTER JOIN B, every row of A is kept, and B's columns are null wherever no match exists.

4) Are there any more differences?

Yes. There are actually two more I want to emphasize:

1. The ANSI style supports FULL OUTER JOIN, which returns the unmatched rows from both tables. The traditional Oracle syntax doesn't support it directly, because a (+) cannot be placed on both sides of a condition. (By "directly" I mean that workarounds always exist, in various forms, such as a UNION of a left and a right outer join.)
2. One of the most important differences is that the ANSI style separates the join conditions from the query's filter criteria, which is much tidier and cleaner (at least I think so). It also helps you avoid many of the caveats that come with the traditional style. See "Common errors seen when using OUTER-JOIN".
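The outer-join points above can be illustrated with a small in-memory model of a left outer join between a hypothetical departments table and a hypothetical employees table.

leftJoin keeps every department and fills the employee columns with null where nothing matches, so the nulls only appear on the joined side. filterInOn and filterInWhere then show the classic caveat:

- A filter on the employee's status placed in the ON clause only limits which employees can match.
- The same filter in WHERE runs after the join. It rejects the null-extended rows and silently turns the query into an inner join.

In the traditional syntax, forgetting the (+) on the filter column (e.status(+) = 'ACTIVE') has the same effect as the WHERE version. All names and data here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory sketch: departments {id, name} LEFT JOIN employees {deptId, name, status}. */
final class OnVersusWhere {
    /** Returns rows {deptName, employee, status}; employee and status are null when nothing matches. */
    static List<String[]> leftJoin(List<String[]> depts, List<String[]> emps) {
        List<String[]> out = new ArrayList<>();
        for (String[] d : depts) {
            boolean matched = false;
            for (String[] e : emps) {
                if (d[0].equals(e[0])) {
                    out.add(new String[] {d[1], e[1], e[2]});
                    matched = true;
                }
            }
            if (!matched) {
                // The starting (left) row is kept; only the joined side is filled with nulls.
                out.add(new String[] {d[1], null, null});
            }
        }
        return out;
    }

    /** ... LEFT JOIN emp e ON d.id = e.dept_id AND e.status = 'ACTIVE' */
    static List<String[]> filterInOn(List<String[]> depts, List<String[]> emps) {
        List<String[]> activeEmps = new ArrayList<>();
        for (String[] e : emps) {
            if ("ACTIVE".equals(e[2])) {
                activeEmps.add(e); // the filter only limits which employees can match
            }
        }
        return leftJoin(depts, activeEmps);
    }

    /** ... LEFT JOIN emp e ON d.id = e.dept_id WHERE e.status = 'ACTIVE' */
    static List<String[]> filterInWhere(List<String[]> depts, List<String[]> emps) {
        List<String[]> out = new ArrayList<>();
        for (String[] row : leftJoin(depts, emps)) {
            // Like SQL, a null status never equals 'ACTIVE', so null-extended rows are dropped.
            if ("ACTIVE".equals(row[2])) {
                out.add(row);
            }
        }
        return out;
    }
}
```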
