New ipsojobs summer features

Summer has almost arrived in Barcelona, and these days it is better to stay at your desk with the AC on than to be out in the street.

For that reason we’ve been working hard on new features :)

Expiration Warnings:

We’ve introduced new warnings for when your job posting is about to expire. To use this feature, simply submit a job posting and you’ll be asked for your contact email.

Near the end of the publishing period we’ll send you a link to renew the job posting (if you want to).

When you renew the job posting, you can optionally enter an email again to receive a further expiration warning.

Better SEO in Pagination:

The pagination for big cities had become a little bulky, so we’ve simplified it and added a more descriptive title to the links (better than 1, 2, 3 …).

Ultra-fast spam detection for admins:

We’ve set up a new ultra-fast way to remove spammy job postings. We’ll send an email to all the admins revealing the secrets of this new feature.


Out of the cloud

After a few days running a simple CDN on Google App Engine, we’ve been forced to turn back and wait until Google App Engine is more “mature”.

As you can see in the following image, Google App Engine has had lots of trouble this June, and we cannot afford to lose customers.

For that reason we are once again serving our static content from our own servers until the situation with Google App Engine normalizes.

You can follow the Google App Engine downtimes here

Thank you


How to create a simple but powerful CDN with Google App Engine (GAE)

When I started looking at Google App Engine (3 days ago), my main purpose was to use it as a “CDN for the rest of us”: a way to cache static content (initially) and have it distributed across Google’s whole infrastructure (maybe the most powerful cloud right now).

What do we want?

  • Create a CDN that is easy to update and free of charge for static resources (images, css, js)
  • Consume as little bandwidth as possible by leveraging the If-Modified-Since/Last-Modified/304 Not Modified model
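
To make the second goal concrete, here is a minimal sketch of the decision the server takes on every request. It is only an illustration in plain Python 3 with no App Engine dependency; the names conditional_status and HTTP_DATE are my own, not part of any framework.

```python
from datetime import datetime

# Date format used by the Last-Modified and If-Modified-Since headers
HTTP_DATE = "%a, %d %b %Y %H:%M:%S GMT"

def conditional_status(if_modified_since, last_modified):
    """Return 304 when the browser's cached copy is still valid, else 200."""
    if if_modified_since is None:
        return 200  # first visit: no cached copy, send the whole file
    try:
        cached = datetime.strptime(if_modified_since, HTTP_DATE)
    except ValueError:
        return 200  # unreadable header: play safe and send the whole file
    return 304 if cached >= last_modified else 200

file_date = datetime(2008, 6, 1, 12, 0, 0)
print(conditional_status(None, file_date))                             # 200
print(conditional_status("Sun, 01 Jun 2008 12:00:00 GMT", file_date))  # 304
print(conditional_status("Sat, 31 May 2008 12:00:00 GMT", file_date))  # 200
```

A 304 response carries no body, which is where the bandwidth saving comes from.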

Hands-on:

The first approach, of course, was to search Google for some help; Andreas Krohn’s post helped a lot to get started.

But I wanted to go further and handle the If-Modified-Since requests of modern browsers, so the Google framework and a little Python come to the rescue.

Note: I’m assuming you’ve already installed the Python environment and the Google App Engine SDK

First of all let me give you two little .bat files that are useful:

Start the test webserver (test.bat):
dev_appserver.py c:\ipsojobscloud

Upload your application to the cloud (update.bat):
appcfg.py update c:\ipsojobscloud

Note: simply change c:\ipsojobscloud to the folder you are working in, the one that contains your app.yaml

Then I set up the app.yaml; it’s very simple (16 lines):

application: ipsojobscloud
version: 1
runtime: python
api_version: 1

handlers:
- url: /favicon.ico
  static_files: favicon.ico
  upload: favicon.ico

- url: /images/favicon.ico
  static_files: favicon.ico
  upload: favicon.ico

- url: /.*
  script: cacheheaders.py

This app.yaml simply tells GAE the name of the application (ipsojobscloud) and the version we’re working on (use only the major release number; GAE automatically takes care of the .x when you upload).

Then we specify two handlers for the favicon.ico static file and a catch-all handler that routes every other request to the Python script cacheheaders.py

With that environment set up, we simply code the cacheheaders.py file. Let’s see it in detail:

The skeleton of the file is:

import wsgiref.handlers
from google.appengine.ext import webapp

class MainPage(webapp.RequestHandler):

  def get(self, dir, file, extension):
    ...

def main():
  application = webapp.WSGIApplication([(r'/(.*)/([^.]*).(.*)', MainPage)], debug=False)
  wsgiref.handlers.CGIHandler().run(application)

if __name__ == "__main__":
  main()

Here we import the webapp framework and define the class MainPage. In the main section, the only change from the GAE sample is the regular expression used to match our requests. In r'/(.*)/([^.]*).(.*)' the r prefix marks a raw string, so Python passes the pattern to the regular expression engine untouched. The pattern takes one slash, followed by an arbitrary number of characters and another slash, /(.*)/; the parentheses tell the regular expression to keep the string between the two slashes as a variable. The next part, ([^.]*)., takes all the characters up to the dot and puts them into the second variable, and finally (.*) takes the rest of the input as the third variable. Note that the dot is not escaped, so strictly speaking it matches any single character; writing \. would match only a literal dot.

This regular expression is designed to capture only paths like /images/helloworld.gif, where the variables are images, helloworld and gif respectively.
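
A quick way to check what the expression captures is to try it with Python’s re module, outside App Engine. This is just a sketch: split_path is a made-up helper name, and re.match only approximates how the framework applies the pattern.

```python
import re

# Same expression as in main(), compiled once
PATTERN = re.compile(r'/(.*)/([^.]*).(.*)')

def split_path(url_path):
    """Return the (dir, file, extension) tuple, or None when the path does not match."""
    match = PATTERN.match(url_path)
    return match.groups() if match else None

print(split_path('/images/helloworld.gif'))  # ('images', 'helloworld', 'gif')
print(split_path('/favicon.ico'))            # None: there is no second slash
print(split_path('/images/helloworld'))      # ('images', 'helloworl', '')
```

The last call shows the effect of the unescaped dot: with no real dot in the path, it swallows the final letter of the file name. The extension check in the get function would reject that request anyway.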

Note: Of course that’s not a complete solution, since it only supports one level of folder depth, but improving it is a good exercise for the reader :)
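
One possible answer to that exercise, offered only as a sketch: the stricter pattern below accepts several folder levels and escapes the dot. NESTED and split_nested_path are hypothetical names, and the choice of allowing only letters, digits, underscores and hyphens in each path segment is my assumption, not something the original script does.

```python
import re

# Several folder levels, a file name and an extension; the dot is escaped
NESTED = re.compile(r'^/((?:[\w-]+/)*[\w-]+)/([\w-]+)\.(\w+)$')

def split_nested_path(url_path):
    """Return (dir, file, extension), or None when the path is not acceptable."""
    match = NESTED.match(url_path)
    return match.groups() if match else None

print(split_nested_path('/images/helloworld.gif'))        # ('images', 'helloworld', 'gif')
print(split_nested_path('/images/icons/small/help.png'))  # ('images/icons/small', 'help', 'png')
print(split_nested_path('/images/../secret.txt'))         # None
```

Because ".." is not a valid segment here, the pattern also refuses attempts to climb out of the static folders. The price is that file names containing extra dots, such as jquery.min.js, are rejected too. The dir check in the get function would also need to look at the first folder only.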

The part that you need to know is that when a request arrives it’s mapped to the get function with the parameters dir, file and extension (and don’t forget the first “self” parameter)

Let’s see the code of the get function in detail:

First, check the validity of the parameters received and set the correct content-type based on the extension:

  def get(self, dir, file, extension):
    if (dir!='js' and dir!='css' and dir!='images'):
      self.error(404)
      return

    if (extension!='js' and extension!='css' and extension!='jpg' and extension!='png' and extension!='gif'):
      self.error(404)
      return

    if extension=='js':
      self.response.headers['Content-Type'] = 'application/x-javascript'
    elif extension=='css':
      self.response.headers['Content-Type'] = 'text/css'
    elif extension=='jpg':
      self.response.headers['Content-Type'] = 'image/jpeg'
    elif extension=='gif':
      self.response.headers['Content-Type'] = 'image/gif'
    elif extension=='png':
      self.response.headers['Content-Type'] = 'image/png'

Note: the first two ifs are completely optional. They check that the dir variable is in our list of valid dirs (js, css, images) and that the extension of the file is in our allowed list (js, css, jpg, png, gif); change those checks or remove them completely at your convenience.
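
If the chain of ifs grows, the same checks can be written as a table lookup. This is an alternative sketch, not the code used on the site; CONTENT_TYPES, ALLOWED_DIRS and content_type_for are names invented for the example.

```python
# Extension -> Content-Type, same values as in the if/elif chain
CONTENT_TYPES = {
    'js': 'application/x-javascript',
    'css': 'text/css',
    'jpg': 'image/jpeg',
    'gif': 'image/gif',
    'png': 'image/png',
}
ALLOWED_DIRS = ('js', 'css', 'images')

def content_type_for(directory, extension):
    """Return the Content-Type to send, or None when the request deserves a 404."""
    if directory not in ALLOWED_DIRS:
        return None
    return CONTENT_TYPES.get(extension)

print(content_type_for('images', 'png'))  # image/png
print(content_type_for('images', 'exe'))  # None
print(content_type_for('admin', 'css'))   # None
```

Adding a new file type then means adding one line to the dictionary.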

And now the tricky part:

    try:
      import os
      import datetime
      path = dir+'/'+file+"."+extension
      info = os.stat(path)
      lastmod = datetime.datetime.fromtimestamp(info[8])
      if self.request.headers.has_key('If-Modified-Since'):
        dt = self.request.headers.get('If-Modified-Since').split(';')[0]
        modsince = datetime.datetime.strptime(dt, "%a, %d %b %Y %H:%M:%S %Z")
        if modsince >= lastmod:
          # The file is older than the cached copy (or exactly the same)
          self.error(304)
          return
        else:
          # The file is newer
          self.output_file(path, lastmod)
      else:
        self.output_file(path, lastmod)
    except:
      self.error(404)
      return

First we import some packages (os, datetime), then create a variable “path” with the full path of the file we want to retrieve

path = dir+'/'+file+"."+extension

Then we take the info of the file from the operating system and keep the last modified date in the lastmod variable. Note that if an error occurs (a non-existent file, for example), the except part will be executed, returning a 404 Not Found response to the browser.
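
The file-date step can be tried on its own with a temporary file. A small sketch, assuming Python 3: last_modified_utc is a hypothetical helper, and unlike the script above it converts the timestamp to UTC, which is my choice here because HTTP dates are expressed in GMT.

```python
import datetime
import os
import tempfile

def last_modified_utc(path):
    """Return the file's modification time as a naive UTC datetime, whole seconds only."""
    mtime = int(os.stat(path).st_mtime)  # os.stat raises OSError if the file is missing
    utc = datetime.datetime.fromtimestamp(mtime, datetime.timezone.utc)
    return utc.replace(tzinfo=None)

# Demo on a temporary file whose modification time we set ourselves
handle, demo_path = tempfile.mkstemp()
os.close(handle)
os.utime(demo_path, (1212321600, 1212321600))  # 2008-06-01 12:00:00 UTC
demo_lastmod = last_modified_utc(demo_path)
os.remove(demo_path)
print(demo_lastmod)  # 2008-06-01 12:00:00

try:
    last_modified_utc(demo_path)  # the file was removed just above
except OSError:
    print("missing file: this is where the 404 comes from")
```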

In the following lines we scan the headers of the request looking for an If-Modified-Since header; if we find it, we take the date part

      if self.request.headers.has_key('If-Modified-Since'):
        dt = self.request.headers.get('If-Modified-Since').split(';')[0]
        modsince = datetime.datetime.strptime(dt, "%a, %d %b %Y %H:%M:%S %Z")

Then we compare the last modification date of the file against the If-Modified-Since date and act accordingly. Note that self.error(304) returns a 304 (Not Modified) response code to the browser:

        if modsince >= lastmod:
          # The file is older than the cached copy or the same
          self.error(304)
          return
        else:
          # The file is newer
          self.output_file(path, lastmod)

The self.output_file(path, lastmod) is a function we have defined to avoid code duplication:

  def output_file(self, path, lastmod):
    import datetime
    try:
      self.response.headers['Cache-Control'] = 'public, max-age=31536000'
      self.response.headers['Last-Modified'] = lastmod.strftime("%a, %d %b %Y %H:%M:%S GMT")
      expires = lastmod+datetime.timedelta(days=365)
      self.response.headers['Expires'] = expires.strftime("%a, %d %b %Y %H:%M:%S GMT")
      # Binary mode, so images are not altered on platforms that translate line endings
      fh = open(path, 'rb')
      self.response.out.write(fh.read())
      fh.close()
      return
    except IOError:
      self.error(404)
      return

As you can see, we import datetime to manipulate dates and try to do the following:

  • Set the Cache-Control header to make the response as cacheable as possible
  • Set the Last-Modified header (IMPORTANT! When we send the file to the browser for the first time, the browser keeps the file’s Last-Modified date; it will send that value back in its next If-Modified-Since requests, to which we will usually respond 304 Not Modified!)
  • Calculate an expiry date in the future (we’ve used 365 days)
  • Set the Expires header to this value (last-modified + 365 days)
  • Open the file, send it to the output and finally close the file
  • Return, because once we have output the file we’re done

Note: If something goes wrong we return a standard Not Found (404) response
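
The three caching headers can also be built and inspected without the framework. This is a sketch with invented names (cache_headers, HTTP_DATE, ONE_YEAR); it follows the same rules as the list above, including the Expires date derived from the last modification date.

```python
from datetime import datetime, timedelta

HTTP_DATE = "%a, %d %b %Y %H:%M:%S GMT"
ONE_YEAR = timedelta(days=365)

def cache_headers(lastmod):
    """Build the caching headers for a file last modified at lastmod (naive, GMT)."""
    return {
        'Cache-Control': 'public, max-age=%d' % int(ONE_YEAR.total_seconds()),
        'Last-Modified': lastmod.strftime(HTTP_DATE),
        'Expires': (lastmod + ONE_YEAR).strftime(HTTP_DATE),
    }

headers = cache_headers(datetime(2008, 6, 1, 12, 0, 0))
for name in sorted(headers):
    print(name + ': ' + headers[name])
# Cache-Control: public, max-age=31536000
# Expires: Mon, 01 Jun 2009 12:00:00 GMT
# Last-Modified: Sun, 01 Jun 2008 12:00:00 GMT
```

One thing to keep in mind: since Expires is counted from the file date, a file that has not changed for more than a year gets an Expires date in the past. Counting from the current time instead would avoid that, if it ever becomes a problem.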

Conclusions:

We’ve improved the latency of requests for static files by putting them into the cloud, and we keep the bandwidth used in the cloud to a minimum by answering If-Modified-Since requests correctly, all in only about 70 lines of code.

One of the advantages of Google App Engine over Amazon S3 is that GAE is free up to 5 million page views a month, which gives us a good chance to try this kind of feature without spending cash.

You can see the speed improvement online on all the ipsojobs.com pages right now!

Some screenshots taken from firebug:

First request:

First request (not cached)

Second request:

Second request, cached, note the 304 responses

Detail of a request:

Sample cached response, details

Full source of cacheheaders.py:

import wsgiref.handlers
from google.appengine.ext import webapp

class MainPage(webapp.RequestHandler):

  def output_file(self, path, lastmod):
    import datetime
    try:
      self.response.headers['Cache-Control'] = 'public, max-age=31536000'
      self.response.headers['Last-Modified'] = lastmod.strftime("%a, %d %b %Y %H:%M:%S GMT")
      expires = lastmod+datetime.timedelta(days=365)
      self.response.headers['Expires'] = expires.strftime("%a, %d %b %Y %H:%M:%S GMT")
      # Binary mode, so images are not altered on platforms that translate line endings
      fh = open(path, 'rb')
      self.response.out.write(fh.read())
      fh.close()
      return
    except IOError:
      self.error(404)
      return

  def get(self, dir, file, extension):
    if (dir!='js' and dir!='css' and dir!='images'):
      self.error(404)
      return

    if (extension!='js' and extension!='css' and extension!='jpg' and extension!='png' and extension!='gif'):
      self.error(404)
      return

    if extension=='js':
      self.response.headers['Content-Type'] = 'application/x-javascript'
    elif extension=='css':
      self.response.headers['Content-Type'] = 'text/css'
    elif extension=='jpg':
      self.response.headers['Content-Type'] = 'image/jpeg'
    elif extension=='gif':
      self.response.headers['Content-Type'] = 'image/gif'
    elif extension=='png':
      self.response.headers['Content-Type'] = 'image/png'

    try:
      import os
      import datetime
      path = dir+'/'+file+"."+extension
      info = os.stat(path)
      lastmod = datetime.datetime.fromtimestamp(info[8])
      if self.request.headers.has_key('If-Modified-Since'):
        dt = self.request.headers.get('If-Modified-Since').split(';')[0]
        modsince = datetime.datetime.strptime(dt, "%a, %d %b %Y %H:%M:%S %Z")
        if modsince >= lastmod:
          # The file is older than the cached copy (or exactly the same)
          self.error(304)
          return
        else:
          # The file is newer
          self.output_file(path, lastmod)
      else:
        self.output_file(path, lastmod)
    except:
      self.error(404)
      return

def main():
  application = webapp.WSGIApplication([(r'/(.*)/([^.]*).(.*)', MainPage)], debug=False)
  wsgiref.handlers.CGIHandler().run(application)

if __name__ == "__main__":
  main()


Put the latest job offers in your web site, the easy way

Once again, the folks at Google have come out with a great and simple service. It’s called:

Google AJAX Feed API

That’s what we want to obtain

The idea is simple: just enter some keywords, and Google will look for relevant RSS feeds and generate a simple but elegant RSS slideshow with the latest “news” contained in those feeds.

How can it help you put the latest ipsojobs.com job offers on your website?

EASY
Just go to http://www.google.com/uds/solutions/wizards/dynamicfeed.html

Under Style, choose the option best suited to your site (if you choose “Vertical Stacked”, be sure to choose an appropriate title too)

In the “Feeds Expression” enter your favorite ipsojobs sites, for example:

ipsojobs barcelona, ipsojobs sabadell

The comma separates the cities, and it’s important to include ipsojobs every time to be sure that Google picks the correct RSS feed

When you click the Preview button, make sure that valid ipsojobs RSS feeds appear under “Direct Feed URL”, as in the example:

ipsojobs barcelona: http://barcelona.ipsojobs.com/rss/controller.php
ipsojobs sabadell: http://sabadell.ipsojobs.com/rss/controller.php

Finally, go to Generate Code, then copy the code and paste it into your favorite CMS.

The sample, visually

That’s what we have done

The example integrated in a Blog

Sample integration in a blogger page


Latest news of ipsojobs.com

We’ve been very busy at ipsojobs in the last few weeks.

So busy that we haven’t had time to post :)

Let us fire off some facts:

  • The traffic is growing steadily, giving our administrators more ad cash every month.
  • Alexa changed its measuring system, which temporarily took us out of the top 100.000. Now we’re back, and some days we’re reaching the top 50.000 mark; that’s great with the new system!
  • We’ve opened Moscow and Sankt-Peterburg in Russia, and the city administrator of those cities has made the Russian translation of ipsojobs.com
  • We’re starting to integrate broadbean offers in our site, giving us a head start in the UK market. We hope this agreement will start to give results in the coming weeks.
  • ipsojobs.com has reached the 11,000 active jobs mark; that’s a big milestone for us!
  • We’ve made some minor SEO adjustments, basically putting the word “Jobs” or “Trabajos” in some important links, between cities and to the worldwide home page.
  • At the technical level we now have two dedicated servers, one for the database and one for the frontend, which gives us enough power to handle 10-fold traffic growth.
  • And we’ve also added some anti-spam features to keep the quality of all the ipsojobs.com cities as good as always

More news to come
The ipsojobs.com team


We’ve got a NEW HOME

Hi all,

we’ve improved the ipsojobs.com home page, showing the most active cities in a pretty “tag cloud”. Every ipsojobs zone has its own metrics of what is relevant and decides the font size depending on the number of currently active job postings in the city relative to the area.
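
For the curious, the general idea of a tag cloud can be sketched in a few lines of Python. This is not the code running on the site: the function name tag_font_sizes, the pixel range and the job counts are all made up for the example, and it simply scales each city against the busiest city of its area.

```python
def tag_font_sizes(active_jobs, min_px=11, max_px=28):
    """Map each city to a font size proportional to its share of the area's busiest city."""
    if not active_jobs:
        return {}
    busiest = max(active_jobs.values())
    sizes = {}
    for city, count in active_jobs.items():
        share = float(count) / busiest if busiest > 0 else 0.0
        sizes[city] = min_px + int(round((max_px - min_px) * share))
    return sizes

# Invented numbers of active job postings for one area
area = {'barcelona': 400, 'sabadell': 100, 'girona': 0}
sizes = tag_font_sizes(area)
print(sizes['barcelona'], sizes['sabadell'], sizes['girona'])  # 28 15 11
```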

Here is the first screenshot of the new home:

New Home screenshot

We pursue three objectives:

  1. Promote competition between the city managers in the same area
  2. Save screen space, moving the inactive or less active cities to the bottom
  3. Show most active cities with a premium size

Go and see the new home of ipsojobs.com NOW!


Happy ipsojobs.com half-anniversary

Last Sunday was the 6-month half-anniversary of ipsojobs.com, and we are very excited to share the good health of the site with all our managers, partners and users. We think the coming months will be even better than the first 6, and we expect high growth in traffic, cities and job offers all over the world.

Let’s see the numbers of the first 6 months, compared with the 3 months data in parenthesis:

Total number of published job offers: 14.709 (6.563)
Currently active job offers: 4.645 (2.395)
Average new job offers by day: 81,7 (72,9)
Active job sites (cities): 282 (219)
Number of city managers: 160 (149)
PageRank: 4 (4)
Alexa Ranking: 62.204 (56.540)
Partners: 8 (3)

Thanks to all our partners, city managers and users. You make ipsojobs.com a successful job board site!

Our partners:


Ipsojobs and Recruit.net integration

Recruit.net page showing Ipsojobs.com job offers

Ipsojobs.com is proud to announce its inclusion in the recruit.net search index.

Recruit.net is the leader in vertical job search in Australia, New Zealand, China, Malaysia, India, Japan and Singapore. This new vertical search engine integration will help Ipsojobs.com become more popular in those areas.

We have been receiving visitors from recruit.net since February 19th (19/02/2008). Welcome all!


Ipsojobs and SnipTime integration

ScreenShot of a sniptime/ipsojobs integration

Ipsojobs.com is proud to announce its inclusion in the sniptime search index.

Sniptime is one of the most important players in the job search market in Spain. This new vertical search engine integration will help Ipsojobs.com become more popular and more relevant in the Spanish market.

We have been receiving visitors from sniptime.com since February 21st (21/02/2008). Welcome all!


Ipsojobs and SimplyHired integration

Search result of simplyhired.com showing an ipsojobs.com job offer

Ipsojobs.com is proud to announce its inclusion in the simplyhired search index.

SimplyHired is one of the most important players in the job search market in the US. This new vertical search engine integration will help Ipsojobs.com become more popular and more relevant in the US market.

We have been receiving visitors from simplyhired.com since February 19th (19/02/2008). Welcome all!


© Omatech