Wednesday, October 3, 2012

Datastore-Google App-engine


Storing data in a scalable web application can be tricky. A user could be interacting with any of dozens of web servers at a given time, and the user's next request could go to a different web server than the previous request. All web servers need to be interacting with data that is also spread out across dozens of machines, possibly in different locations around the world.
With Google App Engine, you don't have to worry about any of that. App Engine's infrastructure takes care of all of the distribution, replication, and load balancing of data behind a simple API, and you get a powerful query engine and transactions as well.
App Engine's data repository, the High Replication Datastore (HRD), uses the Paxos algorithm to replicate data across multiple data centers. Data is written to the Datastore in objects known as entities. Each entity has a key that uniquely identifies it. An entity can optionally designate another entity as its parent; the first entity is a child of the parent entity. The entities in the Datastore thus form a hierarchically structured space similar to the directory structure of a file system. An entity's parent, parent's parent, and so on recursively, are its ancestors; its children, children's children, and so on, are its descendants. An entity without a parent is a root entity.
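To make the parent/child vocabulary concrete, here is a small sketch in plain Python. It is a toy model, not the App Engine API: we assume a key can be pictured as a tuple of (kind, name) pairs read from the root down, and the helper names (parent, ancestors, is_root) and the Reply kind are made up for illustration.

```python
# Toy model (not the App Engine API): a key is a tuple of (kind, name)
# pairs, read from the root entity down to the entity itself.

def parent(key):
    """Return the parent key, or None for a root entity."""
    return key[:-1] or None

def ancestors(key):
    """Return all ancestor keys, nearest ancestor first."""
    result = []
    current = parent(key)
    while current is not None:
        result.append(current)
        current = parent(current)
    return result

def is_root(key):
    """A root entity is one without a parent."""
    return parent(key) is None

# Hypothetical keys, laid out like a directory path.
guestbook = (('Guestbook', 'default_guestbook'),)
greeting = guestbook + (('Greeting', 'greeting-1'),)
reply = greeting + (('Reply', 'reply-1'),)
```

In this picture, ancestors(reply) walks up through greeting to guestbook, much like walking up a file system path.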
The Datastore is extremely resilient in the face of catastrophic failure, but its consistency guarantees may differ from what you're familiar with. Entities descended from a common ancestor are said to belong to the same entity group; the common ancestor's key is the group's parent key, which serves to identify the entire group. Queries over a single entity group, called ancestor queries, refer to the parent key instead of a specific entity's key. Entity groups are a unit of both consistency and transactionality: whereas queries over multiple entity groups may return stale, eventually consistent results, those limited to a single entity group always return up-to-date, strongly consistent results.
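As a rough illustration of entity groups, the sketch below again treats keys as tuples of (kind, id) pairs. This is an assumed toy representation, not how the Datastore stores keys, and group_key and ancestor_query are hypothetical helper names. It only shows which entities an ancestor query selects; it says nothing about how consistency is implemented.

```python
# Toy model (not the App Engine API): keys are tuples of (kind, id) pairs.

def group_key(key):
    """The root of the key path identifies the entity group."""
    return key[:1]

def ancestor_query(keys, ancestor):
    """Return the keys whose path starts with the ancestor's path."""
    n = len(ancestor)
    return [k for k in keys if k[:n] == ancestor]

# Two hypothetical guestbooks, each the root of its own entity group.
work = (('Guestbook', 'work'),)
home = (('Guestbook', 'home'),)
g1 = work + (('Greeting', 1),)
g2 = work + (('Greeting', 2),)
g3 = home + (('Greeting', 3),)

work_greetings = ancestor_query([g1, g2, g3], work)
```

Here g1 and g2 share a group because they share a root, so the ancestor query for work selects them and leaves g3 out.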
The code samples in this guide organize related entities into entity groups, and use ancestor queries on those entity groups to return strongly consistent results. In the example code comments, we highlight some ways this might affect the design of your application. For more detailed information, see Structuring Data for Strong Consistency.
Note: If you built your application using an earlier version of this Getting Started Guide, please note that the sample application has changed. You can still find the sample code for the original Guestbook application, which does not use ancestor queries, in the demos directory of the SDK.

A Complete Example Using the Datastore

Here is a new version of helloworld/helloworld.py that stores greetings in the Datastore. The rest of this page discusses the new pieces.
import cgi
import datetime
import urllib
import wsgiref.handlers

from google.appengine.ext import db
from google.appengine.api import users
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class Greeting(db.Model):
  """Models an individual Guestbook entry with an author, content, and date."""
  author = db.StringProperty()
  content = db.StringProperty(multiline=True)
  date = db.DateTimeProperty(auto_now_add=True)

def guestbook_key(guestbook_name=None):
  """Constructs a Datastore key for a Guestbook entity with guestbook_name."""
  return db.Key.from_path('Guestbook', guestbook_name or 'default_guestbook')

class MainPage(webapp.RequestHandler):
  def get(self):
    self.response.out.write('<html><body>')
    guestbook_name=self.request.get('guestbook_name')

    # Ancestor queries, as shown here, are strongly consistent; queries that
    # span entity groups are only eventually consistent. If we omitted the
    # ancestor from this query, there would be a slight chance that a greeting
    # that had just been written would not show up in a query.
    greetings = db.GqlQuery("SELECT * "
                            "FROM Greeting "
                            "WHERE ANCESTOR IS :1 "
                            "ORDER BY date DESC LIMIT 10",
                            guestbook_key(guestbook_name))

    for greeting in greetings:
      if greeting.author:
        self.response.out.write(
            '<b>%s</b> wrote:' % greeting.author)
      else:
        self.response.out.write('An anonymous person wrote:')
      self.response.out.write('<blockquote>%s</blockquote>' %
                              cgi.escape(greeting.content))

    self.response.out.write("""
          <form action="/sign?%s" method="post">
            <div><textarea name="content" rows="3" cols="60"></textarea></div>
            <div><input type="submit" value="Sign Guestbook"></div>
          </form>
          <hr>
          <form>Guestbook name: <input value="%s" name="guestbook_name">
          <input type="submit" value="switch"></form>
        </body>
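      </html>""" % (urllib.urlencode({'guestbook_name': guestbook_name}),
                    # quote=True also escapes double quotes, since this value
                    # is written inside an HTML attribute.
                    cgi.escape(guestbook_name, quote=True)))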

class Guestbook(webapp.RequestHandler):
  def post(self):
    # We set the same parent key on the 'Greeting' to ensure each greeting is in
    # the same entity group. Queries across the single entity group will be
    # consistent. However, the write rate to a single entity group should
    # be limited to ~1/second.
    guestbook_name = self.request.get('guestbook_name')
    greeting = Greeting(parent=guestbook_key(guestbook_name))

    if users.get_current_user():
      greeting.author = users.get_current_user().nickname()

    greeting.content = self.request.get('content')
    greeting.put()
    self.redirect('/?' + urllib.urlencode({'guestbook_name': guestbook_name}))


application = webapp.WSGIApplication([
  ('/', MainPage),
  ('/sign', Guestbook)
], debug=True)

def main():
  run_wsgi_app(application)

if __name__ == '__main__':
  main()
Replace helloworld/helloworld.py with this, then reload http://localhost:8080/ in your browser. Post a few messages to verify that messages get stored and displayed correctly.
Warning! Exercising the queries in your application locally causes App Engine to create or update index.yaml. If index.yaml is missing or incomplete, you will see index errors when your uploaded application executes queries for which the necessary indexes have not been specified. To avoid index errors in production, always test new queries at least once locally before uploading your application. See Python Datastore Index Configuration for more information.

Storing the Submitted Greetings

App Engine includes a data modeling API for Python. It's similar to Django's data modeling API, but uses App Engine's scalable Datastore behind the scenes.
For the guestbook application, we want to store greetings posted by users. Each greeting includes the author's name, the message content, and the date and time the message was posted so we can display messages in chronological order.
To use the data modeling API, import the google.appengine.ext.db module:
from google.appengine.ext import db
The following defines a data model for a greeting:
class Greeting(db.Model):
    author = db.StringProperty()
    content = db.StringProperty(multiline=True)
    date = db.DateTimeProperty(auto_now_add=True)
This defines a Greeting model with three properties: author whose value is a string, content whose value is another string, and date whose value is a datetime.datetime.
Some property constructors take parameters to further configure their behavior. Giving the db.StringProperty constructor the multiline=True parameter says that values for this property can contain newline characters. Giving the db.DateTimeProperty constructor an auto_now_add=True parameter configures the model to automatically give new objects a date of the time the object is created, if the application doesn't otherwise provide a value. For a complete list of property types and their options, see the Datastore reference.
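The effect of auto_now_add=True can be pictured with an ordinary Python class. This is only a sketch of the behavior described above, assuming a hypothetical GreetingSketch class; the real db.DateTimeProperty also handles validation and storage, which are not modeled here.

```python
import datetime

class GreetingSketch(object):
    """Plain-Python stand-in for the Greeting model (illustration only)."""

    def __init__(self, author=None, content=None, date=None):
        self.author = author
        self.content = content
        # Mirrors the idea of auto_now_add=True: fill in the creation time
        # only when the caller did not supply a date.
        if date is None:
            date = datetime.datetime.now()
        self.date = date

# No date given, so the object is stamped with the current time.
stamped = GreetingSketch(content='Hello')

# An explicit date is kept as-is.
fixed = GreetingSketch(content='Hello', date=datetime.datetime(2012, 10, 3))
```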
Now that we have a data model for greetings, the application can use the model to create new Greeting objects and put them into the Datastore. The following new version of the Guestbook handler creates new greetings and saves them to the Datastore:
class Guestbook(webapp.RequestHandler):
    def post(self):
      guestbook_name = self.request.get('guestbook_name')
      greeting = Greeting(parent=guestbook_key(guestbook_name))

      if users.get_current_user():
        greeting.author = users.get_current_user().nickname()

      greeting.content = self.request.get('content')
      greeting.put()
      self.redirect('/?' + urllib.urlencode({'guestbook_name': guestbook_name}))
This new Guestbook handler creates a new Greeting object, then sets its author and content properties with the data posted by the user. It does not set the date property, so date is automatically set to "now," as we configured the model to do. The greeting's parent is a key of entity kind Guestbook. There is no need to create the Guestbook entity before setting it to be the parent of another entity. In this example, the parent is used as a placeholder for transaction and consistency purposes. See the Transactions page for more information. Objects that share a common ancestor belong to the same entity group.
Finally, greeting.put() saves our new object to the Datastore. If we had acquired this object from a query, put() would have updated the existing object. Since we created this object with the model constructor, put() adds the new object to the Datastore.
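The add-versus-update behavior of put() can be sketched with a dict-backed store. ToyStore is a made-up class for illustration and is not part of the SDK; we assume entities are plain dicts and that a missing 'key' field marks an object that has never been saved.

```python
import itertools

class ToyStore(object):
    """Dict-backed stand-in for the Datastore (illustration only)."""

    def __init__(self):
        self.rows = {}
        self._next_id = itertools.count(1)

    def put(self, entity):
        # An entity with no key yet is new: assign an ID, which adds a row.
        if entity.get('key') is None:
            entity['key'] = next(self._next_id)
        # An entity that already has a key overwrites the row stored under it.
        self.rows[entity['key']] = dict(entity)
        return entity['key']

store = ToyStore()
greeting = {'content': 'Hello'}
store.put(greeting)                  # adds a new row
greeting['content'] = 'Hello again'
store.put(greeting)                  # updates the same row
```

After both calls the store holds a single row, because the second put() found the key assigned by the first.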
Because querying is strongly consistent only within entity groups, we assign all of a guestbook's greetings to the same entity group in this example by setting the same parent for each greeting. This means a user will always see a greeting immediately after it was written. However, the rate at which you can write to the same entity group is limited to about one write per second. When you design a real application you'll need to keep this fact in mind. Note that by using services such as Memcache, you can mitigate the chance that a user won't see fresh results when querying across entity groups immediately after a write.

Retrieving the Stored Greetings With GQL

The App Engine Datastore has a sophisticated query engine for data models. Because the App Engine Datastore is not a traditional relational database, queries are not specified using SQL. Instead, you can prepare queries using a SQL-like query language we call GQL. GQL provides access to the App Engine Datastore query engine's features using a familiar syntax.
The following new version of the MainPage handler queries the Datastore for greetings:
class MainPage(webapp.RequestHandler):
    def get(self):
        self.response.out.write('<html><body>')
        guestbook_name=self.request.get('guestbook_name')

        greetings = db.GqlQuery("SELECT * "
                                "FROM Greeting "
                                "WHERE ANCESTOR IS :1 "
                                "ORDER BY date DESC LIMIT 10",
                                guestbook_key(guestbook_name))


        for greeting in greetings:
            if greeting.author:
                self.response.out.write('<b>%s</b> wrote:' % greeting.author)
            else:
                self.response.out.write('An anonymous person wrote:')
            self.response.out.write('<blockquote>%s</blockquote>' %
                                    cgi.escape(greeting.content))

        # Write the submission form and the footer of the page
        self.response.out.write("""
              <form action="/sign" method="post">
                <div><textarea name="content" rows="3" cols="60"></textarea></div>
                <div><input type="submit" value="Sign Guestbook"></div>
              </form>
            </body>
          </html>""")
The query happens here:
    greetings = db.GqlQuery("SELECT * "
                            "FROM Greeting "
                            "WHERE ANCESTOR IS :1 "
                            "ORDER BY date DESC LIMIT 10",
                            guestbook_key(guestbook_name))
Alternatively, you can call the gql(...) method on the Greeting class, and omit the SELECT * FROM Greeting from the query:
    greetings = Greeting.gql("WHERE ANCESTOR IS :1 ORDER BY date DESC LIMIT 10",
                             guestbook_key(guestbook_name))
As with SQL, keywords (such as SELECT) are case insensitive. Names, however, are case sensitive.
Because the query returns full data objects, it does not make sense to select specific properties from the model. All GQL queries start with SELECT * FROM model (or are so implied by the model's gql(...) method) so as to resemble their SQL equivalents.
A GQL query can have a WHERE clause that filters the result set by one or more conditions based on property values. Values that are computed at run time, such as a key or a date, are supplied through parameter binding rather than pasted into the query string. For example, to get only the greetings posted in the past seven days:
greetings = Greeting.gql(
            "WHERE ANCESTOR IS :1 AND date > :2 ORDER BY date DESC",
            guestbook_key(guestbook_name),
            datetime.datetime.now() + datetime.timedelta(days=-7))
You can also use named parameters instead of positional parameters:
greetings = Greeting.gql("WHERE ANCESTOR IS :ancestor AND date > :date ORDER BY date DESC",
                         ancestor=guestbook_key(guestbook_name),
                         date=datetime.datetime.now() + datetime.timedelta(days=-7))
In addition to GQL, the Datastore API provides another mechanism for building query objects using methods. The query above could also be prepared as follows:
greetings = Greeting.all()
greetings.ancestor(guestbook_key(guestbook_name))
greetings.filter("date >",
                 datetime.datetime.now() + datetime.timedelta(days=-7))
greetings.order("-date")
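To see what the date filter and sort order select, here is the same logic over an in-memory list, with no Datastore involved. The recent_greetings helper and the dict-based greetings are assumptions made for this sketch. Note that adding timedelta(days=-7) is the same as subtracting timedelta(days=7).

```python
import datetime

def recent_greetings(greetings, now, days=7):
    """Keep greetings newer than the cutoff, newest first."""
    cutoff = now + datetime.timedelta(days=-days)  # i.e. now minus `days` days
    recent = [g for g in greetings if g['date'] > cutoff]
    return sorted(recent, key=lambda g: g['date'], reverse=True)

# Hypothetical data with a fixed "now" so the result is repeatable.
now = datetime.datetime(2012, 10, 3, 12, 0)
greetings = [
    {'content': 'yesterday', 'date': datetime.datetime(2012, 10, 2, 9, 0)},
    {'content': 'last month', 'date': datetime.datetime(2012, 9, 20, 9, 0)},
    {'content': 'two days ago', 'date': datetime.datetime(2012, 10, 1, 9, 0)},
]
result = recent_greetings(greetings, now)
```

Only the two greetings from the past week survive the filter, and the newer one comes first.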
For a complete description of GQL and the query APIs, see the Datastore reference.

A Word About Datastore Indexes

Every query in the App Engine Datastore is computed from one or more indexes. Indexes are tables that map ordered property values to entity keys. This is how App Engine is able to serve results quickly regardless of the size of your application's Datastore. Many queries can be computed from the built-in indexes, but the Datastore requires you to specify a custom index for some more complex queries. Without a custom index, the Datastore can't execute the query efficiently.
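The idea of an index as a table of ordered property values can be sketched with a sorted list and a binary search. This is a simplified picture, not the Datastore's actual index format; build_index and keys_newer_than are hypothetical names, and dates are plain integers to keep the example short.

```python
import bisect

def build_index(entities):
    """Return parallel lists of property values (sorted) and entity keys."""
    rows = sorted((e['date'], e['key']) for e in entities)
    return [d for d, _ in rows], [k for _, k in rows]

def keys_newer_than(dates, keys, cutoff, limit=10):
    """Find keys with date > cutoff, newest first, without a full scan."""
    start = bisect.bisect_right(dates, cutoff)  # first position past cutoff
    return list(reversed(keys[start:]))[:limit]

# Hypothetical entities; 'date' is an integer day number for simplicity.
entities = [
    {'key': 'a', 'date': 1},
    {'key': 'b', 'date': 5},
    {'key': 'c', 'date': 3},
    {'key': 'd', 'date': 9},
]
dates, keys = build_index(entities)
newest = keys_newer_than(dates, keys, cutoff=3)
```

Because the values are kept in order, the lookup jumps straight to the first matching row and reads a contiguous run from there, which is why the cost depends on the number of results rather than the size of the data.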
Our guest book example above, which filters by guestbook and orders by date, uses an ancestor query and a sort order. This query requires a custom index to be specified in your application's index.yaml file. When you run your application in the dev_appserver, the SDK will automatically add an entry to this file. When you upload your application, the custom index definition will be automatically uploaded, too. The entry for this query will look like:
indexes:
- kind: Greeting
  ancestor: yes
  properties:
  - name: date
    direction: desc
You can read all about Datastore indexes on the Datastore Indexes page.

Clearing the Development Server Datastore

The development web server uses a local version of the Datastore for testing your application, using temporary files. The data persists as long as the temporary files exist, and the web server does not reset these files unless you ask it to do so.
If you want the development server to erase its Datastore prior to starting up, use the --clear_datastore option when starting the server:
dev_appserver.py --clear_datastore helloworld/

Next...

We now have a working guest book application that authenticates users using Google accounts, lets them submit messages, and displays messages other users have left. Because App Engine handles scaling automatically, we will not need to revisit this code as our application gets popular.
This latest version mixes HTML content with the code for the MainPage handler. This will make it difficult to change the appearance of the application, especially as our application gets bigger and more complex. Let's use templates to manage the appearance, and introduce static files for a CSS stylesheet.
