Spatio-temporal Join Query

Extensible operations

Operations in ST-Hadoop are implemented as regular MapReduce programs. The main difference between spatio-temporal operations and the spatial operations shipped with SpatialHadoop is that the input file is spatio-temporally indexed. To read a spatio-temporally indexed file, you need to provide the correct ShapeFormat, which must be an extension of STPoint. Besides the regular map and reduce functions, ST-Hadoop allows you to provide a filter function that performs an early pruning step: it prunes away file blocks that do not contribute to the final answer, based on two properties, their minimal bounding rectangles (MBRs) and their slicing time span (interval). In this tutorial, you will learn how to use the built-in spatio-temporal range query and follow step-by-step instructions on how to write your own spatio-temporal operation.
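The early pruning step described above can be illustrated with a small sketch. This is not ST-Hadoop's filter API (which is Java); the Partition class and the function names are hypothetical, and the sketch only assumes that each file block carries an MBR and a time span, and that a block is kept when both overlap the query.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Partition:
    """Hypothetical stand-in for one indexed file block."""
    name: str
    mbr: tuple   # (x1, y1, x2, y2)
    span: tuple  # (start_date, end_date), both inclusive


def rects_overlap(a, b):
    """True when two (x1, y1, x2, y2) rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spans_overlap(a, b):
    """True when two inclusive (start, end) intervals share at least one instant."""
    return a[0] <= b[1] and b[0] <= a[1]


def filter_partitions(partitions, query_rect, query_interval):
    """Keep only the blocks whose MBR and time span both overlap the query."""
    return [
        p for p in partitions
        if rects_overlap(p.mbr, query_rect) and spans_overlap(p.span, query_interval)
    ]


# Made-up blocks: two for January 2017, one for February 2017.
blocks = [
    Partition("2017-01/part-0", (0, 0, 10, 10), (date(2017, 1, 1), date(2017, 1, 31))),
    Partition("2017-01/part-1", (20, 20, 30, 30), (date(2017, 1, 1), date(2017, 1, 31))),
    Partition("2017-02/part-0", (0, 0, 10, 10), (date(2017, 2, 1), date(2017, 2, 28))),
]
kept = filter_partitions(blocks, (5, 5, 15, 15), (date(2017, 1, 10), date(2017, 1, 12)))
print([p.name for p in kept])  # ['2017-01/part-0']
```

Only the first block survives: the second lies outside the query rectangle and the third lies outside the query interval, so neither would need to be read by the map phase.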

 

Spatio-temporal Join Query:

One of the basic operations on spatio-temporal data supported in ST-Hadoop is the join query, which joins two datasets. The query takes a set of shapes, a rectangular query area (A), a query time interval (T), a time distance, and a space distance. The output is all pairs of shapes inside the query region A during time interval T such that the two shapes in each pair are within the space distance and the time distance of each other. For example, joining datasets X and Y finds the pairs of points that are within 1 mile of each other and co-existed within a 2-day time interval.
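The join condition can be made concrete with a naive nested-loop sketch. It is only meant to show which pairs qualify, not how ST-Hadoop evaluates the join on indexed data. The function names are hypothetical, points are modeled as (x, y, time) tuples, and the space distance is assumed to be plain Euclidean distance in the units of the coordinates, which may differ from the distance ST-Hadoop actually computes.

```python
from datetime import datetime, timedelta
from math import hypot


def in_query(point, rect, interval):
    """True when an (x, y, t) point lies inside area A and time interval T."""
    x, y, t = point
    x1, y1, x2, y2 = rect
    return x1 <= x <= x2 and y1 <= y <= y2 and interval[0] <= t <= interval[1]


def st_join(xs, ys, rect, interval, space_distance, time_distance):
    """Naive nested-loop join: every qualifying point of X against every one of Y."""
    left = [p for p in xs if in_query(p, rect, interval)]
    right = [q for q in ys if in_query(q, rect, interval)]
    return [
        (p, q)
        for p in left
        for q in right
        # Assumption: Euclidean distance in coordinate units.
        if hypot(p[0] - q[0], p[1] - q[1]) <= space_distance
        and abs(p[2] - q[2]) <= time_distance
    ]


# Made-up sample data for datasets X and Y.
X = [(1.0, 1.0, datetime(2017, 1, 10)), (8.0, 8.0, datetime(2017, 1, 11))]
Y = [(1.5, 1.0, datetime(2017, 1, 11)), (8.0, 8.5, datetime(2017, 1, 20))]

pairs = st_join(
    X, Y,
    rect=(0, 0, 10, 10),
    interval=(datetime(2017, 1, 1), datetime(2017, 1, 31)),
    space_distance=1.0,
    time_distance=timedelta(days=2),
)
print(len(pairs))  # 1
```

Only the first point of X and the first point of Y are joined: they are 0.5 units and one day apart. The second points of each dataset are close in space but nine days apart, so they fail the time distance condition.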
 
To run a spatio-temporal join query, execute the following command line. Note that both input files must already be indexed using the ST-Hadoop index.
./hadoop jar st-hadoop-uber.jar stjoin \
    /indexDir<Dataset-1>/ /indexDir<Dataset-2> /Result \
    rect:x1,y1,x2,y2 interval:t1,t2 \
    timedistance:td,resolution \
    spacedistance:d \
    shape:edu.umn.cs.sthadoop.core.STPoint -overwrite

 

Parameters and Command / Description

stjoin
    The MapReduce operation that invokes the spatio-temporal join query.

indexDir
    The HDFS directory where the dataset is indexed. This directory is the parent directory of the hierarchical spatio-temporal index.

Result
    The HDFS output directory where the result will be stored.

rect:
    The spatial minimum bounding rectangle (query area) of the spatio-temporal join query.

interval:
    The time interval of the spatio-temporal join query. Times must be given in the standard format yyyy-MM-dd, such as 2017-01-12.

shape:
    The shape format. It must extend STPoint, which supports spatio-temporal data.

time:
    * Optional parameter that explicitly selects a specific resolution layer to query from. For example, if the space partitioning technique is used in the spatio-temporal index, the possible values are [day, week, month, year]. If this property is not specified on the command line, ST-Hadoop decides the execution plan by considering all spatio-temporal resolution indexes.

timedistance
    The join condition based on time distance. It is given as a pair <td,resolution>, where td can be any integer and resolution is one of the following: second, minute, day, week, month, year. For example, timedistance:1,day

spacedistance
    The join condition based on space distance. The value "d" is a double, in miles. For example, spacedistance:2

-overwrite
    This flag tells ST-Hadoop to overwrite the output directory if it already exists.
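As a rough illustration of how the parameters fit together, the sketch below assembles the argument list for the stjoin command and checks the two documented formats (yyyy-MM-dd dates and the listed time resolutions) before anything is submitted. The helper build_stjoin_args and the example HDFS paths are hypothetical and not part of ST-Hadoop; the argument order simply follows the command line shown above.

```python
from datetime import datetime

# Resolutions listed in the timedistance parameter description.
RESOLUTIONS = {"second", "minute", "day", "week", "month", "year"}


def build_stjoin_args(index1, index2, result, rect, interval, time_distance,
                      space_distance, shape="edu.umn.cs.sthadoop.core.STPoint",
                      overwrite=True):
    """Return the stjoin command as a list of arguments (hypothetical helper)."""
    for day in interval:
        # Raises ValueError unless the value parses as a year-month-day date.
        datetime.strptime(day, "%Y-%m-%d")
    td, resolution = time_distance
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unknown time resolution: {resolution}")
    args = [
        "./hadoop", "jar", "st-hadoop-uber.jar", "stjoin",
        index1, index2, result,
        "rect:" + ",".join(str(v) for v in rect),
        "interval:" + ",".join(interval),
        f"timedistance:{int(td)},{resolution}",
        f"spacedistance:{space_distance}",
        f"shape:{shape}",
    ]
    if overwrite:
        args.append("-overwrite")
    return args


# Example with made-up HDFS paths and query values.
args = build_stjoin_args(
    "/indexDir/datasetX", "/indexDir/datasetY", "/Result",
    rect=(0, 0, 10, 10),
    interval=("2017-01-10", "2017-01-12"),
    time_distance=(1, "day"),
    space_distance=2,
)
print(" ".join(args))
```

The resulting list could be passed to subprocess.run on a machine where the hadoop launcher and the ST-Hadoop jar are available; building a list rather than a single string avoids shell quoting problems.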

 

Contact

We appreciate your feedback, comments, and problem reports by email:

[email protected]

ST-Hadoop Team
