Uploads (aka Imports, aka Batch Processing)

POSTing to /rest/v1/upload/ allows you to send many actions for processing to ActionKit in a single request. The actions will be processed using the same code and rules as User Imports in the ActionKit admin.

You should familiarize yourself with the details of User Imports before using this API endpoint.

This documentation covers using uploads from the API: the requirements for upload files, the meaning of status values in the returned data, how to track progress, how to access warnings and errors, how to stop an in-process upload, and how to correct the header and restart it.

Format Your Upload File

Your file must:

  • Be formatted as a TSV or CSV.
  • Include a header row with correct field names.
  • Include a column that identifies the user. The choices are "email", "user_id" or "akid".
  • Be saved in the UTF-8 encoding.
  • Specify times in UTC.

Large files should be compressed using gzip or Zip compression to reduce upload times.

See the User Import documentation for full details on formatting your import file.
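As a rough illustration of the requirements above, the following sketch writes a small UTF-8 CSV with a header row and an "email" identifier column, then gzips it. The file name, the helper name write_import_file and the "user_color" custom field are placeholders chosen for this example; check the User Import documentation for the fields your import actually needs.

```python
import csv
import gzip


def write_import_file(path, rows, fieldnames):
    """Write rows (a list of dicts) as a gzip-compressed, UTF-8 encoded CSV.

    write_import_file is a hypothetical helper, not part of ActionKit.
    """
    # Text mode ("wt") lets gzip handle the UTF-8 encoding;
    # newline="" lets the csv module control line endings.
    with gzip.open(path, "wt", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()  # the required header row
        writer.writerows(rows)


rows = [
    {"email": "ann@example.com", "user_color": "blue"},
    {"email": "józef@example.com", "user_color": "green"},
]
write_import_file("import.csv.gz", rows, ["email", "user_color"])
```

Any timestamps you add as columns should already be converted to UTC before they are written.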

Create A Multipart POST Request

How you create a correctly formatted multipart POST will depend on how you are connecting to ActionKit. ActionKit relies on Django's HttpRequest class to parse the incoming POST, which expects a standard multipart/form-data HTTP POST request as described in RFC 2388 (http://tools.ietf.org/html/rfc2388).

All parameters must be sent in the POST payload; parameters in the query string will be ignored.

The request must contain at least two parameters:

  • upload, the file to be processed
  • page, the name of the ImportPage to use for the actions.

The request can also include:

  • autocreate_user_fields, create allowed custom user fields (i.e. "user_xxx") if they don't exist; must be 'true' or 'false'; defaults to 'false'.

By default, we'll raise an error if you send a user field that isn't yet created as an allowed field in your instance. Use this parameter to automatically create those allowed user fields.

  • user_fields_only, if 'true', we will attempt a fast custom user field upload. Note that your upload must contain only a user-identifying field (id, akid, email) and custom user field columns. If other columns are present, we'll fall back to a regular upload.

You must send UTF-8 encoded Unicode data in the uploaded file.

We recommend compressing the file to reduce upload times, but even if you don't compress the file, you should treat your upload file as binary data.
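The parameter rules above can be captured in a small helper that builds the non-file form fields. This is a sketch: build_upload_fields is a hypothetical name, and it sends user_fields_only only when it is true because the text above only describes the 'true' value.

```python
def build_upload_fields(page, autocreate_user_fields=False, user_fields_only=False):
    """Return the form fields (everything except the file) for /rest/v1/upload/.

    build_upload_fields is a hypothetical helper, not part of ActionKit.
    """
    if not page:
        raise ValueError("page (the name of an ImportPage) is required")
    fields = {
        "page": page,
        # The API expects the literal strings 'true' or 'false'.
        "autocreate_user_fields": "true" if autocreate_user_fields else "false",
    }
    if user_fields_only:
        fields["user_fields_only"] = "true"
    return fields
```

With requests, pass the result as data=build_upload_fields(...) alongside files={'upload': f}. Using params= instead would put the values in the query string, where they are ignored.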

Example

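import requests
import sys

page = 'my_previously_created_import_page'
url  = 'https://docs.actionkit.com/rest/v1/upload/'

upload_file = sys.argv[1]

# Send the file and all parameters in the multipart POST body;
# parameters in the query string would be ignored.
with open(upload_file, 'rb') as f:
    r = requests.post(url,
                      files={'upload': f},
                      data={'page': page, 'autocreate_user_fields': 'true'},
                      auth=('USER', 'PASS'))

print(r.status_code)
if r.status_code == 201:
    # Location points to the new upload; poll it for progress.
    print(r.headers['Location'])
else:
    print(r.text)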

Poll For Progress

On success, your initial POST will return a 201 CREATED response. The Location header will point to your new upload.

Example Response

HTTP/1.1 201 CREATED
Server: openresty
Date: Tue, 03 Feb 2015 14:44:43 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
X-Machine-Id: dev.actionkit.com
Vary: Accept,Cookie,Accept-Encoding,User-Agent
Location: https://docs.actionkit.com/rest/v1/upload/47/
Set-Cookie: sid=4kcpskldoek31gxw0c7v4yojt43v6rj0; expires=Tue, 03-Feb-2015 22:44:43 GMT; httponly; Max-Age=28800; Path=/

Other possible response status codes are:

Status code  Likely reason
400          BAD REQUEST. The response body should contain more detail.
401          UNAUTHORIZED. The credentials were invalid.
403          FORBIDDEN. This user does not have permission to perform this action.
404          NOT FOUND. The page parameter does not refer to a valid, non-hidden page.
500          INTERNAL SERVER ERROR. Contact ActionKit support.
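One way to act on these codes is to turn any non-201 response into an exception that carries the likely reason from the table. This is a sketch: upload_location and UploadCreateError are hypothetical names, and the function takes plain values so it works with any HTTP client.

```python
# Likely reasons for non-201 responses, following the table above.
LIKELY_REASONS = {
    400: "Bad request; see the response body for details.",
    401: "Unauthorized; the credentials were invalid.",
    403: "Forbidden; this user lacks permission for this action.",
    404: "Not found; page is not a valid, non-hidden page.",
    500: "Internal server error; contact ActionKit support.",
}


class UploadCreateError(Exception):
    """Hypothetical exception for a failed upload POST."""


def upload_location(status_code, headers, body=""):
    """Return the new upload's URI from a 201 response, or raise."""
    if status_code == 201:
        return headers["Location"]
    reason = LIKELY_REASONS.get(status_code, "Unexpected status code.")
    raise UploadCreateError("%s %s %s" % (status_code, reason, body))
```

With requests you would call upload_location(r.status_code, r.headers, r.text); r.headers is case-insensitive, so "Location" and "location" both work there.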

When you upload a file, it's added to a processing queue. You'll need to poll the Upload to see its current status: just GET the Location you got back from the upload until the field is_completed is true.

We'll set is_completed to true if the upload completes without error, but also if there's a problem reading or unpacking the file, if the header is invalid, if there are too many errors to continue, or if you or someone else stops the upload.

Once an upload finishes processing you must check has_errors and has_warnings. We'll cover that in more detail in the section below, Review errors and warnings.

Note that if you're seeing dropped connections rather than a response status code, this may be due to incorrect authorization credentials. Upload attempts that fail to authenticate are disconnected for security reasons. You can check your credentials by making any simple API request with them.

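Polling For Completion

import time

import requests

auth       = ('USER', 'PASS')
upload_uri = 'https://docs.actionkit.com/rest/v1/upload/47/'  # the Location from your POST

upload = requests.get(upload_uri, auth=auth).json()
while not upload['is_completed']:
    time.sleep(1)
    upload   = requests.get(upload_uri, auth=auth).json()
    progress = upload['progress']
    print("%s/s, remaining %ss, %s ok, %s failed, %s warned" % (
            progress['rate'],
            progress['time_remaining'],
            progress['rows']['ok'],
            progress['rows']['failed'],
            progress['rows']['warned']))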
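The loop above runs until is_completed is set, which is fine interactively. An unattended integration may also want a timeout so a stuck upload doesn't block it forever. In this sketch, wait_for_completion is a hypothetical helper, the one-hour default timeout is arbitrary, and fetch is any zero-argument function that returns the upload as a dict (for example lambda: requests.get(upload_uri, auth=auth).json()).

```python
import time


def wait_for_completion(fetch, interval=1.0, timeout=3600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until the returned upload has is_completed set.

    Raises TimeoutError if the upload is still running after `timeout` seconds.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    upload = fetch()
    while not upload["is_completed"]:
        if clock() >= deadline:
            raise TimeoutError("upload still %r after %s seconds"
                               % (upload["status"], timeout))
        sleep(interval)
        upload = fetch()
    return upload
```

Once it returns, check has_errors and has_warnings as usual. is_completed alone does not mean the upload succeeded.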

Response Field Reference

Field name              Description
id                      Unique identifier for this upload
resource_uri            URI of this upload
line_count              Approximate line count of the uploaded file
path                    Internal path to the uploaded file on the ActionKit cluster
autocreate_user_fields  Boolean indicating whether the upload should autocreate user fields found in the file
compression             Compression detected on the uploaded file, or "none" if it was not compressed
format                  Detected format of the uploaded file: 'tsv' or 'csv'
page                    resource_uri of the ImportPage used to process actions
created_at              Timestamp when the upload was created
updated_at              Timestamp when the upload was last updated
started_at              Timestamp when the upload started processing
finished_at             Timestamp when the upload finished processing
has_errors              Count of errors found during processing
errors                  URI of the full list of UploadErrors
has_warnings            Count of warnings found during processing
warnings                URI of the full list of UploadWarnings
is_completed            Boolean indicating whether processing is done, whether it succeeded or not
original_header         Header parsed out of the file
override_header         Corrected header, provided by an admin or via the API
progress                Dictionary with details of processing progress; see below
status                  Current status of the upload; see below for possible values
submitter               URI of the user who submitted this upload for processing
stop                    URI to stop the upload as soon as possible
restart                 URI to restart the upload; included only when is_completed is true

Progress fields:

Field name      Description
rate            Rows per second
time_remaining  Estimated seconds remaining to finish processing
rows            Dictionary of total processing counts for 'failed', 'ok' and 'warned' rows
all             URI of all previous progress reports
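If you want a rough percentage instead of raw counts, you can combine rows with line_count. This is only an estimate, and it rests on two assumptions that are not confirmed here: the file has exactly one header line, and each processed row is counted in exactly one of ok, failed or warned. Since line_count is approximate, the result is clamped to 100. percent_done is a hypothetical helper.

```python
def percent_done(upload):
    """Rough completion percentage for an upload dict (see assumptions above)."""
    rows = upload["progress"]["rows"]
    processed = rows["ok"] + rows["failed"] + rows["warned"]
    # Assume one header line; guard against tiny or zero line counts.
    data_lines = max(upload["line_count"] - 1, 1)
    return min(100.0, 100.0 * processed / data_lines)
```

Treat time_remaining from the API as the better estimate when it is available; this is mainly useful for a progress bar.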

Possible status values:

Status         Description
new            New upload
downloading    Downloading file from S3 (not relevant for API uploads)
unpacking      Unpacking the file or archive
checking       Checking the header
header_failed  Header failed the check
header_ok      Header OK
loading        Loading data
died           Died
stopped        Stopped
completed      Completed
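For logging, it can help to group these statuses into "still running" and "finished". The grouping below is our reading of the list above together with the rule that is_completed is also set when the header is invalid, when processing dies, or when the upload is stopped. is_completed remains the authoritative flag, and describe is a hypothetical helper.

```python
# Statuses after which processing will not continue without a restart
# (our interpretation of the status list; rely on is_completed in code paths
# that matter).
TERMINAL_STATUSES = {"header_failed", "died", "stopped", "completed"}


def describe(upload):
    """One-line summary of an upload dict for logs."""
    status = upload["status"]
    if status not in TERMINAL_STATUSES:
        return "still running (%s)" % status
    if status == "completed" and not upload["has_errors"] and not upload["has_warnings"]:
        return "completed cleanly"
    return "%s with %d errors and %d warnings" % (
        status, upload["has_errors"], upload["has_warnings"])
```

Note that even a "completed" upload can carry errors and warnings, which is why the summary reports both counts.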

Review Errors And Warnings

The returned JSON object from the Upload resource includes a count of errors and a count of warnings, as well as links to the full lists of errors and warnings. (Both URIs are simply pointers to resources filtered by the relationship with this upload.)

For example, you will see something like this in the returned JSON.

"errors": "/rest/v1/uploaderror/?upload=47",
"warnings": "/rest/v1/uploadwarning/?upload=47"

You can use those URIs to page through the (potentially very numerous) errors and warnings.

Warnings and errors will have useful information even if the uploaded file failed to start processing. Problems with the format, headers and encodings will all be stored in these resources.

Be sure to include a check for errors and warnings in your integration regardless of the status of the upload.
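A generator can walk every page of the errors or warnings list for you. The page shape is an assumption not stated in this document: each page is taken to look like {"meta": {"next": ...}, "objects": [...]}, with meta.next a relative URI or null on the last page. Confirm this against a real response before relying on it. iter_objects is a hypothetical helper, and fetch is any function that GETs a URL and returns parsed JSON, such as lambda u: requests.get(u, auth=auth).json().

```python
from urllib.parse import urljoin


def iter_objects(fetch, base_url, list_uri):
    """Yield every object from a paged list resource such as upload['errors'].

    base_url is the scheme and host, e.g. "https://docs.actionkit.com/";
    list_uri is the relative URI returned in the upload JSON.
    """
    url = urljoin(base_url, list_uri)
    while url:
        page = fetch(url)
        for obj in page.get("objects", []):
            yield obj
        next_uri = page.get("meta", {}).get("next")
        url = urljoin(base_url, next_uri) if next_uri else None
```

Because it is a generator, you can stop after the first few errors instead of downloading a very long list.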

Stop An Upload In Progress

Large uploads may take a long time to process. We've provided a stop function for when you know that something is incorrect and you'll need to redo the upload.

POSTing to /rest/v1/upload/{{ upload.id }}/stop/ will stop the Upload.

This URI is included in the returned JSON for an upload that is being processed. Stopping a processing upload may take several seconds, so keep polling the status if you want to verify that it was stopped.

The stop endpoint will return a 202 ACCEPTED if the upload was stopped, and a 404 NOT FOUND if the upload id in the URI was not found.
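The following sketch sends the stop request using the relative stop URI from the upload JSON and maps the two documented responses to a boolean. stop_upload is a hypothetical helper. post is any function that POSTs to a URL and returns an object with status_code, for example lambda u: requests.post(u, auth=auth).

```python
from urllib.parse import urljoin


def stop_upload(post, upload_uri, upload):
    """Request a stop; True on 202 ACCEPTED, False on 404 NOT FOUND.

    upload_uri is the absolute upload URI (the Location header);
    upload["stop"] is the relative URI such as /rest/v1/upload/47/stop/.
    """
    stop_url = urljoin(upload_uri, upload["stop"])
    status = post(stop_url).status_code
    if status == 202:
        return True
    if status == 404:
        return False
    raise RuntimeError("unexpected status %s from %s" % (status, stop_url))
```

A True result only means the stop was accepted. Keep polling until is_completed is set to confirm the upload has actually stopped.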

See the next section, Override the header and restart the Upload, for how to restart an upload.

Override The Header And Restart The Upload

If your upload can't complete, or has many warnings due to a problem with the header, you can PATCH the Upload with a JSON encoded override_header and restart the upload. Restarting the processing will delete the errors, warnings and progress records from the previous processing run.

Restarting will not undo the previous upload. It will simply re-run the Upload, using any changes you've made to the override_header.

The override_header allows you to rename columns, including using the magical prefix "skip_column" to tell ActionKit to ignore a column.

You must send valid JSON, possibly inside a JSON-encoded request, so be careful about how the value is escaped. We won't validate the override_header field until you try to re-run the upload.

Let's say you have an Upload whose original_header has two columns, "email" and "user_color". The "email" column is already a valid identifier for users, so it can stay as it is, but you want ActionKit to ignore the "user_color" column by renaming it to "skip_column_user_color".

The original_header field is a JSON-encoded list of field names:

"original_header": "[\"email\", \"user_color\"]",

So you're going to PATCH the upload with a modified list. Note the escaping of JSON within JSON.

"override_header": "[\"email\", \"skip_column_user_color\"]",

The PATCH request returns 202 ACCEPTED:

$ curl -X PATCH -uuser:password https://docs.actionkit.com/rest/v1/upload/50/ \
    -d'{ "override_header": "[\"email\", \"skip_column_user_color\"]" }' \
    -H'Content-type: application/json' -i
HTTP/1.1 202 ACCEPTED
Server: openresty
Date: Wed, 04 Feb 2015 10:50:16 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
X-Machine-Id: dev.actionkit.com
Vary: Accept,Cookie,Accept-Encoding,User-Agent
Set-Cookie: sid=t586psott5901yocc71mf2d86ljgra7e; expires=Wed, 04-Feb-2015 18:50:16 GMT; httponly; Max-Age=28800; Path=/
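Escaping JSON within JSON by hand is error-prone. Encoding twice with a JSON library produces the same body as the curl example. build_override is a hypothetical helper. It applies the "skip_column_" prefix to the columns you list in skip and optional renames to the others.

```python
import json


def build_override(original_header_json, renames=None, skip=()):
    """Return a PATCH body setting override_header from original_header.

    original_header_json is the upload's original_header string,
    e.g. '["email", "user_color"]'.
    """
    renames = renames or {}
    new_columns = []
    for name in json.loads(original_header_json):
        if name in skip:
            new_columns.append("skip_column_" + name)  # tell ActionKit to ignore it
        else:
            new_columns.append(renames.get(name, name))
    # override_header is itself a JSON string, so encode the list first,
    # then encode the whole request body.
    return json.dumps({"override_header": json.dumps(new_columns)})
```

For example, build_override('["email", "user_color"]', skip={"user_color"}) yields the exact string passed to curl with -d above.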

Now you need to tell ActionKit to restart the upload by POSTing to the restart link in the Upload resource. It will look something like /rest/v1/upload/50/restart/ and will be included in an Upload resource if is_completed = True.

$ curl -X POST -uuser:password -H'Content-type: application/json' \
    -i https://docs.actionkit.com/rest/v1/upload/50/restart/
HTTP/1.1 202 ACCEPTED
Server: openresty
Date: Wed, 04 Feb 2015 11:07:18 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 0
Connection: keep-alive
X-Machine-Id: dev.actionkit.com
Vary: Accept,Cookie,Accept-Encoding,User-Agent
Set-Cookie: sid=g89koa7nmjwnillwslvbypung5hz4xsz; expires=Wed, 04-Feb-2015 19:07:18 GMT; httponly; Max-Age=28800; Path=/

$ curl -uuser:password -X GET \
    https://docs.actionkit.com/rest/v1/upload/50/ \
    | python -mjson.tool
{
"autocreate_user_fields": false,
"compression": "none",
"created_at": "2015-02-03T17:01:33",
"errors": "/rest/v1/uploaderror/?upload=50",
"finished_at": "2015-02-03T17:02:23",
"format": "csv",
"has_errors": 0,
"has_warnings": 0,
"id": 50,
"is_completed": false,
"line_count": 100001,
"original_header": "[\"email\", \"user_color\"]",
"override_header": "[\"email\", \"skip_column_user_color\"]",
"page": "/rest/v1/importpage/13/",
"path": "dev.actionkit.com:upload-4e9faf14-abc6-11e4-8f48-00163e0e21b4.tsv.gz",
"progress": {
    "all": "/rest/v1/uploadprogress/?upload=50",
    "rate": 0,
    "rows": {
        "failed": 0,
        "ok": 0,
        "warned": 0
    },
    "time_remaining": null
},
"resource_uri": "/rest/v1/upload/50/",
"started_at": "2015-02-03T17:01:38",
"status": "new",
"stop": "/rest/v1/upload/50/stop/",
"submitter": "/rest/v1/user/1/",
"updated_at": "2015-02-04T11:07:18",
"warnings": "/rest/v1/uploadwarning/?upload=50"
}

Note that the status is "new", that errors and warnings have been cleared out, and that is_completed is once again False. You can now resume polling for progress and errors!
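The PATCH-then-restart sequence can also be scripted. In this sketch, override_and_restart is a hypothetical helper, and session is a requests.Session (or anything with compatible patch and post methods) already configured with auth. Deriving the restart link as "restart/" relative to the upload URI mirrors the example above but is an assumption; reading the restart field from a completed upload is the safer source.

```python
import json
from urllib.parse import urljoin


def override_and_restart(session, upload_uri, new_columns):
    """PATCH a new override_header onto an upload, then restart it.

    upload_uri: the upload's absolute URI, ending in "/".
    new_columns: the corrected header as a Python list of column names.
    """
    # override_header is a JSON string inside the JSON body: json.dumps encodes
    # the list, and json= encodes the outer object and sets Content-Type.
    r = session.patch(upload_uri, json={"override_header": json.dumps(new_columns)})
    if r.status_code != 202:
        raise RuntimeError("PATCH failed with status %s" % r.status_code)
    # Assumption: the restart link is <upload_uri>restart/, as in the example.
    r = session.post(urljoin(upload_uri, "restart/"))
    if r.status_code != 202:
        raise RuntimeError("restart failed with status %s" % r.status_code)
```

After it returns, poll the upload again; the status should go back to "new" and then progress as before.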

A More Complete Example

Here is an example in Python using the requests library, plus requests_toolbelt to report progress while the file is being sent.

import sys
import time

import requests
from requests_toolbelt import MultipartEncoderMonitor

def progressing(monitor):
    # Called by MultipartEncoderMonitor each time a chunk of the body is read.
    sys.stdout.write(".")
    sys.stdout.flush()

def authorization():
    return {'auth': ('username', 'password')}

def poll(upload_uri):
    response = requests.get(upload_uri, **authorization())
    if response.status_code != 200:
        raise Exception("Unexpected response code: %s: %s" % (response.status_code, response.content))
    return response.json()

def do_upload(page, url, filename='bigger.tsv.gz'):
    sys.stdout.write("\nStarting upload request: ")

    # Keep the file open until the request has been sent.
    with open(filename, 'rb') as upload_file:
        m = MultipartEncoderMonitor.from_fields(
            fields={
                'page'  : page,
                'autocreate_user_fields': 'true',
                'upload': (filename, upload_file, 'application/gzip')
            },
            callback=progressing
        )

        r = requests.post(url,
                          data=m,
                          headers={'Content-Type': m.content_type},
                          **authorization()
                          )

    if r.status_code != 201:
        raise Exception(r.content)

    upload_uri = r.headers['Location']

    sys.stdout.write(" uploaded!\n")
    print("Polling for results @ %s." % upload_uri)

    upload = poll(upload_uri)

    try:
        while not upload['is_completed']:
            time.sleep(1)
            upload = poll(upload_uri)
            print(upload['status'])
            progress = upload['progress']
            print("   rate %s/s, remaining %ss, %s ok, %s failed, %s warned" % (
                    progress['rate'],
                    progress['time_remaining'],
                    progress['rows']['ok'],
                    progress['rows']['failed'],
                    progress['rows']['warned']))

    except KeyboardInterrupt:
        print("Caught interrupt! Stopping upload!")

        requests.post(upload_uri + 'stop/', **authorization())

        # Stopping can take a few seconds. is_completed is set once the upload
        # has stopped (or if it finished before the stop took effect).
        upload = poll(upload_uri)
        while not upload['is_completed']:
            print("status: %s, waiting for stop" % upload['status'])
            time.sleep(1)
            upload = poll(upload_uri)

    print("Done! Final status: %s" % upload['status'])

    if upload['has_errors']:
        print("Errors: %s" % upload['errors'])
    if upload['has_warnings']:
        print("Warnings: %s" % upload['warnings'])

# Page.name of a previously created ImportPage
page = 'my_previously_created_import_page'

# The full URI of the Upload resource endpoint
url  = 'https://docs.actionkit.com/rest/v1/upload/'

do_upload(page, url)