Focused crawls are collections of frequently updated web crawl data from narrow (as opposed to broad or wide) web crawls, often focused on a single domain or subdomain.
*Decompile All the Things* - IDA Batch Decompile is a plugin and script for Hex-Rays' IDA Pro that adds the ability to batch-decompile multiple files and their imports, with additional annotations (xrefs, stack variable sizes) in the pseudocode .c file.
Performs tedious, repetitive operations on every row of an Excel file, step by step, and reports when the job is done. For example, it can download the URLs listed in a column of an Excel file; if a new filename is provided in column B, it renames the file before saving it.
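The row-by-row download-and-rename behaviour described above can be sketched as follows. This is a minimal illustration, not the tool's own code: it assumes each row arrives as a tuple with the URL in column A and an optional new filename in column B, and the helper names and the injected `fetch(url, filename)` callback are hypothetical.

```python
import posixpath
from typing import Callable, Iterable, Optional, Sequence
from urllib.parse import unquote, urlparse


def target_name(url: str, new_name: Optional[str]) -> str:
    """Pick the saved filename: column B if given, else the URL's last path segment."""
    if new_name is not None and str(new_name).strip():
        return str(new_name).strip()
    name = posixpath.basename(unquote(urlparse(url).path))
    return name or "download"  # hypothetical fallback for URLs such as "https://host/"


def process_rows(rows: Iterable[Sequence], fetch: Callable[[str, str], None]) -> dict:
    """Call fetch(url, filename) for every row and report successes and failures.

    Row numbers in the report count from the first row passed in.
    """
    report = {"ok": [], "failed": []}
    for row_no, row in enumerate(rows, start=1):
        url = row[0] if len(row) > 0 else None
        if not url:
            continue  # skip rows with no URL in column A
        new_name = row[1] if len(row) > 1 else None
        name = target_name(str(url), new_name)
        try:
            fetch(str(url), name)
            report["ok"].append(name)
        except Exception as exc:  # record the error and keep processing rows
            report["failed"].append((row_no, str(url), str(exc)))
    return report
```

With a real workbook, `rows` could come from openpyxl, e.g. `load_workbook("urls.xlsx").active.iter_rows(min_row=2, values_only=True)` (the filename is illustrative), and `fetch` could wrap `urllib.request.urlretrieve(url, name)`. Keeping the download behind a callback makes the row logic testable without network access.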
Euphoria is an open-source Java API for creating unified big-data processing flows. It provides an engine-independent programming model that can express both batch and stream transformations.
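To illustrate what expressing batch and stream transformations with one programming model means, here is a minimal sketch in plain Python. It is not Euphoria's API (which is Java); `windowed_counts` is a hypothetical name. A tumbling-window count is written once over an iterable, then run on a bounded list (batch) and on an unbounded generator (stream). It assumes events arrive as `(timestamp, key)` pairs in timestamp order; real engines also handle out-of-order and late data, which this ignores.

```python
from collections import Counter
from itertools import count, islice
from typing import Hashable, Iterable, Iterator, Optional, Tuple


def windowed_counts(events: Iterable[Tuple[int, Hashable]],
                    window: int) -> Iterator[Tuple[int, Counter]]:
    """Count keys per tumbling window of `window` time units.

    Assumes events are ordered by timestamp. A window is emitted as soon as
    an event from a later window arrives; the final window is emitted when a
    bounded input ends.
    """
    current: Optional[int] = None
    counts: Counter = Counter()
    for ts, key in events:
        start = ts - ts % window          # start of the window containing ts
        if current is not None and start != current:
            yield current, counts         # the earlier window is complete
            counts = Counter()
        current = start
        counts[key] += 1
    if current is not None:
        yield current, counts             # only reached for bounded input


# Batch: a finite list is processed to completion.
batch = list(windowed_counts([(0, "a"), (1, "b"), (2, "a"), (5, "a")], window=5))

# Stream: the same function consumes an infinite generator lazily.
stream = windowed_counts(((t, "tick") for t in count()), window=3)
first_two = list(islice(stream, 2))
```

The transformation itself never asks whether its input is finite; only the consumer decides whether to drain it (batch) or take results incrementally (stream).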
Multi-device OpenCL kernel load balancer and pipeliner API for C#. It uses a shared-distributed memory model to keep GPUs updated quickly while running the same kernel on all devices (for simplicity).
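The load-balancing idea can be illustrated with a small, engine-agnostic sketch: after each run, measure how many work items per second each device handled, then size each device's share of the next run in proportion. This is an assumed, simplified scheme (the names `split_work` and `throughputs_from_timings` are hypothetical), not the project's actual algorithm, and it leaves out the OpenCL calls and memory-transfer pipelining entirely.

```python
from typing import List, Sequence


def split_work(total: int, throughputs: Sequence[float], step: int = 1) -> List[int]:
    """Split `total` work items across devices in proportion to their throughput.

    Work is handed out in chunks of `step` items (for OpenCL this might be the
    local work-group size). Chunks left over after rounding down go to the
    devices with the largest fractional share (largest-remainder method).
    """
    if total % step != 0:
        raise ValueError("total must be a multiple of step")
    if any(t < 0 for t in throughputs) or sum(throughputs) <= 0:
        raise ValueError("throughputs must be non-negative with a positive sum")
    chunks = total // step
    weight_sum = sum(throughputs)
    ideal = [chunks * t / weight_sum for t in throughputs]
    shares = [int(x) for x in ideal]      # round each ideal share down
    leftover = chunks - sum(shares)
    # Largest fractional part first; sorted() is stable, so ties go to the
    # lower device index.
    by_fraction = sorted(range(len(ideal)), key=lambda i: -(ideal[i] - shares[i]))
    for i in by_fraction[:leftover]:
        shares[i] += 1
    return [s * step for s in shares]


def throughputs_from_timings(items: Sequence[int], seconds: Sequence[float]) -> List[float]:
    """Items processed per second on each device during the previous run."""
    return [n / s if s > 0 else 0.0 for n, s in zip(items, seconds)]
```

One design caveat: a device whose share rounds to zero reports zero throughput and is never given work again, so a practical balancer would reserve a minimum share per device or smooth measurements across runs.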
Shell script to run OpenRefine in batch mode (import, transform, export). It orchestrates OpenRefine (server) and a Python client that communicates with the OpenRefine API.
This Android library takes any set of events and batches them before sending them to the server. It can also persist the events on disk so that no event is lost if the app crashes. It is typically used when developing an in-house analytics SDK, where a single API call pushes events to the server but you want to limit those calls so that one happens only once every x events, or once every x minutes. It also supports exponential backoff in case of network failures.
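The count-or-time batching and exponential backoff described above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the library's Android API: the class and method names, the `send` callback returning `True` on success, the JSON file used for persistence, and the 1 s base / 300 s cap for backoff are all assumptions.

```python
import json
import os
import time
from typing import Callable, List, Optional


class EventBatcher:
    """Buffer events and send them in one call per batch (hypothetical sketch).

    A flush happens when `max_events` events are buffered or `max_interval_s`
    seconds have passed since the last successful flush. After a failed send,
    further attempts wait for an exponentially growing backoff delay.
    """

    def __init__(self, send: Callable[[List[dict]], bool], max_events: int = 50,
                 max_interval_s: float = 60.0, store_path: Optional[str] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.send = send                  # assumed to return True on success
        self.max_events = max_events
        self.max_interval_s = max_interval_s
        self.store_path = store_path      # optional on-disk copy of the buffer
        self.clock = clock
        self.buffer: List[dict] = self._load()
        self.last_flush = clock()
        self.failures = 0                 # consecutive failed sends
        self.retry_at = float("-inf")     # earliest time of the next attempt

    def _load(self) -> List[dict]:
        # Restore events that were persisted before a crash, if any.
        if self.store_path and os.path.exists(self.store_path):
            with open(self.store_path) as f:
                return json.load(f)
        return []

    def _persist(self) -> None:
        # Rewrite the whole buffer; simple, but O(n) per event.
        if self.store_path:
            with open(self.store_path, "w") as f:
                json.dump(self.buffer, f)

    def backoff_delay(self, base_s: float = 1.0, cap_s: float = 300.0) -> float:
        # base, 2*base, 4*base, ... after 1, 2, 3, ... consecutive failures, capped.
        if self.failures == 0:
            return 0.0
        return min(cap_s, base_s * 2 ** (self.failures - 1))

    def add(self, event: dict) -> bool:
        """Buffer one event; return True if this call caused a successful flush."""
        self.buffer.append(event)
        self._persist()
        return self.maybe_flush()

    def maybe_flush(self) -> bool:
        """Flush if a size or time threshold is met and no backoff is pending."""
        now = self.clock()
        if not self.buffer or now < self.retry_at:
            return False
        full = len(self.buffer) >= self.max_events
        stale = now - self.last_flush >= self.max_interval_s
        return self.flush() if (full or stale) else False

    def flush(self) -> bool:
        if self.send(list(self.buffer)):
            self.buffer.clear()
            self._persist()
            self.failures = 0
            self.last_flush = self.clock()
            return True
        self.failures += 1                # keep the events for the next attempt
        self.retry_at = self.clock() + self.backoff_delay()
        return False
```

Because the time threshold is only checked when `add` or `maybe_flush` runs, an app would call `maybe_flush` from a periodic timer so that a quiet period still triggers a send. Persisting on every `add` trades some disk I/O for buffered events surviving an app restart; a production version would also write the file atomically.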
For example, given a simple pipeline such as:
I'd like aggregator to be something requiring a non-serialisable dependency to do its work. I know I can do this: