devFlames
Based on the AWESOME repository.
BIG DATA
NewSQL Databases
Actian Ingres – commercially supported, open-source SQL relational database management system.
ActorDB – a distributed SQL database with the scalability of a KV store, while keeping the query capabilities of a relational database.
Amazon Redshift – data warehouse service, based on PostgreSQL.
BayesDB – statistics-oriented SQL database.
Bedrock – a simple, modular, networked and distributed transaction layer built atop SQLite.
CitusDB – scales out PostgreSQL through sharding and replication.
Cockroach – Scalable, Geo-Replicated, Transactional Datastore.
Comdb2 – a clustered RDBMS built on optimistic concurrency control techniques.
Datomic – distributed database designed to enable scalable, flexible and intelligent applications.
FoundationDB – distributed database, inspired by F1.
Google F1 – distributed SQL database built on Spanner.
Google Spanner – globally distributed semi-relational database.
H-Store – an experimental main-memory, parallel database management system optimized for on-line transaction processing (OLTP) applications.
Haeinsa – linearly scalable multi-row, multi-table transaction library for HBase based on Percolator.
HandlerSocket – NoSQL plugin for MySQL/MariaDB.
InfiniSQL – infinitely scalable RDBMS.
KarelDB – a relational database backed by Apache Kafka.
Map-D – GPU in-memory database, big data analysis and visualization platform.
MemSQL – in-memory SQL database with optimized columnar storage on flash.
NuoDB – SQL/ACID compliant distributed database.
Oracle TimesTen in-Memory Database – in-memory, relational database management system with persistence and recoverability.
Pivotal GemFire XD – Low-latency, in-memory, distributed SQL data store. Provides SQL interface to in-memory table data, persistable in HDFS.
SAP HANA – an in-memory, column-oriented, relational database management system.
SenseiDB – distributed, realtime, semi-structured database.
Sky – database used for flexible, high performance analysis of behavioral data.
SymmetricDS – open source software for both file and database synchronization.
TiDB – a distributed SQL database inspired by the design of Google F1.
VoltDB – claims to be fastest in-memory database.
YugabyteDB – open source, high-performance, distributed SQL database compatible with PostgreSQL.
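Several entries above (CitusDB in particular) scale out by sharding. A minimal sketch of hash-based shard routing follows, assuming rows are distributed by a single key column; the MD5-plus-modulo scheme and the `tenant_id` column are illustrative assumptions, not Citus's actual hash function or shard layout, and replication is not shown.

```python
import hashlib

def shard_for(key: str, shard_count: int) -> int:
    """Map a distribution-key value to a shard index in [0, shard_count)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer, then reduce.
    return int.from_bytes(digest[:8], "big") % shard_count

def route_rows(rows, key_column, shard_count):
    """Group rows by the shard their key hashes to (hypothetical helper)."""
    shards = {i: [] for i in range(shard_count)}
    for row in rows:
        shards[shard_for(str(row[key_column]), shard_count)].append(row)
    return shards

rows = [{"tenant_id": t, "value": v}
        for t, v in [("acme", 1), ("globex", 2), ("acme", 3), ("initech", 4)]]
placement = route_rows(rows, "tenant_id", 4)
for shard, shard_rows in placement.items():
    print(shard, shard_rows)
```

Because the shard depends only on the key, all rows for one tenant land together, which keeps single-tenant queries on one node.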
Time-Series Databases
Axibase Time Series Database – Integrated time series database on top of HBase with built-in visualization, rule-engine and SQL support.
Chronix – a time series storage built to store time series highly compressed and for fast access times.
Cube – uses MongoDB to store time series data.
Heroic – a scalable time series database based on Cassandra and Elasticsearch.
InfluxDB – a time series database with optimised IO and queries, supports pgsql and influx wire protocols.
QuestDB – high-performance, open-source SQL database for applications in financial services, IoT, machine learning, DevOps and observability.
IronDB – scalable, general-purpose time series database.
KairosDB – similar to OpenTSDB but can use Cassandra as its storage backend.
M3DB – a distributed time series database that can be used for storing realtime metrics at long retention.
Newts – a time series database based on Apache Cassandra.
TDengine – a time series database written in C that exploits the characteristics of IoT data to improve read/write throughput and reduce the space needed to store data.
OpenTSDB – distributed time series database on top of HBase.
Prometheus – a time series database and service monitoring system.
Beringei – Facebook’s in-memory time-series database.
TrailDB – an efficient tool for storing and querying series of events.
Druid – column-oriented distributed data store, ideal for powering interactive applications.
Riak TS – marketed as the only enterprise-grade NoSQL time series database optimized specifically for IoT and time series data.
Akumuli – a numeric time-series database that can capture, store and process time-series data in real time. The word “akumuli” can be translated from Esperanto as “accumulate”.
Rhombus – A time-series object store for Cassandra that handles all the complexity of building wide row indexes.
DalmatinerDB – fast distributed metrics database.
Blueflood – a distributed system designed to ingest and process time series data.
Timely – a time series database application that provides secure access to time series data, based on Accumulo and Grafana.
SiriDB – Highly-scalable, robust and fast, open source time series database with cluster functionality.
Thanos – a set of components for building a highly available metric system with unlimited storage capacity from multiple existing Prometheus deployments.
VictoriaMetrics – fast, scalable and resource-efficient open-source TSDB compatible with Prometheus. Single-node and cluster versions are included.
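Several stores above compress time series by exploiting regular sampling intervals; the Gorilla design that Beringei implements, for example, encodes timestamps as delta-of-deltas, which are mostly zero for evenly spaced points. A minimal sketch of that encoding follows; it shows only the arithmetic and omits the variable-length bit packing a real store would apply to the small resulting integers.

```python
from typing import List

def encode_delta_of_delta(timestamps: List[int]) -> List[int]:
    """Return [first timestamp, first delta, then successive delta-of-deltas]."""
    if not timestamps:
        return []
    if len(timestamps) == 1:
        return [timestamps[0]]
    first_delta = timestamps[1] - timestamps[0]
    out = [timestamps[0], first_delta]
    prev_delta = first_delta
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        out.append(delta - prev_delta)  # zero when the interval is unchanged
        prev_delta = delta
    return out

def decode_delta_of_delta(encoded: List[int]) -> List[int]:
    """Invert encode_delta_of_delta."""
    if not encoded:
        return []
    ts = [encoded[0]]
    if len(encoded) == 1:
        return ts
    delta = encoded[1]
    ts.append(ts[0] + delta)
    for dod in encoded[2:]:
        delta += dod
        ts.append(ts[-1] + delta)
    return ts

ts = [1600000000, 1600000060, 1600000120, 1600000180, 1600000241]
enc = encode_delta_of_delta(ts)
print(enc)  # [1600000000, 60, 0, 0, 1]
```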
SQL-like processing
Actian SQL for Hadoop – high performance interactive SQL access to all Hadoop data.
Apache Drill – framework for interactive analysis, inspired by Dremel.
Apache HCatalog – table and storage management layer for Hadoop.
Apache Hive – SQL-like data warehouse system for Hadoop.
Apache Calcite – framework that allows efficient translation of queries involving heterogeneous and federated data.
Apache Phoenix – SQL skin over HBase.
Aster Database – SQL-like analytic processing for MapReduce.
Cloudera Impala – framework for interactive analysis, inspired by Dremel.
Concurrent Lingual – SQL-like query language for Cascading.
Datasalt Splout SQL – full SQL query engine for big datasets.
Dremio – an open-source, SQL-like Data-as-a-Service Platform based on Apache Arrow.
Facebook PrestoDB – distributed SQL query engine.
Google BigQuery – framework for interactive analysis, implementation of Dremel.
Materialize – a streaming database for real-time applications that uses SQL for queries and supports a large fraction of PostgreSQL.
Invantive SQL – SQL engine for online and on-premise use with integrated local data replication and 70+ connectors.
PipelineDB – an open-source relational database that runs SQL queries continuously on streams, incrementally storing results in tables.
Pivotal HDB – SQL-like data warehouse system for Hadoop.
RainstorDB – database for storing petabyte-scale volumes of structured and semi-structured data.
Spark Catalyst – a query optimization framework for Spark and Shark.
SparkSQL – Manipulating Structured Data Using Spark.
Splice Machine – a full-featured SQL-on-Hadoop RDBMS with ACID transactions.
Stinger – interactive query for Hive.
Tajo – distributed data warehouse system on Hadoop.
Trafodion – enterprise-class SQL-on-HBase solution targeting big data transactional or operational workloads.
Redpanda – a Kafka®-compatible replacement for mission-critical systems, written in C++; claims to be 10x faster than Kafka.
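PipelineDB (above) runs SQL queries continuously on streams and stores results incrementally, roughly the effect of keeping `SELECT key, count(*), sum(value) ... GROUP BY key` up to date as each event arrives instead of rescanning. A minimal sketch of that incremental maintenance follows; the `ContinuousView` class is hypothetical and does not reflect PipelineDB's internals.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Agg:
    count: int = 0
    total: float = 0.0

    @property
    def avg(self) -> float:
        return self.total / self.count if self.count else 0.0

class ContinuousView:
    """Hypothetical view: GROUP BY key with COUNT(*) and SUM(value), updated per event."""

    def __init__(self):
        self.table = defaultdict(Agg)  # result table, one row per group key

    def on_event(self, key, value):
        # Update only the affected group; earlier events are never reread.
        agg = self.table[key]
        agg.count += 1
        agg.total += value

view = ContinuousView()
for key, value in [("eu", 10.0), ("us", 4.0), ("eu", 20.0)]:
    view.on_event(key, value)
print({k: (a.count, a.total, a.avg) for k, a in view.table.items()})
# {'eu': (2, 30.0, 15.0), 'us': (1, 4.0, 4.0)}
```

Averages are derived from the stored count and sum rather than stored directly, since a running average alone cannot be updated exactly.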
Data Ingestion
Amazon Kinesis – real-time processing of streaming data at massive scale.
Amazon Web Services Glue – serverless, fully managed extract, transform, and load (ETL) service.
Census – a reverse ETL product that lets you sync data from your data warehouse to SaaS applications; no engineering favors required, just SQL.
Apache Chukwa – data collection system.
Apache Flume – service to manage large amounts of log data.
Apache Kafka – distributed publish-subscribe messaging system.
Apache NiFi – an integrated data logistics platform for automating the movement of data between disparate systems.
Apache Pulsar – a distributed pub-sub messaging platform with a very flexible messaging model and an intuitive client API.
Apache Sqoop – tool to transfer data between Hadoop and a structured datastore.
Embulk – open-source bulk data loader that helps data transfer between various databases, storages, file formats, and cloud services.
Facebook Scribe – streamed log data aggregator.
Fluentd – tool to collect events and logs.
Gazette – Distributed streaming infrastructure built on cloud storage which makes it easy to mix and match batch and streaming paradigms.
Google Photon – geographically distributed system for joining multiple continuously flowing streams of data in real-time with high scalability and low latency.
Heka – open source stream processing software system.
HIHO – framework for connecting disparate data sources with Hadoop.
Kestrel – distributed message queue system.
LinkedIn Databus – stream of change capture events for a database.
LinkedIn Kamikaze – utility package for compressing sorted integer arrays.
LinkedIn White Elephant – log aggregator and dashboard.
Logstash – a tool for managing events and logs.
Netflix Suro – log aggregator like Storm and Samza, based on Chukwa.
Pinterest Secor – a service implementing Kafka log persistence.
LinkedIn Gobblin – LinkedIn’s universal data ingestion framework.
Skizze – sketch data store to deal with all problems around counting and sketching using probabilistic data-structures.
StreamSets Data Collector – continuous big data ingest infrastructure with a simple-to-use IDE.
Alooma – data pipeline as a service for moving data from sources such as MySQL into data warehouses.
RudderStack – an open source customer data infrastructure (an alternative to Segment and mParticle) written in Go.
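Skizze (above) answers counting questions with probabilistic data structures. One common such structure is the Count-Min sketch, which trades exactness for fixed memory: it never underestimates a count, and hash collisions can only inflate it. A minimal sketch follows; the SHA-256 row hashing and the width/depth values are assumptions for illustration, not Skizze's implementation.

```python
import hashlib

class CountMinSketch:
    def __init__(self, width: int = 1024, depth: int = 4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Derive an independent-looking hash per row by salting with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for r in range(self.depth):
            self.rows[r][self._index(item, r)] += count

    def estimate(self, item: str) -> int:
        # Every counter is >= the true count, so the minimum is the tightest bound.
        return min(self.rows[r][self._index(item, r)] for r in range(self.depth))

cms = CountMinSketch(width=64, depth=4)
events = ["login"] * 50 + ["click"] * 30 + [f"user{i}" for i in range(100)]
for e in events:
    cms.add(e)
print(cms.estimate("login"), cms.estimate("click"))
```

Memory stays at `width * depth` counters regardless of how many distinct items arrive; a wider table reduces overestimation.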
Service Programming
Akka Toolkit – runtime for distributed and fault-tolerant event-driven applications on the JVM.
Apache Avro – data serialization system.
Apache Curator – Java libraries for Apache ZooKeeper.
Apache Karaf – OSGi runtime that runs on top of any OSGi framework.
Apache Thrift – framework to build binary protocols.
Apache ZooKeeper – centralized service for process management.
Google Chubby – a lock service for loosely-coupled distributed systems.
Hydrosphere Mist – a service for exposing Apache Spark analytics jobs and machine learning models as realtime, batch or reactive web services.
LinkedIn Norbert – cluster manager.
Mara – a lightweight, opinionated ETL framework, halfway between plain scripts and Apache Airflow.
OpenMPI – message passing framework.
Serf – decentralized solution for service discovery and orchestration.
Spotify Luigi – a Python package for building complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, handling failures, command line integration, and much more.
Spring XD – distributed and extensible system for data ingestion, real time analytics, batch processing, and data export.
Twitter Elephant Bird – libraries for working with LZOP-compressed data.
Twitter Finagle – asynchronous network stack for the JVM.
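Apache Avro (above) keeps its binary encoding compact partly by writing `int` and `long` values as zig-zag variable-length integers: zig-zag maps small negative numbers to small unsigned ones, and the varint then stores 7 bits per byte with a continuation flag. A minimal sketch of that scheme follows; the function names are hypothetical, and a real application would use an Avro library rather than hand-rolled encoding.

```python
def zigzag_encode(n: int) -> int:
    # 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ... (assumes n fits in a signed 64-bit long)
    return (n << 1) ^ (n >> 63)

def zigzag_decode(z: int) -> int:
    return (z >> 1) ^ -(z & 1)

def write_long(n: int) -> bytes:
    z = zigzag_encode(n)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, high bit = "more bytes follow"
        z >>= 7
    out.append(z)
    return bytes(out)

def read_long(data: bytes) -> int:
    z, shift = 0, 0
    for b in data:
        z |= (b & 0x7F) << shift
        if not b & 0x80:  # last byte of this value
            break
        shift += 7
    return zigzag_decode(z)

for n in (0, -1, 1, -64, 64):
    print(n, write_long(n).hex())
# 0 00
# -1 01
# 1 02
# -64 7f
# 64 8001
```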
Scheduling
Apache Airflow – a platform to programmatically author, schedule and monitor workflows.
Apache Aurora – a service scheduler that runs on top of Apache Mesos.
Apache Falcon – data management framework.
Apache Oozie – workflow job scheduler.
Azure Data Factory – cloud-based pipeline orchestration for on-premises, cloud and HDInsight environments.
Chronos – distributed and fault-tolerant scheduler.
Cronicle – distributed, easy-to-install, Node.js-based task scheduler.
Dagster – a data orchestrator for machine learning, analytics, and ETL.
LinkedIn Azkaban – batch workflow job scheduler.
Schedoscope – Scala DSL for agile scheduling of Hadoop jobs.
Sparrow – scheduling platform.
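Workflow schedulers such as Airflow, Oozie, Azkaban and Luigi model a pipeline as a directed acyclic graph of tasks and run a task only once its dependencies have finished. A minimal sketch of that ordering follows, using Python's standard-library `graphlib` (3.9+) rather than any of these tools' APIs; the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract_orders": set(),
    "extract_users": set(),
    "clean_orders": {"extract_orders"},
    "join": {"clean_orders", "extract_users"},
    "report": {"join"},
}

sorter = TopologicalSorter(pipeline)
sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
batches = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
    batches.append(ready)               # a real scheduler would run these in parallel
    sorter.done(*ready)                 # mark them finished, unlocking dependents
print(batches)
# [['extract_orders', 'extract_users'], ['clean_orders'], ['join'], ['report']]
```

Each batch contains tasks with no unfinished dependencies, which is why independent extracts can run side by side.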
Machine Learning
Azure ML Studio – cloud-based AzureML, R and Python machine learning platform.
brain – Neural networks in JavaScript.
Oryx – Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning.
Concurrent Pattern – machine learning library for Cascading.
convnetjs – Deep Learning in JavaScript. Train Convolutional Neural Networks (or ordinary ones) in your browser.
DataVec – A vectorization and data preprocessing library for deep learning in Java and Scala. Part of the Deeplearning4j ecosystem.
Deeplearning4j – Fast, open deep learning for the JVM (Java, Scala, Clojure). A neural network configuration layer powered by a C++ library. Uses Spark and Hadoop to train nets on multiple GPUs and CPUs.
Decider – Flexible and Extensible Machine Learning in Ruby.
ENCOG – machine learning framework that supports a variety of advanced algorithms, as well as support classes to normalize and process data.
etcML – text classification with machine learning.
Etsy Conjecture – scalable Machine Learning in Scalding.
Feast – A feature store for the management, discovery, and access of machine learning features. Feast provides a consistent view of feature data for both model training and model serving.
GraphLab Create – A machine learning platform in Python with a broad collection of ML toolkits, data engineering, and deployment tools.
H2O – statistical, machine learning and math runtime for Hadoop, with R and Python interfaces.
Karate Club – an unsupervised machine learning library for graph-structured data, in Python.
Keras – An intuitive neural net API inspired by Torch that runs atop Theano and TensorFlow.
Lambdo – a workflow engine that simplifies the analysis process by unifying feature engineering and machine learning operations.
Little Ball of Fur – a subsampling library for graph-structured data, in Python.
Mahout – An Apache-backed machine learning library for Hadoop.
MLbase – distributed machine learning libraries for the BDAS stack.
MLPNeuralNet – Fast multilayer perceptron neural network library for iOS and Mac OS X.
ML Workspace – All-in-one web-based IDE specialized for machine learning and data science.
MOA – performs big data stream mining in real time, and large-scale machine learning.
MonkeyLearn – Text mining made easy. Extract and classify data from text.
ND4J – A matrix library for the JVM. Numpy for Java.
nupic – Numenta Platform for Intelligent Computing: a brain-inspired machine intelligence platform, and biologically accurate neural network based on cortical learning algorithms.
PredictionIO – machine learning server built on Hadoop, Mahout and Cascading.
PyTorch Geometric Temporal – a temporal extension library for PyTorch Geometric.
RL4J – Reinforcement learning for Java and Scala. Includes Deep-Q learning and A3C algorithms, and integrates with OpenAI’s Gym. Runs in the Deeplearning4j ecosystem.
SAMOA – distributed streaming machine learning framework.
scikit-learn – machine learning in Python.
Shapley – A data-driven framework to quantify the value of classifiers in a machine learning ensemble.
Spark MLlib – a Spark implementation of some common machine learning (ML) functionality.
Sibyl – System for Large Scale Machine Learning at Google.
TensorFlow – Library from Google for machine learning using data flow graphs.
Theano – A Python-focused machine learning library supported by the University of Montreal.
Torch – A deep learning library with a Lua API, supported by NYU and Facebook.
Velox – System for serving machine learning predictions.
Vowpal Wabbit – learning system sponsored by Microsoft and Yahoo!.
WEKA – suite of machine learning software.
BidMach – CPU and GPU-accelerated Machine Learning Library.
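Several systems above, Vowpal Wabbit among them, learn online: the model is updated after each example instead of after a full pass over the data. A minimal sketch of plain per-example stochastic gradient descent on logistic loss follows; the class, learning rate and synthetic data are illustrative assumptions, and Vowpal Wabbit's actual update rules (adaptive and normalized updates, feature hashing) differ.

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

class OnlineLogisticRegression:
    def __init__(self, n_features: int, lr: float = 0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def update(self, x, y):
        # Gradient of the log loss for a single example is (p - y) * x.
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err

random.seed(0)
model = OnlineLogisticRegression(n_features=2)
for _ in range(2000):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    y = 1 if x[0] + x[1] > 0 else 0  # synthetic, linearly separable labels
    model.update(x, y)  # one example at a time; nothing is stored
print(model.predict_proba([0.8, 0.8]), model.predict_proba([-0.8, -0.8]))
```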
Benchmarking
Apache Hadoop Benchmarking – micro-benchmarks for testing Hadoop performance.
Berkeley SWIM Benchmark – real-world big data workload benchmark.
Intel HiBench – a Hadoop benchmark suite.
PUMA Benchmarking – benchmark suite for MapReduce applications.
Yahoo Gridmix3 – Hadoop cluster benchmarking from the Yahoo engineering team.
Security
Apache Ranger – central security administration and fine-grained authorization for Hadoop.
Apache Eagle – real-time monitoring solution.
Apache Knox Gateway – single point of secure access for Hadoop clusters.
Apache Sentry – security module for data stored in Hadoop.
BDA – a vulnerability detector for Hadoop and Spark.
System Deployment
Apache Ambari – operational framework for Hadoop management.
Apache Bigtop – system deployment framework for the Hadoop ecosystem.
Apache Helix – cluster management framework.
Apache Mesos – cluster manager.
Apache Slider – a YARN application to deploy existing distributed applications on YARN.
Apache Whirr – set of libraries for running cloud services.
Apache YARN – Cluster manager.
Brooklyn – library that simplifies application deployment and management.
Buildoop – similar to Apache Bigtop, based on the Groovy language.
Cloudera HUE – web application for interacting with Hadoop.
Facebook Prism – multi-datacenter replication system.
Google Borg – job scheduling and monitoring system.
Google Omega – job scheduling and monitoring system.
Hortonworks HOYA – application that can deploy HBase cluster on YARN.
Kubernetes – a system for automating deployment, scaling, and management of containerized applications.
Marathon – Mesos framework for long-running services.
Linkis – helps connect easily to various back-end computation and storage engines.
Applications
411 – a web application for managing alerts generated by scheduled searches in Elasticsearch.
Adobe spindle – Next-generation web analytics processing with Scala, Spark, and Parquet.
Apache Metron – a platform that integrates a variety of open source big data technologies in order to offer a centralized tool for security monitoring and analysis.
Apache Nutch – open source web crawler.
Apache OODT – capturing, processing and sharing of data for NASA’s scientific archives.
Apache Tika – content analysis toolkit.
Argus – Time series monitoring and alerting platform.
AthenaX – a streaming analytics platform that enables users to run production-quality, large scale streaming analytics using Structured Query Language (SQL).
Atlas – a backend for managing dimensional time series data.
Countly – open source mobile and web analytics platform, based on Node.js & MongoDB.
Domino – Run, scale, share, and deploy models — without any infrastructure.
Eclipse BIRT – Eclipse-based reporting system.
ElastAlert – a simple framework for alerting on anomalies, spikes, or other patterns of interest in data stored in Elasticsearch.
Eventhub – open source event analytics platform.
HASH – open source simulation and visualization platform.
Hermes – asynchronous message broker built on top of Kafka.
Hunk – Splunk analytics for Hadoop.
Imhotep – large-scale analytics platform by Indeed.
Indicative – Web & mobile analytics tool, with data warehouse (AWS, BigQuery) integration.
Jupyter – Notebook and project application for interactive data science and scientific computing across all programming languages.
MADlib – data-processing library for analyzing data inside an RDBMS.
Kapacitor – an open source framework for processing, monitoring, and alerting on time series data.
Kylin – open source Distributed Analytics Engine from eBay.
PivotalR – R on Pivotal HD / HAWQ and PostgreSQL.
Rakam – open-source real-time custom analytics platform powered by PostgreSQL, Kinesis and PrestoDB.
Qubole – auto-scaling Hadoop cluster, built-in data connectors.
SnappyData – a distributed in-memory data store for real-time operational analytics, delivering stream analytics, OLTP (online transaction processing) and OLAP (online analytical processing) built on Spark in a single integrated cluster.
Snowplow – enterprise-strength web and event analytics, powered by Hadoop, Kinesis, Redshift and Postgres.
SparkR – R frontend for Spark.
Splunk – analyzer for machine-generated data.
Sumo Logic – cloud based analyzer for machine-generated data.
Talend – unified open source environment for YARN, Hadoop, HBase, Hive, HCatalog & Pig.
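ElastAlert (above) can alert on spikes, which roughly means comparing the event volume in the current time window against the preceding one and alerting when the ratio crosses a threshold. A minimal sketch of that two-window comparison follows; the `SpikeDetector` class and its parameters are hypothetical and are not ElastAlert's rule configuration.

```python
from collections import deque

class SpikeDetector:
    """Hypothetical detector: compare counts in the current vs. previous window."""

    def __init__(self, window_seconds: float, spike_height: float, min_ref_count: int = 1):
        self.window = window_seconds
        self.spike_height = spike_height    # alert when current >= spike_height * reference
        self.min_ref_count = min_ref_count  # require some history before comparing
        self.events = deque()               # event timestamps, oldest first

    def add(self, ts: float) -> bool:
        """Record an event at time ts; return True if the current window spikes."""
        self.events.append(ts)
        # Keep only events from the last two windows.
        while self.events and self.events[0] <= ts - 2 * self.window:
            self.events.popleft()
        cur = sum(1 for t in self.events if t > ts - self.window)
        ref = len(self.events) - cur
        if ref < self.min_ref_count:
            return False
        return cur >= self.spike_height * ref

det = SpikeDetector(window_seconds=60, spike_height=3, min_ref_count=5)
# Baseline: one event every 10 s for two minutes, then a burst of one event per second.
alerts = [det.add(t) for t in range(0, 120, 10)]
burst = [det.add(120 + i) for i in range(20)]
print(any(alerts), any(burst))  # False True
```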