Apache open source software list

This list of Apache Software Foundation projects contains the software development projects of the Apache Software Foundation (ASF).[1]

Besides the projects, there are a few other distinct areas of Apache:

  • Incubator: for aspiring ASF projects
  • Attic: for retired ASF projects
  • INFRA – Apache Infrastructure Team: provides and manages all infrastructure and services for the Apache Software Foundation, and for each project at the Foundation

Active projects




Incubating projects




  • Annotator: provides annotation enabling code for browsers, servers, and humans
  • BRPC: industrial-grade RPC framework for building reliable and high-performance services
  • DataLab: platform for creating self-service, exploratory data science environments in the cloud using best-of-breed data science tools
  • DevLake: development data platform, providing the data infrastructure for developer teams to analyze and improve their engineering productivity
  • EventMesh: dynamic cloud-native basic service runtime used to decouple the application and middleware layer
  • Flagon: software tool usability testing platform
  • Heron: real-time, distributed, fault-tolerant stream processing engine
  • HugeGraph: a large-scale and easy-to-use graph database
  • Kvrocks: a distributed key-value NoSQL database, supporting rich data structures
  • Kyuubi: a distributed multi-tenant Thrift JDBC/ODBC server for large-scale data management, processing, and analytics, built on top of Apache Spark and designed to support more engines
  • Liminal: an end-to-end platform for data engineers and scientists, allowing them to build, train and deploy machine learning models in a robust and agile way
  • Linkis: a computation middleware project, which decouples the upper applications and the underlying data engines, provides standardized interfaces (REST, JDBC, WebSocket etc.) to easily connect to various underlying engines (Spark, Presto, Flink, etc.)
  • Livy: web service that exposes a REST interface for managing long-running Spark contexts
  • Marvin-AI: open-source artificial intelligence platform
  • Milagro: core security infrastructure for decentralized networks
  • Nemo: data processing system
  • NLPCraft: Java API for NLU applications
  • NuttX: mature, real-time embedded operating system (RTOS)
  • PageSpeed: series of open source technologies to help make the web faster by rewriting web pages to reduce latency and bandwidth
  • Pegasus: distributed key-value storage system which is designed to be simple, horizontally scalable, strongly consistent and high-performance
  • Pony Mail: mail-archiving, archive viewing, and interaction service
  • SDAP: integrated data analytic center for Big Science problems
  • SeaTunnel: a very easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive data
  • Sedona: big geospatial data processing engine
  • Spot: platform for network telemetry built on an open data model and Hadoop
  • StreamPark: a streaming application development platform
  • StreamPipes: self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore (Industrial) IoT data streams
  • Teaclave: universal secure computing platform
  • Toree: provides applications with a mechanism to interactively and remotely access Spark
  • Training: aims to develop resources that can be used for training purposes, in various media formats and languages, for various Apache and non-Apache target projects
  • Tuweni: set of libraries and other tools to aid development of blockchain and other decentralized software in Java and other JVM languages
  • Uniffle: a unified Remote Shuffle Service
  • Wayang: cross-platform data processing system

The above may be incomplete, as the list of incubating projects changes frequently.

Retired projects




A retired project is one that has been closed down on the initiative of the board, the project's PMC, the PPMC, or the IPMC for various reasons. It is no longer developed at the Apache Software Foundation and does not have any other duties.





The ASF develops, shepherds, and incubates hundreds of freely available, enterprise-grade projects that serve as the backbone for some of the most visible and widely used applications in computing today. Through the ASF’s merit-based process known as “The Apache Way,” more than 840 individual volunteer Members and 8,200+ code Committers across six continents successfully collaborate on innovations in Artificial Intelligence and Deep Learning, Big Data, Build Management, Cloud Computing, Content Management, DevOps, IoT and Edge Computing, Mobile, Servers, and Web Frameworks, among other categories.


OPEN: The Apache Software Foundation provides support for 350+ Apache Projects and their Communities, furthering its mission of providing Open Source software for the public good.

INNOVATION: Apache Projects are defined by collaborative, consensus-based processes, an open, pragmatic software license and a desire to create high quality software that leads the way in its field.

COMMUNITY: We are a community of developers and users of enterprise-grade, Open Source Apache projects used in every Internet-connected country on the planet.

As the world’s largest and one of the most influential open source foundations, the Apache Software Foundation (ASF) is home to more than 350 community-led projects and initiatives. The ASF’s 731 individual members and more than 7,000 committers are global, diverse, and community-driven.

The ASF was founded on March 26, 1999, and to celebrate its 20th anniversary, applaud its all-volunteer community for their Herculean efforts, and thank the billions of users who make the projects under the ASF umbrella successful, we’ve assembled the following list of 20 ubiquitous or up-and-coming Apache projects.

1. Apache HTTP Server: Web/servers

Apache HTTP Server, the most popular open source HTTP server on the planet, shot to fame just 13 months after its inception in 1995. It remains prevalent today because it provides a secure, efficient, and extensible server that delivers HTTP services, according to the latest HTTP standards, for modern operating systems, including Unix, Microsoft Windows, and macOS.

The Apache HTTP Server played a key role in the early growth of the World Wide Web; its rapid adoption over all other web servers combined was also instrumental in the wide proliferation of e-commerce sites and solutions. The Apache HTTP Server project was the ASF’s flagship project at its launch, and its open, community-driven, meritocratic development process, known as the “Apache Way,” has been emulated by all subsequent Apache projects.

2. Apache Incubator: Innovation

Apache Incubator is the ASF’s nexus for innovation, serving as the entry path for projects and codebases hoping to become part of the ASF’s official efforts. All code donations from external organizations and existing projects go through the incubation process to ensure they comply with the ASF’s legal standards and develop diverse communities that adhere to the ASF’s guiding principles.

Incubation is required of newly accepted projects until their infrastructure, communications, and decision-making process have stabilized in a manner consistent with other successful ASF projects. While incubation is neither a reflection of the completeness or stability of the code nor an indication of whether the project has been fully endorsed by the ASF, its rigorous process of mentoring projects and their communities according to the Apache Way has graduated nearly 200 projects in the Incubator’s 16-year history. Today 51 “podlings” are undergoing development in the Apache Incubator across an array of categories, including annotation, artificial intelligence, big data, cryptography, data science/storage/visualization, development environments, edge computing, Internet of Things (IoT), email, JavaEE, libraries, machine learning, and serverless computing.

3. Apache Kafka: Big data

The Apache footprint as the foundation of the big data ecosystem continues to grow with 50 active projects, from Accumulo to Hadoop to ZooKeeper, and two dozen more in the Apache Incubator. Apache Kafka’s highly performant, distributed, fault-tolerant, real-time publish-subscribe messaging platform powers big data solutions at Airbnb, LinkedIn, MailChimp, Netflix, the New York Times, Oracle, PayPal, Pinterest, Spotify, Twitter, Uber, Wikimedia Foundation, and countless other businesses.
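At its core, what Kafka provides durably and at scale is the publish-subscribe pattern: producers append records to named topics, and any number of consumers read those topics independently. As a conceptual sketch only (plain Python illustrating the pattern, not Kafka's actual API), it looks like this:

```python
from collections import defaultdict

class Broker:
    """Toy in-memory broker illustrating publish-subscribe.

    A conceptual sketch of the pattern Kafka implements (with durable,
    partitioned, replicated logs) -- not Kafka's real API.
    """
    def __init__(self):
        self.topics = defaultdict(list)       # topic -> append-only log of records
        self.subscribers = defaultdict(list)  # topic -> subscriber callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, record):
        self.topics[topic].append(record)     # records retained in publish order
        for callback in self.subscribers[topic]:
            callback(record)                  # every subscriber sees every record

broker = Broker()
seen = []
broker.subscribe("clicks", seen.append)
broker.publish("clicks", {"user": "alice", "page": "/home"})
broker.publish("clicks", {"user": "bob", "page": "/cart"})
```

The key design point, which Kafka shares, is that publishers and subscribers never reference each other directly, only the topic, so either side can be added or removed without changing the other.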

4. Apache Maven: Build management

Spinning out of the Apache Turbine servlet framework project in 2004, Apache Maven has risen to the top as the hugely popular build automation tool that helps Java developers build and release software. Stable, flexible, and feature-rich, Maven streamlines continuous builds, integration, testing, and delivery processes with an impressive central repository and robust plugin ecosystem, making it the go-to choice for developers who want to easily manage a project’s build, reporting, and documentation.
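Maven builds are driven by a declarative pom.xml that names the project and its dependencies; Maven resolves those dependencies from the central repository and runs a standard build lifecycle. A minimal, illustrative POM (the coordinates below are placeholders, not a real project) might look like:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- Placeholder coordinates identifying this project in a repository -->
  <groupId>com.example</groupId>
  <artifactId>demo-app</artifactId>
  <version>1.0.0-SNAPSHOT</version>
  <packaging>jar</packaging>

  <dependencies>
    <!-- Dependencies are fetched from the central repository by coordinates -->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.13.2</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>
```

Running `mvn package` against such a file compiles, tests, and packages the project in one reproducible lifecycle, with no build script to write.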

5. Apache CloudStack: Cloud

Super-quick to deploy and well documented, with an easy path to production, Apache CloudStack’s biggest draw is that it “just works.” Powering some of the industry’s most visible clouds—from global hosting providers to telcos to the top 5% of the Fortune 100 and more—the CloudStack community is cohesive, agile, and focused, leveraging 11 years of cloud success to enable users to rapidly and affordably build fully featured clouds.

6. Apache cTAKES: Content

Developed for real-world use at the Mayo Clinic starting in 2006, cTAKES was created by a team of physicians, computer scientists, and software engineers seeking a natural language processing system for extracting information from the clinical free text in electronic medical records. Today, Apache cTAKES is an integral part of the Mayo Clinic’s electronic medical records and has processed more than 80 million clinical notes. Apache cTAKES is a growing standard for clinical data management infrastructure across hospitals and academic institutions including Boston Children’s Hospital, Cincinnati Children’s Hospital, Massachusetts Institute of Technology, University of Colorado Boulder, University of Pittsburgh, and University of California San Diego, and companies such as Wired Informatics.

7. Apache Ignite: Data management

Apache Ignite is used for transactional, analytical, and streaming workloads at petabyte scale for the likes of American Airlines, ING, Yahoo Japan, and countless others on-premises, on cloud platforms, or in hybrid environments. Apache Ignite’s in-memory data fabric provides an in-memory data grid, compute grid, streaming, and acceleration solutions across the Apache big data system ecosystem, including Apache Cassandra, Apache Hadoop, Apache Spark, and more.

8. Apache CouchDB: Database

Thousands of organizations, such as the BBC, GrubHub, and the Large Hadron Collider, use Apache CouchDB for seamless data flow between every imaginable computing environment, from globally distributed server clusters to mobile devices to web browsers. Its Couch Replication Protocol allows you to store, retrieve, and replicate data safely on-premises or in the cloud with very high performance and reliability. Apache CouchDB does all the heavy lifting so you can sit back and relax.
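The Couch Replication Protocol is driven through CouchDB's HTTP API: replication is requested by POSTing a JSON document that names a source and a target database. As an illustrative sketch (the database URLs below are placeholders):

```json
{
  "source": "http://localhost:5984/orders",
  "target": "http://backup.example.com:5984/orders",
  "continuous": true
}
```

POSTed to CouchDB's `/_replicate` endpoint (or stored as a document in the `_replicator` database for persistent replications), a request like this keeps the two databases in sync, which is what makes the same data usable from server clusters, mobile devices, and browsers.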

9. Apache Edgent (incubating): Edge computing

The boom of IoT—with personal assistants, smartphones, smart homes, connected cars, Industry 4.0, and beyond—is producing an ever-growing amount of data streaming from millions of systems, sensors, equipment, vehicles, and more. The demand for reliable, efficient real-time data has driven the need for the “empowered edge,” where data collection and analysis are optimized by moving away from centralized sources towards the edges of the networks where much of the data originates. Companies like IBM and SAP are leveraging Apache Edgent to accelerate analytics at the edge across the IoT ecosystem. Apache Edgent can be used in conjunction with many Apache data analytics solutions such as Apache Flink, Apache Kafka, Apache Samza, Apache Spark, Apache Storm, and more.

10. Apache OFBiz: Enterprise resource planning

Whereas most ASF projects are about running or creating infrastructure, the foundation recognizes the importance of running and handling a business. Apache OFBiz is a comprehensive suite of business applications to help manage everything from accounting and CRM through warehousing and inventory control. The Java-based framework provides the power and the flexibility to serve as the core of B2B and B2C business management and is easily expandable and customizable. Apache OFBiz is a complete ERP solution—flexible, free, and fully open source—and serves users from United Airlines to Cabi.

11. Apache Spatial Information System (SIS): Geospatial

The US National Oceanic and Atmospheric Administration, Vietnamese National Space Center, and numerous spatial agencies, governments, and others rely on Apache SIS to create intelligent, standards-based, interoperable geospatial applications. The Apache SIS toolkit handles spatial data, location awareness, and geospatial data representation and provides a unified metadata model for file formats used for real-time smart city visualization, geospatial dataset discovery, state-of-the-art location-enabled emergency management, earth observation, and information modeling for extraterrestrial bodies such as Mars and asteroids.

12. Apache Syncope: Identity management

Apache Syncope manages digital identity data in enterprise applications and environments to handle user information such as username, password, first name, last name, email address, etc. Identity management involves user attributes, roles, resources, and entitlements that control who has access to what data, when, how, and why. Apache Syncope users include the Italian Army, the University of Helsinki, University of Milan, and the Swiss SWITCH university network.

13. Apache PLC4X (incubating): IoT

Connectivity and integration across many Industrial IoT edge gateways are often impossible with closed-source, proprietary legacy systems that have incompatible protocols. Apache PLC4X provides a universal protocol adapter for creating Industrial IoT applications through a set of libraries that allow unified access to any type of industrial programmable logic controller (PLC) using a variety of protocols with a shared API. In addition, the project is planning modular integrations with Apache IoT projects including Apache Brooklyn, Apache Camel, Apache Edgent, Apache Kafka, Apache Mynewt, and Apache NiFi.

14. Apache Commons: Libraries

With 42% or more of Apache projects written in Java (that’s 62+ million lines of code!), it’s both helpful and necessary to have a set of stable, reusable open source Java software components available to all Apache projects and external users. Apache Commons provides a suite of dozens of stable, reusable, easily deployed Java components and a workspace for Commons contributors to collaborate on the development of new components.

15. Apache Spark: Machine learning

Big data is growing exponentially each year, accelerated by industries such as agriculture, big business, fintech, healthcare, IoT, manufacturing, mobile advertising, and more. Apache Spark’s unified analytics engine for processing and analyzing large-scale data helps data scientists apply machine learning insights and an array of libraries to improve responsiveness and produce more accurate results. Apache Spark runs workloads up to 100x faster than earlier MapReduce-style engines; it runs on Apache Hadoop, Apache Mesos, and Kubernetes (standalone or in the cloud) and can access diverse data sources, including Apache Cassandra, Apache Hadoop HDFS, Apache HBase, Apache Hive, and hundreds of others.

16. Apache Cordova: Mobile

Apache Cordova is the popular developer tool used to easily build cross-platform, cross-device mobile apps using a “write-once-run-anywhere” solution, which enables developers to create a single app that appears the same across multiple mobile device platforms. Apache Cordova acts as an extensible container and serves as the base that most mobile application development tools and frameworks are built upon, including mobile development platforms and commercial software products by BlackBerry, Google, IBM, Intel, Microsoft, Oracle, Salesforce, and many others.
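A Cordova project is described by a config.xml manifest at its root; the app itself is ordinary HTML, CSS, and JavaScript that Cordova wraps in a native container per platform. A minimal, illustrative manifest (the identifiers below are placeholders) looks like:

```xml
<?xml version="1.0" encoding="utf-8"?>
<widget id="com.example.hello" version="1.0.0"
        xmlns="http://www.w3.org/ns/widgets">
    <name>HelloWorld</name>
    <description>A sample Apache Cordova application.</description>
    <!-- The app is ordinary web content, loaded from the project's www/ directory -->
    <content src="index.html" />
</widget>
```

The same web content is then built for each target with the Cordova CLI (for example `cordova platform add android` followed by `cordova build`), which is what makes the write-once-run-anywhere workflow possible.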

17. Apache Tomcat: Java/servers

Starting off as the Apache JServ project, designed to allow Java “servlets” to run in a web environment, Tomcat grew to become a full-fledged, comprehensive Java application server and was the de facto reference implementation for the Java servlet specifications. Since 2005, Apache Tomcat has formed the foundation of numerous Java-based web infrastructures, including those at eBay, E-Trade, Walmart, and the Weather Channel.

18. Apache Lucene Solr: Search

Adobe, AOL, Apple, AT&T, Bank of America, Bloomberg, Cisco, Disney, E-Trade, Ford, The Guardian, the Department of Homeland Security, Instagram, MTV Networks, NASA Planetary Data System, Netflix, SourceForge, Verizon, Walmart, Whitehouse.gov, Zappos, and countless others turn to Apache Lucene Solr to quickly and reliably index and search multiple sites and enterprise data such as documents and email. Popular features include near-real-time indexing, automated failover and recovery, rich document parsing and indexing, user-extensible caching, design for high-volume traffic, and much more.

19. Apache Wicket: Web framework

Many developers prize the Apache Wicket component-based web application framework for its “plain old Java object” (POJO) data model and a markup/logic separation uncommon in most frameworks. Developers have been using Apache Wicket since 2004 to quickly create powerful, reusable components using object-oriented methodology with Java and HTML. Wicket powers thousands of applications and sites for governments, stores, universities, cities, banks, email providers, and more, including Apress, DHL, SAP, Vodafone, and Xbox.com.

20. Apache Daffodil (incubating): XML

Governments handle massive amounts of complex and legacy data across security boundaries every day. For such data to be consumed, it must be inspected for correctness and sanitized of malicious data. While traditional inspection methods are often proprietary, incomplete, and poorly maintained, Apache Daffodil streamlines the process with an open source implementation of the Data Format Description Language (DFDL) specification, which fully describes a wide array of complex and legacy file formats down to the bit level. Daffodil can parse data to XML or JSON to allow for validation, sanitization, and transformation, and can also serialize, or “unparse,” back to the original file format, effectively mitigating a large variety of common vulnerabilities.

Looking to the future

The Apache Software Foundation is a leader in community-driven open source software and continues to innovate with dozens of new projects and their communities. Apache projects are managing exabytes of data, executing teraflops of operations, and storing billions of objects in virtually every industry. Apache software is an integral part of nearly every end-user computing device, from laptops to tablets to phones. The commercially friendly and permissive Apache License v2.0 has become an open source industry standard.

As the demand for quality open source software continues to grow, the collective Apache community will continue to rise to the challenge of solving today’s problems and ideating tomorrow’s opportunities through the Apache Way of open development.

Written by Jane