The current demand for high network speeds requires NIDS to process increasing amounts of information in less time. Consequently, most manufacturers have opted for hardware implementations, which in most cases has increased the price of these products. This paper focuses on optimizing the performance of our NIDS, APAP, by means of different concurrency techniques. The upgrade increases the amount of traffic the system processes per unit of time without relying on a hardware implementation. It is important to clarify that, although these measures allow our NIDS to operate in real time on fast networks, it cannot achieve the same performance as a hardware implementation.

As a first step, and to provide context, we briefly highlight some of the most important features of our initial IDS prototype, APAP. This system was developed as a hybrid NIDS combining signature-based and anomaly-based detection. It simultaneously executes Snort, along with its preprocessors, and an anomaly-based detector whose design is based on Anagram.

We chose to work on CPU-level parallelism using the OpenMP libraries. These libraries provide an API that allows us to add concurrency to the application by means of shared-memory parallelism. It is based on the creation of parallel execution threads that share variables from their parent process. OpenMP consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior.

The first thing to take into consideration is the degree of parallelization of the algorithm, because the overhead of thread context switches can jeopardize the gains from the optimization. Therefore, we created four testing suites corresponding to four different parallelization criteria.
The first suite is a total parallelization of the algorithm. The other three relax the first by leaving unparallelized, respectively, loops with a fixed number of iterations, loops with a variable number of iterations, and loops whose bound is a specific variable in the code. Note that each suite includes the relaxations made in the previous ones. Figure 1 shows, for each level, the time taken to run the algorithm as a function of the number of threads, relative to execution on a single thread. This analysis was carried out on a Core 2 Duo CPU, so better performance may be achievable with more powerful processors. The trace used for the tests was provided by the Computer Center of the Universidad Complutense de Madrid.
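
To make the parallelization criteria more concrete, the following is a minimal sketch of how one of the relaxed levels might look with OpenMP directives. It is not the APAP code: the function names `score_payload` and `score_trace`, the `NGRAM` width, and the toy hash are all hypothetical, chosen only for illustration. The short inner loop with a fixed number of iterations is left serial, while the outer loop over payloads is parallelized with a reduction.

```c
#include <stddef.h>
#include <stdint.h>

#define NGRAM 4  /* hypothetical fixed window width, not taken from the paper */

/* Toy per-payload work: sums a simple hash of every NGRAM-byte window. */
static uint64_t score_payload(const uint8_t *data, size_t len)
{
    uint64_t score = 0;
    for (size_t i = 0; i + NGRAM <= len; i++) {
        uint32_t h = 0;
        /* Fixed-iteration loop: left serial, as in the first relaxation. */
        for (int k = 0; k < NGRAM; k++)
            h = h * 31u + data[i + k];
        score += h;
    }
    return score;
}

/*
 * Scores every payload of a trace. The outer loop is split across
 * `threads` threads (must be >= 1) when `parallel` is non-zero; the
 * if() clause makes the same code run serially otherwise.
 * Compiled without OpenMP support, the pragma is ignored and the
 * loop simply runs on one thread.
 */
uint64_t score_trace(const uint8_t *const *payloads, const size_t *lens,
                     long count, int threads, int parallel)
{
    uint64_t total = 0;
    (void)threads;   /* silence unused warnings when OpenMP is disabled */
    (void)parallel;

    #pragma omp parallel for reduction(+:total) num_threads(threads) \
            if(parallel) schedule(dynamic)
    for (long p = 0; p < count; p++)
        total += score_payload(payloads[p], lens[p]);

    return total;
}
```

Because the reduction uses unsigned integer addition, the result does not depend on the number of threads. This makes it easy to check a parallel run against the single-threaded baseline before comparing running times, for example with `omp_get_wtime()` around each call.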