What it is, advantages and initial tutorial
Posted: Thu Dec 05, 2024 5:46 am
Created in 2005 by Travis Oliphant, the NumPy project built on the earlier Numeric and Numarray projects, with the aim of bringing the community together around a single array processing framework. The NumPy package, whose name is short for Numerical Python, is an open source library designed to perform operations on multidimensional arrays, which the library calls ndarray.
Because its design and functionality are built around the ndarray data structure, the library offers fast operations for data processing and cleaning, subsetting and filtering, descriptive statistics, manipulation of relational data, and manipulation of data in groups, among other types of processing.
In addition to the features aimed at data analysis, NumPy also provides mathematical functions for fast operations on whole arrays without the need to write loops, linear algebra routines, random number generation, Fourier transforms, tools for working with memory-mapped data, and an API for connecting NumPy to libraries written in C, C++ and Fortran.
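As a rough illustration of the features listed above, the sketch below touches four of them: element-wise math, linear algebra, random number generation and the Fourier transform. The variable names, the sample values and the seed are assumptions chosen only for the example.

```python
import numpy as np

# Element-wise math on a whole array, with no explicit loop.
values = np.array([1.0, 4.0, 9.0, 16.0])
roots = np.sqrt(values)

# Linear algebra: solve the system A @ x = b.
A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(A, b)

# Random number generation with a seeded generator (the seed is arbitrary).
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=5)

# Fourier transform of a constant signal: everything lands in the first bin.
spectrum = np.fft.fft(np.ones(4))

print(roots)      # [1. 2. 3. 4.]
print(x)          # [1. 2.]
print(samples.shape)
print(spectrum)
```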
Although NumPy by itself does not provide modeling or higher-level scientific features, understanding this package and its main structure, the array, is essential to using array-based tools and libraries, such as pandas, more effectively.
What are arrays?
Well, by now you can see that the NumPy package is based on a very important core structure. But what are these arrays?
An array is a multidimensional structure that stores data in the computer's memory so that each item in it can be found through an indexing scheme. NumPy calls this structure ndarray, short for N-dimensional array.
Every element of an ndarray has the same data type, which is why it is known as a homogeneous data structure. The elements are arranged along the dimensions defined by the application or the developer, and NumPy calls these dimensions axes.
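A small sketch of both ideas, homogeneity and axes, might look like this. The array contents are made up for the example; `dtype`, `ndim`, `shape` and the `axis` argument are standard NumPy.

```python
import numpy as np

# Mixing integers and a float: NumPy picks one type for every element.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)   # float64

# A 2 x 3 array has two axes: axis 0 (rows) and axis 1 (columns).
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
print(grid.ndim)     # 2
print(grid.shape)    # (2, 3)

# Reducing along axis 0 collapses the rows and leaves one value per column.
column_sums = grid.sum(axis=0)
print(column_sums)   # [5 7 9]
```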
A one-dimensional ndarray, that is, one that has one axis (one dimension), is very similar to Python's own list structure. We can also relate this ndarray to a column or row of a spreadsheet. The two-dimensional ndarray structure, which has two axes, is a matrix that we can relate to a spreadsheet of data, with columns and rows.
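To make the comparison concrete, here is a minimal sketch with a one-axis and a two-axis array. The values are arbitrary. It also shows one place where the list analogy stops: multiplying an array works element by element, while multiplying a list repeats it.

```python
import numpy as np

# One axis: indexed like a Python list.
row = np.array([10, 20, 30])
print(row[0])          # 10

# Unlike a list, arithmetic applies to each element.
print(row * 2)         # [20 40 60]
print([10, 20, 30] * 2)  # [10, 20, 30, 10, 20, 30]

# Two axes: indexed by (row, column), like a spreadsheet cell.
table = np.array([[1, 2, 3],
                  [4, 5, 6]])
print(table[1, 2])     # 6
print(table[:, 0])     # first column: [1 4]
```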
The three-dimensional structure has three axes and stores three-dimensional information: each position is identified by three indices, which we can think of as height, width and depth (x, y and z). This layout is very common when we work with images, for example, where the first two axes are the height and width in pixels and the third axis, the depth, holds the RGB color channels.
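The image case can be sketched as follows. The 4 x 6 size and the pixel values are invented for the example, and the (height, width, channels) axis order is a common convention, not something NumPy enforces.

```python
import numpy as np

# A tiny all-black "image": 4 pixels high, 6 pixels wide, 3 color channels.
image = np.zeros((4, 6, 3), dtype=np.uint8)
print(image.ndim)    # 3
print(image.shape)   # (4, 6, 3)

# Paint the top-left pixel pure red (R=255, G=0, B=0).
image[0, 0] = [255, 0, 0]

# Slice out a single channel: a 2-D array of the red values only.
red_channel = image[:, :, 0]
print(red_channel.shape)   # (4, 6)
print(red_channel[0, 0])   # 255
```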
What are the advantages of using NumPy?
In the previous section, we saw the main structure of NumPy, but you, as a Python developer, may be wondering why you shouldn't use the language's native structures to store data. The answer to this question is that NumPy was designed to be efficient with arrays, so that:
⦁ They take up less memory: Data in NumPy is stored in a contiguous block of memory, independent of other Python objects. This lets the library access and modify the data very efficiently, thanks to what computer science calls locality of reference. NumPy arrays also use much less memory than Python's built-in sequences;
⦁ They are faster: NumPy operations can perform complex processing on entire data sets without the need for Python loops. For this reason, NumPy-based code is often cited as being 10 to 100 times faster than equivalent code that uses Python's native structures and operations, although the actual gain depends on the operation and the size of the data;
⦁ Easy to perform numerical calculations: NumPy provides a wide variety of operations on arrays. For this reason, the library is widely used in applications and libraries that require addition, subtraction, multiplication, transposition, differentiation and interpolation, among other operations on data sets, in areas such as machine learning, image processing and mathematical routines.
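The three points above can be checked with a short sketch like the one below. The array size, the number of timing repetitions and the sample values are all assumptions, and the timings it prints will vary from machine to machine, so treat them as an indication only. The memory estimate for the list assumes CPython, where a list holds pointers to separate integer objects.

```python
import sys
import timeit
import numpy as np

n = 100_000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# Memory: the list stores pointers plus one Python int object per element,
# while the array stores 8 raw bytes per int64 element.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)
array_bytes = arr.nbytes
print(f"list: {list_bytes} bytes, array: {array_bytes} bytes")

# Speed: doubling every element with a Python loop vs. one array operation.
loop_time = timeit.timeit(lambda: [v * 2 for v in py_list], number=5)
vec_time = timeit.timeit(lambda: arr * 2, number=5)
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s")

# Numerical operations on arrays.
m = np.array([[1, 2], [3, 4]])
total = m + m                                  # element-wise addition
transposed = m.T                               # transposition
steps = np.diff(np.array([1, 4, 9, 16]))       # discrete differences
midpoint = np.interp(1.5, [1, 2], [10, 20])    # linear interpolation
print(total, transposed, steps, midpoint)
```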