Leveraging Custom PostgreSQL Functions in EFCore: A Guide to Mapping and Usage
Table of contents:
· Overview:
∘ JSONB vs JSON in PostgreSQL
∘ EF Core and PostgreSQL
· Use-case:
∘ DB side vs Application side implementation
· Pseudo Solution
· User-defined function mapping
· Usage
· Conclusion
∘ GitHub sample project
Overview:
This article is centered around JSONB functions but the solution equally applies to other function types.
PostgreSQL is known for its robustness, reliability, and scalability, making it a popular choice for large and complex database applications. It offers advanced features such as support for JSON and other non-relational data types. JSON file support was first introduced in PostgreSQL v9.2, and with every new release, constant improvements are being made.
JSONB vs JSON in PostgreSQL:
JSONB (JSON Binary) is a specialized data type in Postgres that offers a more effective and efficient way to store and manipulate JSON data compared to the regular one.
JSONB stores JSON data in a special binary format, making indexing and querying much faster compared to the regular JSON data type. This is because the binary format allows for more efficient storage and retrieval of JSON data, particularly when dealing with large or complex JSON objects. Furthermore, JSONB also provides additional indexing options, such as indexing specific keys and etc.
On the other hand, the standard JSON data type in PostgreSQL stores JSON data as plain text without binary encoding or specialized indexing. While simpler to use, it may lead to slower query performance, particularly when dealing with larger or more complex JSON objects.
EF Core and PostgreSQL:
When integrating PostgreSQL with EF Core, developers often rely on the open-source EF Core provider for PostgreSQL — NpgSQL. However, NpgSQL has limitations, leaving many essential PostgreSQL functions untapped. For instance, while there are a few JSON functions available in NpgSQL, PostgreSQL offers more than 50 functions to work with JSON/JSONB data.
The solution to overcome the limitations lies in the mapping of PostgreSQL functions in EF Core. This enables EF Core to effectively handle PostgreSQL functions, closing the gap between the two and ensuring smooth reusability of PostgreSQL functions in EF Core queries.
Use-case:
With over 2,000,000 entities in our database table, each containing a custom user-defined “Properties” column in JSONB format like this:
[
{
"Name": "Severity",
"Value": "High"
},
{
"Name": "Index",
"Value": 40
}
{
"Name": "CreatedDate",
"Value": "7/31/2023 8:23:50 PM"
}
...
]
our task is to efficiently filter out rows that match certain filters (i.e. Severity is High). We need to consider the following points:
- Implement filtering on the database side to avoid performance issues.
- Utilize EF Core, given our application’s current use of it, as it’s intuitive to not change the ORM only for this use case. Additionally, explore the other advantages of EF Core that will be covered in the next sections.
- Aim to develop a generic solution that can be used in various scenarios and determine future use cases.
DB side vs Application side implementation:
“DB side” and “Application side” refer to different locations where data processing or manipulation can occur.
- DB side: When data processing or filtering happens directly on the Database before returning to the application (using SQL queries)
- Application side (Code side): When data processing or filtering happens on the Code side, it means that the operations are performed within the application code after retrieving the data from the database.
Let’s consider a scenario where the user applies some filters, resulting in the need to return 100 entities.
Assuming that the size of 2,000,000 entities is approximately equal to 1.2 GB, implementing filtering on the application side would require fetching and storing all the data (1.2 GB) in the application’s memory. Additionally, we would need to set up a data structure to handle the filtering and ordering, which will increase memory usage. This means that to handle 2,000,000 entities, we would require both 1.2 GB of available RAM and sufficient CPU resources in both the database and application servers.
Furthermore, transferring 1.2 GB of data over the internet (from the database to the application) is a resource-intensive operation. Assuming we have a stable and high-speed network connection (125 MB/s), it will take ~ 10 seconds to get data streamed/returned over the internet.
Another point is that PostgreSQL, the database used in our case, already provides built-in mechanisms for storing and processing JSON data. Additionally, it supports indexing, which enhances the speed of data filtering, enabling faster retrieval of the desired data.
Considering that this is the case for each operation/request, supporting multiple users at once would significantly increase the demands on the system’s resources.
In summary, implementing filtering on the application side can lead to substantial memory and CPU usage, especially with large datasets. The data transfer over the internet, data structures, and speed of filtering can also become a performance bottleneck, impacting response times and scalability. To optimize performance and resource utilization, filtering directly on the DB side using SQL queries is a more efficient approach, as it minimizes data transfer and reduces the burden on the application server’s memory and CPU.
Pseudo Solution:
Given a user-given filter, the objective is to retrieve all entities that have a “Severity” property with a value equal to “High”
To tackle this, let’s break the problem down into smaller steps.
The first step involves understanding the SQL query that needs to be executed to achieve this use case.
SELECT *
FROM public."Entities" as e
WHERE jsonb_path_query_first(e."Properties", '$[*] ? (@.Name == "Severity").Value') = 'High';
In this query, we utilize the jsonb_path_query_first
function to extract the value from JSON. The function relies on a JSON path to identify the specific value to retrieve. It retrieves the “Value
” property from all elements in the JSON array where the “Name
” property is equal to “Severity
”.
JSON path can be broken down as follows:
$[*]
: The dollar sign followed by the asterisk[*]
is a wildcard that represents all elements in the JSON array.?
: The question mark?
indicates a filter operation, specifying that we want to filter the elements based on a condition.(@.Name == "Severity")
: This is the condition used for filtering.@
refers to the current element being evaluated in the array, and@.Name
is used to access the "Name
" property of each element. The filter checks if the value of the "Name
" property is equal to "Severity
.".Value
: Finally,.Value
is used to access the "Value
" property of the elements that satisfy the condition.
It’s also possible to chain them and have more filters:
WHERE jsonb_path_query_first(e."Properties", '$[*] ? (@.Name == "Severity").Value') = 'High'
AND jsonb_path_query_first(e."Properties", '$[*] ? (@.Name == "Index").Value') > 40
...
An added benefit of this function is that it can be used also for ordering. As demonstrated in the example below, we can use it to sort the data based on the “Severity
” property value.
SELECT *
FROM public."Entities" as e
WHERE jsonb_path_query_first(e."Properties", '$[*] ? (@.Name == "Severity").Value') = 'High'
ORDER BY jsonb_path_query_first(e."Properties", '$[*] ? (@.Name == "Severity").Value');
In EF core we need to somehow reuse LINQ functions to get an equivalent SQL query.
await context.Entities.Where(x => ...).OrderBy(x => ...).ToListAsync()
While we can use LINQ functions for filtering and ordering, the absence of a built-in function jsonb_path_query_first
makes it difficult to perform the same JSON operations.
User-defined function mapping:
EF Core allows for using user-defined SQL functions in queries. To do that, the functions need to be mapped to a CLR method during model configuration. When translating the LINQ query to SQL, the user-defined function is called instead of the CLR function it has been mapped to.
In simpler terms, we must instruct the EF translator to include the PSQL function when it meets the user-defined function in the expressions.
To achieve this functionality, EF Core provides a method called HasDbFunction
, which must be called within the OnModelCreating
method in your DbContext
class.
DbContext.cs
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.HasDbFunction(typeof(CustomEFFunctions).GetMethod(nameof(CustomEFFunctions.JsonbPathQueryFirst)), builder =>
{
builder.HasParameter("json").HasStoreType("jsonb");
builder.HasTranslation(args =>
{
return new SqlFunctionExpression("jsonb_path_query_first", args, false, args.Select(x => false), typeof(string), null);
});
});
}
CustomEFFunctions.cs
public class CustomEFFunctions
{
public static string JsonbPathQueryFirst(object json, [NotParameterized] string path)
{
throw new NotSupportedException($"{nameof(JsonbPathQueryFirst)} cannot be called client side");
}
}
Here’s a breakdown of the method:
HasDbFunction
: This method is used to register a custom database function with EF Core. It takes two main arguments:
- The first argument is the reflection
MethodInfo
of the method that implements the custom function (in this case,typeof(CustomEFFunctions).GetMethod(nameof(CustomEFFunctions.JsonbPathQueryFirst)
). - The second argument is a configuration action, represented by a lambda expression, where you define the translation behavior for the custom function.
2. Configuration Action: The lambda expression within the configuration action allows you to specify how EF Core should translate the custom function into SQL when used in queries.
3. HasParameter
: This method is used to configure the parameters of the custom function. In this case, it sets the parameter "json"
to have the store type "jsonb"
, indicating that the parameter is expected to have the JSONB data type in the database.
json
: Its store type is“jsonb”
in the database provider, and the CLR type is “object” to work with all kinds of CLR objects that are stored as JSONB.path
: The attribute[NotParameterized]
is used to ensure it is inserted as the raw query in the SQL expression.
4.HasTranslation
: This method is used to define the translation of the custom function to SQL. In this example, the function name "jsonb_path_query_first"
is used, and the arguments are passed along to the SQL function accordingly.
5. The return type of this function is a string as the PSQL function returns JSONB as a result (we will need to additionally cast the result to the needed type)
Usage:
After we have successfully translated the PSQL function to EF, we can use it in LINQ expressions.
entities.Where(x =>
Convert.ToString(CustomEFFunctions.JsonbPathQueryFirst(x.Properties, "$[*] ? (@.Name == \"Severity\").Value")) == "\"High\"" &&
Convert.ToInt32(CustomEFFunctions.JsonbPathQueryFirst(x.Properties, "$[*] ? (@.Name == \"Index\").Value")) >= 1990000
);
it will be translated to this SQL query:
SELECT e."Id", e."Properties"
FROM "Entities" AS e
WHERE
(jsonb_path_query_first(e."Properties", '$[*] ? (@.Name == "Severity").Value')::text = '"High"') AND
(jsonb_path_query_first(e."Properties", '$[*] ? (@.Name == "Index").Value')::INT >= 1990000)
Another advantage of this solution is that it also can be used in other LINQ queries e.g. OrderBy(), Select() and etc.
entities.Where(x =>
Convert.ToString(CustomEFFunctions.JsonbPathQueryFirst(x.Properties, "$[*] ? (@.Name == \"Severity\").Value")) == "\"High\"" &&
Convert.ToInt32(CustomEFFunctions.JsonbPathQueryFirst(x.Properties, "$[*] ? (@.Name == \"Index\").Value")) >= 1990000
)
.OrderBy(CustomEFFunctions.JsonbPathQueryFirst(x.Properties, "$[*] ? (@.Name == \"Index\").Value"));
SELECT e."Id", e."Properties"
FROM "Entities" AS e
WHERE
(jsonb_path_query_first(e."Properties", '$[*] ? (@.Name == "Severity").Value')::text = '"High"') AND
(jsonb_path_query_first(e."Properties", '$[*] ? (@.Name == "Index").Value')::INT >= 1990000)
ORDER BY jsonb_path_query_first(e."Properties", '$[*] ? (@.Name == "Index").Value')
Conclusion:
ORMs are frequently used to accelerate development, allowing us to accomplish more in less time. However, they come with their limitations. As our codebase evolves, optimization becomes a priority. Leveraging database functions, particularly for JSON operations, can greatly benefit users. Fortunately, the EF Core team has thoughtfully provided helper classes, making it effortless to introduce new functionality to our applications.
GitHub sample project:
To run this sample, you can access the entire project on GitHub.