Amr Saafan: June 2009

Saturday, June 13, 2009

SQL Server as a Runtime Host Part 3

There are three categories of access security for managed code. These are SAFE, EXTERNAL_ACCESS, and UNSAFE, which we mentioned previously with respect to class loading. This allows the DBA to determine if an assembly should be permitted certain privileges while knowing the risks.

These categories equate to SQL Server–specific permission sets using code access security concepts. Having stated this, there is no specific enhancement to the CLR past the normal stack walk that intercepts all privileged operations as defined by the CLR permissions model and enforces user permissions. For ensuring the integrity of userpermissions defined in the database, we depend on the principal execution context of the stored procedure or user-defined function in combination with database roles.

I have spoken of AppDomains quite a bit in previous articles. It’s time to describe exactly what they are and how SQL Server uses them. In .NET, processes can be subdivided into pieces known as application domains, or AppDomains. Loading the runtime loads a default AppDomain; user or system code can create other AppDomains.

AppDomains are like lightweight processes themselves with respect to code isolation and marshaling. This means that object instances in one AppDomain are not directly available to other AppDomains by means of memory references; the parameters must be “marshaled up” and shipped across. In .NET, the default is marshal-by-value; a copy of the instance data is made and shipped to the caller.

Another choice is marshal-by-reference, in which the caller gets alocator or “logical pointer” to the data in he callee’s AppDomain, and subsequent use of that instance involves a cross AppDomain trip. This isolates one AppDomain’s state from others. Each process that loads the .NET Framework creates a default AppDomain.

From this AppDomain, you can create additional AppDomains programmatically, like this:

public static int Main(string[] argv)
    {
        // Create domain
        AppDomain child = AppDomain.CreateDomain("dom2");

        // Execute yourapp.exe
        int entryPoint = child.ExecuteAssembly("yourapp.exe", null, argv);

        // Unload domain
        AppDomain.Unload(child);

        return entryPoint;
    }

Although there may be many AppDomains in a process, AppDomains cannot share class instances without marshaling. SQL Server does not use the default AppDomain for database processing, although it is used to load the runtime. Exactly how AppDomains are allocated in SQL Server 2005 is opaque to and not controllable by the user or DBA; however, by observation, in the beta version of SQL Server 2005, it can be determined that a separate AppDomain will be created for each database for running that database’s code.

Executing the system function master.sys.fn_appdomains() shows the AppDomains in the SQL Server process when more than one combination is in use. In the beta 1 version, the AppDomains were named “databasename.number”-for example, “AdventureWorks.2.” This effectively isolates each database’s user code from the others, albeit at the cost of more virtual memory. In beta 2, AppDomains may be allocated based on the identity of the user owning the assembly, possibly resulting in more AppDomains, but isolating each assembly owner’s code.

This effectively prevents using reflection to circumvent SQL Server permissions without the overhead of intercepting each call. The runtime-hosting APIs also support the concept of domain-neutral code. Domain-neutral code means that one copy of the Just-In-Time compiled code is shared across multiple AppDomains.
Although this reduces the working set of the process because only one copy of the code and supported structures exists in memory, it is a bit slower to access static variables, because each AppDomain must have its own copy of static variables and this requires the runtime to add a level of indirection.

There are four domain-neutral code settings:

1. No assemblies are domain neutral.
2. All assemblies are domain neutral.
3. Only strongly named assemblies are domain neutral.
4. The host can specify a list of assemblies that are domain neutral.

SQL Server 2005 uses the fourth option it will only share a set of Framework assemblies. It doesn’t share strongly named user assemblies, because it means user assemblies that happen to be strongly named can never be unloaded. AppDomains do not have a concept of thread affinity; that is, all AppDomains share the common CLR thread pool.

This means that although object instances must be marshaled across AppDomains, the marshaling is more lightweight than COM marshaling, for example, because not every marshal requires a thread switch. This also means it is possible to delegate the management of all threads to SQL Server while retaining the existing marshaling behavior with respect to threads.

Source: https://www.nilebits.com/blog/2009/09/sql-server-as-a-runtime-host/

Tuesday, June 9, 2009

SQL Server as a Runtime Host Part 2

Since in SQL Server users are not allowed to run arbitrary programs for reliability reasons, code (an assembly) is loaded a little differently than in other runtime hosts. The user or DBA must preload the code into the database and define which portions are invocable from Transact-SQL.

Preloading and defining code uses ordinary SQL Server Data Definition Language (DDL). Loading code as a stream of bytes from the database rather than from the file system makes SQL Server’s class loader unique. The class libraries that make up the .NET Framework are treated differently from ordinary user code in that they are loaded from the global assembly cache and are not defined to SQL Server or stored in SQL Server.

Some portions of the base class libraries may have no usefulness in a SQL Server environment (for example, System.Windows.Forms); some may be dangerous to the health of the service process when used incorrectly (System.Threading) or may be a security risk (portions of System.Security). The architects of SQL Server 2005 have reviewed the class libraries that make up the .NET Framework, and only those deemed relevant will be enabled for loading.

This is accomplished by providing the CLR with a list of libraries that are OK to load. SQL Server will take the responsibility for validating all user libraries, to determine that they don’t contain non-read-only static variables, for example. SQL Server does not allow sharing state between user libraries and registers through the new CLR hosting APIs for notification of all interassembly calls.

In addition, user libraries are divided into three categories by degree of danger; assemblies can be assigned to a category and use only the appropriate libraries for that category.
Because code in SQL Server must be reliable, SQL Server will only load the exact version of the Framework class libraries it supports.

This is analogous to shipping a particular tested version of ADO with SQL Server. Multiple versions of your code will be able to run side by side (though this was not enabled in beta 1), but the assemblies must be defined with different SQL Server object names.

Source: https://www.nilebits.com/blog/2009/09/sql-server-as-a-runtime-host/

Sunday, June 7, 2009

SQL Server as a Runtime Host Part 1

If you are a SQL Server developer or database administrator, you might just be inclined to use the new Common Language Runtime (CLR) hosting feature to write stored procedures in C# or VB.NET without knowing how it works. But you should care. SQL Server is an enterprise application, perhaps one of the most important in your organization. When the CLR was added to SQL Server, there were three goals in the implementation, considered in this order:

1. Security
2. Reliability
3. Performance

The reasons for this order are obvious. Without a secure system, you have a system that runs reliably run code, including code introduced by hackers, very quickly. It’s not what you’d want for an enterprise application. Reliability comes next. Critical applications, like a database management system, are expected to be available 99.99% of the time.

You don’t want to wait in a long line at the airport or the bank while the database restarts itself. Reliability is therefore considered over performance when the two clash; a decision might be whether to allow stack overflows to potentially bring down the main application, or slow down processing to make sure they don’t. Since applications that perform transactional processing use SQL Server, SQL Server must ensure data integrity and its transactional correctness, which is another facet of reliability.

Performance is extremely important in an enterprise application as well. Database management systems can be judged on benchmarks, such as the TPC-C (Transaction Processing Performance Council benchmark C) benchmark, as well as programmer-friendly features. So although having stored procedures and user-defined types written in high-level languages is a nice feature, it has to be implemented in such a way as to maximize performance.

Since SQL Server 2005 is going to introduce fundamental changes such as loading .NET runtime engines and XML parsers, we’ll first consider how SQL Server 2005 works as a .NET runtime host, how it compares with other .NET runtime hosts, and what special features of the runtime are used to ensure security, reliability, and performance.

You may already know that an updated version of the .NET runtime, .NET 2.0, will be required for use with SQL Server. In this article, I will explain why. A runtime host is defined as any process that loads the .NET runtime and runs code in a managed environment. The most common scenario is that a runtime host is simply a bootstrap program that executes from the Windows shell, loads the runtime into memory, and then loads one or more managed assemblies.

An assembly is the unit of deployment in .NET roughly analogous to an executable program or DLL in prior versions of Windows. A runtime host loads the runtime by using the ICorRuntimeHost or CorBindToRuntimeEx, prior to Whidbey. These APIs call a shim DLL, MSCOREE.DLL, whose only job is to load the runtime.

Only a single copy of the runtime (also known as the CLR) engine can ever be loaded into a process during the process’s lifetime; it is not possible to run multiple versions of the CLR within the same host. In pre-Whidbey versions of .NET, a host could specify only a limited number of parameters to ICorRuntime Host or CorBindToRuntimeEx, namely the following:

• Server or workstation behavior
• Version of the CLR (for example, version 1.0.3705.0)
• Garbage collection behavior
• Whether or not to share Just-In-Time compiled code across
AppDomains (an AppDomain is a subdivision of the CLR runtime space)

Two examples of specialized runtime hosts are the ASP.NET worker process and Internet Explorer. The ASP.NET worker process differs in codelocation and how the executable code, threads, and AppDomains are organized. The ASP.NET worker process divides code into separate “applications,” application being a term that is borrowed from Internet Information Server to denote code running in a virtual directory. Code is located in virtual directories, which are mapped to physical directories in the IIS metabase.

Internet Explorer is another runtime host with behaviors that differ from the ASP.NET worker or SQL Server 2005. IE loads code when it encounters a specific type of <object> tag in a Web page.
The location of the code is obtained from an HTML attribute of the tag. SQL Server 2005 is an example of a specialized runtime host that goes far beyond ASP.NET in specialization and control of CLR semantics.

SQL Server’s special requirements of utmost security, reliability, and performance, in addition to the way that SQL Server works internally, have necessitated an overhaul in how the managed hosting APIs work as well as in how the CLR works internally. Although early versions of SQL Server 2005 did run on .NET version 1.0, the changes in the CLR are important in ensuring enterprise quality.

SQL Server is a specialized host like ASP.NET and IE, rather than a simple bootstrap echanism. The runtime is lazy loaded; if you never use a managed stored procedure or user-defined type, the runtime is never loaded. This is useful because loading the runtime takes a one-time memory allocation of approximately 10–15MB in addition to SQL Server’s buffers and unmanaged executable code, although this certainly will not be the exact number in SQL Server 2005.

How SQL Server manages its resources and locates the code to load is unique as well. To accommodate hosts that want to have hooks into the CLR’s resource allocation and management, .NET 2.0 hosts can use ICLRRuntimeHost instead of ICorRuntimeHost. The host can then call ICLRRuntimeHost::SetHostControl, which takes a pointer to an interface (IHostControl) that contains a method that the CLR can call (GetHostManager) to delegate things like thread management to the host.

SQL Server uses this interface to take control of some functions that the CLR usually calls down to the operating system for directly. SQL Server manages its own thread scheduling, synchronization and locking, and memory allocation. In .NET runtime hosts, these are usually managed by the CLR itself.
In SQL Server 2005 this conflict is resolved by layering the CLR’s mechanisms on top of SQL Server’s mechanisms.

SQL Server uses its own memory allocation scheme, managing real memory rather than using virtual memory. It attempts to optimize memory, balancing between data and index buffers, query caches, and other internal data structures. SQL Server can do a better job if it manages all of the memory in its process.

As an example, prior to SQL Server 2000, it was possible to specify that the TEMPDB database should be allocated in memory. In SQL Server 2000 that option was removed, based on the fact that SQL Server can manage this better than the programmer or DBA. SQL Server manages its memory directly by, in effect, controlling paging of memory to disk itself rather than letting the operating system do it.

Because SQL Server attempts to use as much memory as is allocated to the process, this has some repercussions in exceptional condition handling, which I will discuss next. SQL Server also uses its own thread management algorithm, putting threads “to sleep” until it wants them to run. This facility is known as UMS (user-mode scheduler).

Optionally, SQL Server can use fibers rather than threads through a configuration option, though this option is rarely used. The CLR also maintains thread pools and allows programmers to create new threads. The key point is that SQL Server uses cooperative thread scheduling; the CLR uses preemptive thread scheduling.

Cooperative thread scheduling means that a thread must voluntarily yield control of the rocessor; in preemptive thread scheduling the processor takes control back from the thread after its time slice has expired. SQL Server uses cooperative thread scheduling to minimize thread context switches.

With threading come considerations of thread synchronization and locking. SQL Server manages locking of its own resources, such as database rows and internal data structures. Allowing programmers to spin up a thread is not to be taken lightly. SQL Server 2005 CLR code executes under different permission levels with respect to CLR activities.

The hosting APIs in .NET 2.0 are enhanced to enable the runtime host to either control or “have a say in” resource allocation. The APIs manage units of work called Tasks, which can be assigned to a thread or a fiber. The SQL scheduler manages blocking points, and hooks PInvoke and interop calls out of the runtime to control switching the scheduling mode.

The new control points allow SQL Server to supply a host memory allocator, to be notified of low memory conditions at the OS level, and to fail memory allocations if desired. SQL Server can also use the hosting API to control I/O completion ports usually managed by the CLR.

Although this may slow things down a little in the case of an allocation callback, it is of great benefit in allowing SQL Server to manage all of its resources, as it does today. In .NET 1.0 certain exceptional conditions, such as an out-of-memory condition or a stack overflow, could bring down a running process (or App Domain). This cannot be allowed to happen in SQL Server.

Although transactional semantics might be preserved, reliability and performance would suffer dramatically.
In addition, unconditionally stopping a thread (using Thread.Abort or other API calls) can conceivably leave some system resources in an indeterminate state and, though using garbage collection minimizes memory leakage, leak memory. Different runtime hosts deal with these hard-to-handle conditions in different ways.

In the ASP.NET worker process, for example, recycling both the AppDomain and the process itself is considered acceptable since disconnected, short-running Web requests would hardly notice. With SQL Server, rolling back all the in-flight transactions might take a few minutes. Process recycling would ruin long-running batch jobs in progress.

Therefore, changes to the hosting APIs and the CLR exceptional condition handling needed to be made. Out-of-memory conditions are particularly difficult to handle correctly, even when you leave a safety buffer of memory to respond to them. In SQL Server the situation is exacerbated because SQL Server manages its own memory and attempts to use all memory available to it to maximize throughput.

This leaves us between a rock and a hard place. As we increase the size of the “safety net” to handle out-of-memory conditions, we also increase the occurrence of out-of-memory conditions. The Whidbey runtime handles these conditions more robustly; that is, it guarantees availability and reliability after out-of-memory conditions without requiring SQL Server to allocate a safety net, letting SQL Server tune memory usage to the amount of physical memory.

The CLR will notify SQL Server about the repercussions of failing each memory request. Low-memory conditions may be handled by permitting the garbage collector to run more frequently, waiting for other procedural code to finish before invoking additional procedures, or aborting running threads if needed.

There is also a failure escalation policy at the CLR level that will allow SQL Server to determine how to deal with exceptions. SQL Server can decide to abort the thread that causes an exception and, if necessary, unload the AppDomain. On resource failures, the CLR will unwind the entire managed stack of the session that takes the resource failure.

If that session has any locks, the entire AppDomain that session is in is unloaded. This is because having locks indicates there is some shared state to synchronize, and thus that shared state is not likely to be consistent if just the session was aborted. In certain cases this might mean that finally blocks in CLR code may not run.

In addition, finalizers, hooks that programmers can use to do necessary but not time-critical resource cleanup, might not get run. Except in UNSAFE mode (discussed later in the chapter), finalizers are not permitted in CLR code that runs in SQL Server.

Stack overflow conditions cannot be entirely prevented, and are usually handled by implementing exceptional condition handling in the program. If the program does not handle this condition, the CLR will catch these exceptions, unwind the stack, and abort the thread if needed. In exceptional circumstances, such as when memory allocation during a stack overflow causes an out-of-memory condition, recycling the App Domain may be necessary.

In all the cases just mentioned, SQL Server will maintain transactional semantics. In the case of AppDomain recycling, this is needed to assert that the principal concern is reliability, if needed, at the expense of performance. In addition, all the Framework class libraries (FX libraries) that SQL Server will load have gone through a careful review and testing to ensure that they clean up all memory and other esources after a thread abort or an AppDomain unload.

Source: https://www.nilebits.com/blog/2009/09/sql-server-as-a-runtime-host/

Monday, June 1, 2009

System Metadata Tables and INFORMATION_SCHEMA in SQL Server 2005

Information about assemblies as well as the assembly code itself and the dependencies is stored in the system metadata tables, which, in general, store information about SQL Server database objects, such as tables and indexes. Some metadata tables store information for the entire database instance and exist only in the MASTER database; some are replicated in every database, user databases as well as MASTER. The names of the tables and the information they contain are proprietary.

System metadata tables are performant, however, because they reflect the internal data structures of SQL Server. In the big rewrite that took place in SQL Server 7, the system metadata tables remained intact. In SQL Server 2005, the metadata tables have been overhauled, revising the layout of the metadata information and adding metadata for new database objects.

In addition, programmers and DBAs can no longer write to the system metadata. It is really a read only view. The SQL INFORMATION_SCHEMA, on the other hand, is a series of metadata views defined by the ANSI SQL specification as a standard way to expose metadata.

The views evolve with the ANSI SQL specification; SQL:1999 standard INFORMATION_SCHEMA views are a superset of the SQL-92 views. SQL Server 2000 supports the INFORMATION_SCHEMA views at the SQL-92 standard level; some of the SQL:1999 views may be added in SQL Server 2005.
SQL Server is, so far, the only major database to support the INFORMATION_SCHEMA views.

Getting Metadata from SQL Server:

-- this uses the system metadata tables
SELECT * FROM sysobjects WHERE [type] = 'U'
-- this uses the INFORMATION_SCHEMA
SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE = 'BASE TABLE'

SQLServer 2005 includes a reorganization of the system metadata tables. This includes renaming the tables to use an arbitrary schema (named SYS) as well as table renames and reorganization of some of the information. The goal, once again, is speed and naming consistency.

The equivalent query to the previous two using the new system metadata tables would be as follows:

SELECT * FROM SYS.TABLES

Note that the information returned by all three queries differs both in the number of columns returned, the column names used, and the information in the resultset. Information about assemblies and the assembly code itself is stored in three metadata tables. These tables exist on per database, since assemblies are scoped to the database and schema.

Sys.assemblies stores information about the assembly itself as well as schema_id, assembly_id, and the .NET version number. The assembly dependencies are stored in sys.assembly_references, one row per assembly-reference pair. Finally, the assembly code itself is cataloged in sys.assembly_files.

In all cases, this table contains the actual code rather than the name of the file where the code resided when it was cataloged. The original file location is not even kept as metadata. In addition, if you have added a debugger file, using the ALTER ASSEMBLY ADD FILE DDL statement, the debug information will appear as an additional entry in the sys.assembly_files table.

Notice that you can define an assembly that is “invisible” with respect to defining routines and types to the runtime. Lack of visibility is the default when SQL Server loads dependent assemblies of an assembly defined using CREATE ASSEMBLY. You might do this, for example, to define a set of utility routines to be invoked internally only.

If you specify IsVisible=true (the default) this means that methods and types in this assembly can be declared as SQL Server methods and types, either through the “list” properties or directly through DDL.

Source: https://www.nilebits.com/blog/2009/08/system-metadata-tables-and-information_schema-in-sql-server/