Setting Up Smart Cache

To configure Smart Cache settings, navigate to the following section:

Configuration > Smart Cache

The filer supports Smart Cache, which is designed to provide LAN speed access to data stored in the cloud.

Cache policies allow you to select specific files and data blocks to be continuously cached locally in the filer. Cache policies act on local or remote file systems and provide automated caching, or pinning, of file data. You can create one cache policy for each file system, and each policy can contain multiple rules.

Panzura recommends using the auto cache action with prepopulate enabled. This ensures that files are available in disk cache for end users. Prepopulating makes the data available without forcing a reduction in cache.

Pinning allows an administrator to forcefully localize (pin) data in the cache within a filer to provide guaranteed LAN speed performance. Because pinning consumes cache space, it should be considered only if needed for performance, with the trade‐off between performance and cache space kept in mind.

Local and Remote File Systems

A filer can have one local file system and multiple remote file systems. In this section, the local file system is the file system that carries the filer's name in CloudFS.

When file systems are synchronized in CloudFS, the metadata for each filer's local file system is copied to the other filers. The metadata stored on the other filers is called the remote file system. Remote file system synchronization is what causes a filer to receive the metadata updates made elsewhere in CloudFS.

Because a local file system has the same name as the filer name, you can tell if a file system is local by comparing its name with the filer name shown at the top of the page.
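
The name comparison described above can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and cc2-ny is a made-up second filer (cc1-ca is the file system name used in the examples later in this section).

```python
def classify_file_systems(filer_name, file_system_names):
    """Label each file system as local or remote by comparing it with the filer name."""
    return {
        name: "local" if name == filer_name else "remote"
        for name in file_system_names
    }

# "cc1-ca" appears in this document; "cc2-ny" is a made-up second filer.
print(classify_file_systems("cc1-ca", ["cc1-ca", "cc2-ny"]))
# {'cc1-ca': 'local', 'cc2-ny': 'remote'}
```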

Default Auto Prepopulate/Prefetching Features

The following prefetching operations occur automatically.

Folder Prefetching Based on Ownership Change

The filer prefetches a folder whenever ownership of the folder changes from one filer to another.

  • This prefetching is performed automatically.
  • Prefetching remains in effect for a folder for 3 days after its most recent ownership change. In other words, it continues until there have been no ownership changes for 3 days.
  • We recommend keeping 'Enable Auto Prepop' turned on by default for a better user experience and more robust behavior.
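
The 3-day window can be pictured with a small sketch. The function name and the timestamps are hypothetical, and how the filer tracks ownership changes internally is not described here; only the 3-day figure comes from the text above.

```python
from datetime import datetime, timedelta

# Window length taken from the text: 3 days after the last ownership change.
PREFETCH_WINDOW = timedelta(days=3)

def folder_prefetch_active(last_ownership_change, now):
    """Return True while the folder is still inside the prefetch window."""
    return now - last_ownership_change < PREFETCH_WINDOW

changed = datetime(2024, 1, 1, 9, 0)
print(folder_prefetch_active(changed, datetime(2024, 1, 3, 9, 0)))  # 2 days later: True
print(folder_prefetch_active(changed, datetime(2024, 1, 5, 9, 0)))  # 4 days later: False
```

Each new ownership change would reset `last_ownership_change`, which is why prefetching stops only after 3 quiet days.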

Automatic File Caching

The filer automatically caches both read and written data:

  • Written data is automatically cached.
  • Read data that is not already cached is cached.

Automatic Block Prefetch

When a user requests a file, the filer immediately prefetches the remaining blocks of the file. By automatically prefetching the file's blocks, the filer provides rapid access to users who likely will require access to more of the file's blocks before finishing with the file.

File Block Grouping

Another optimization feature that the filer automatically employs is file block grouping. File block grouping groups blocks based on either or both of the following:

  • Filename. The blocks are in the same file.
  • Time when the blocks are saved. Blocks of any files that are saved at around the same time are grouped into the same block files.

Adjacent Block Prefetching

For further optimization, when a drive file is downloaded, the blocks saved immediately before and after the blocks of the requested file are also downloaded.

  • The last block saved before a block of the requested file is also downloaded.
  • Likewise, the first block saved after a block of the requested file is also downloaded.
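
As a rough illustration of the neighbor selection described above, the sketch below models saved blocks as a list in save order and picks the requested file's blocks plus the block saved just before and just after each of them. The data model, block IDs, and file names are assumptions made for the example, not the filer's actual structures.

```python
def blocks_to_download(saved_blocks, requested_file):
    """saved_blocks: list of (block_id, filename) tuples in the order they were saved."""
    wanted = set()
    for i, (_, name) in enumerate(saved_blocks):
        if name != requested_file:
            continue
        # The requested block plus its immediate neighbors in save order.
        for j in (i - 1, i, i + 1):
            if 0 <= j < len(saved_blocks):
                wanted.add(j)
    return [saved_blocks[j][0] for j in sorted(wanted)]

saved = [
    ("b1", "a.doc"),
    ("b2", "plan.dwg"),
    ("b3", "plan.dwg"),
    ("b4", "c.xls"),
    ("b5", "d.txt"),
]
print(blocks_to_download(saved, "plan.dwg"))  # ['b1', 'b2', 'b3', 'b4']
```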

Setting Up Cache Policies

Follow this process to set up cache policies. Details are in the following table.

  1. Click Add Policy.
  2. Add a policy name, select the filesystem, and verify the default action. Panzura recommends that you keep the default action as auto‐cache and then use rules to assign pinning. Click Add. The policy is added.
  3. Click the arrow to the left of the policy, if needed, to display additional settings.

  4. Click Add Filesystem if you want to add another filesystem to the policy. Select the filesystem and default action, and click Add. A filesystem can be included in at most one policy, so your choices are limited to filesystems that have not yet been assigned to a policy. You can also edit or delete a filesystem entry from the table.

  5. Click Add Rule to add a rule to the policy. See details in the following table. Rules are considered in the listed order. Use the Up and Down arrows to change order.
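
Why rule order matters can be seen in a short sketch. The text states only that rules are considered in the listed order; the sketch assumes that the first matching rule decides the action and that the policy's default action applies when nothing matches. It uses Python's fnmatch module as a stand-in for the filer's own glob matcher, and the rule list is made up.

```python
from fnmatch import fnmatchcase

def resolve_action(path, rules, default_action="Auto Cache"):
    """rules: ordered list of (glob_expression, action) pairs."""
    for expression, action in rules:
        if fnmatchcase(path, expression):
            return action  # assumed: the first matching rule decides
    return default_action  # assumed: fall back to the policy default

rules = [
    ("homedir/sampledir/*.tmp", "Do Not Cache"),
    ("homedir/sampledir/*", "Pinned"),
]
print(resolve_action("homedir/sampledir/report.xls", rules))   # Pinned
print(resolve_action("homedir/sampledir/scratch.tmp", rules))  # Do Not Cache
print(resolve_action("homedir/other/readme.txt", rules))       # Auto Cache
```

Under these assumptions, swapping the two rules would pin the .tmp files as well, because the broader expression would match first.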

Sample Rules

The following table shows some examples of rule use cases and the rules used to implement them.

Data Locality Rule examples. Each entry gives the use case, then the rule, then what the rule matches.

Match a directory path, which always begins with the file system.

The file system for the policy is /cloudfs/cc1-ca.

homedir/sampledir/*

This rule matches anything in the sampledir directory under /cloudfs/cc1-ca/homedir.

Match anything in a specified directory from any directory path.

*/sampledir/*

This rule matches any path that includes a sampledir directory, such as ../homedir/sampledir/*, ../temp/sampledir/*, or ../Dept/Sales/sampledir/*

Match one unknown character.

?at

This rule matches any of the following:

  • Cat
  • cat
  • Bat
  • bat

Match any number of unknown characters.

Sys*

This rule matches each of the following:

  • Sys
  • System

Match a character as part of a group of characters.

[CB]at

This rule matches both of the following:

  • Cat
  • Bat

But the rule does not match either of the following:

  • cat
  • bat
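
The sample rules above can be tried out with Python's fnmatch module, whose `fnmatchcase` function performs case-sensitive glob matching with `*`, `?`, and `[...]`. This is a stand-in for experimentation only; the filer's matcher may differ in details, and the full paths below are made-up examples.

```python
from fnmatch import fnmatchcase

examples = {
    "?at": ["Cat", "cat", "Bat", "bat"],
    "Sys*": ["Sys", "System"],
    "[CB]at": ["Cat", "Bat", "cat", "bat"],
    "*/sampledir/*": [
        "/cloudfs/cc1-ca/temp/sampledir/a.txt",
        "/cloudfs/cc1-ca/temp/other/a.txt",
    ],
}

# Print one line per (rule, name) pair with the match result.
for pattern, names in examples.items():
    for name in names:
        print(f"{pattern!r:18} {name!r:45} {fnmatchcase(name, pattern)}")
```

Only two lines report False for [CB]at (cat and bat), and one for */sampledir/* (the path without a sampledir directory).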

Escape Characters

To use a special character reserved for policy rule matching as part of a string, use an escape character:

Sys\*

This rule matches only the literal name Sys*. It does not match names such as System, and it does not match the name Sys\*.
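
The effect of the escape character can be sketched with a small glob-to-regex translator. Python's fnmatch module does not treat a backslash as an escape, so the sketch below handles it explicitly. It covers only `*`, `?`, and backslash escapes (not bracket groups), and it is an illustration of the idea rather than the filer's implementation.

```python
import re

def glob_to_regex(pattern):
    """Translate a glob with backslash escapes into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # An escaped character is always matched literally.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "*":
            out.append(".*")   # any number of characters
        elif ch == "?":
            out.append(".")    # exactly one character
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.DOTALL)

def matches(name, pattern):
    return glob_to_regex(pattern).fullmatch(name) is not None

print(matches("Sys*", r"Sys\*"))    # True: literal asterisk
print(matches("System", r"Sys\*"))  # False: the asterisk is no longer a wildcard
print(matches("System", "Sys*"))    # True: unescaped wildcard
```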

As an example of a rule combined with an action: if you enter *.xls and then select Pinned as the action, the rule pins all files with a .xls extension.

Rule Actions

Each rule includes an action that the filer performs when the rule is matched:

  • Auto Cache: This is the standard behavior of the filer. The filer will evict blocks of a file as needed to accommodate new data.
  • Deny: When this action is selected, the creation of files with names matching the glob expression is not allowed for that file system.
  • Do Not Cache: This action effectively applies a ‘first out’ policy to the data. Data is cached, but it is evicted before any other data if space is required in the filer, regardless of the type of data.
  • Pinned: Applies a ‘last out’ policy to the data. The filer avoids evicting pinned data unless the cache is full and new priority data is being ingested. Pinned data is evicted only as a last resort, after other data, to maintain normal operations.
  • Not Replicated: Causes the data not to be copied to the cloud. This creates an unprotected scratch or temp space and should be used cautiously since the data is temporary.
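
The eviction ordering implied by the Do Not Cache, Auto Cache, and Pinned actions can be modeled roughly as follows. This is a simplified sketch: the ranking table, the least-recently-used tie-break within a rank, and the sample cache entries are all assumptions, and the Deny and Not Replicated actions are not modeled because they do not affect eviction order.

```python
# Lower rank is evicted earlier: 'first out' data, then normal data, then pinned data.
EVICTION_RANK = {"Do Not Cache": 0, "Auto Cache": 1, "Pinned": 2}

def choose_evictions(cached, bytes_needed):
    """cached: list of dicts with 'name', 'action', 'size', and 'last_access' keys."""
    # Within a rank, the least recently accessed entry goes first (assumed tie-break).
    order = sorted(cached, key=lambda e: (EVICTION_RANK[e["action"]], e["last_access"]))
    evicted, freed = [], 0
    for entry in order:
        if freed >= bytes_needed:
            break
        evicted.append(entry["name"])
        freed += entry["size"]
    return evicted

cache = [
    {"name": "model.rvt", "action": "Pinned", "size": 40, "last_access": 1},
    {"name": "notes.txt", "action": "Auto Cache", "size": 10, "last_access": 5},
    {"name": "render.tmp", "action": "Do Not Cache", "size": 30, "last_access": 9},
    {"name": "old.doc", "action": "Auto Cache", "size": 20, "last_access": 2},
]
print(choose_evictions(cache, 45))  # ['render.tmp', 'old.doc']
```

In this model the pinned file is the last to go even though it is the least recently accessed entry.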

Policy Configuration Example

In the following example, a Smart Cache policy (also called a data locality policy) is configured. The policy, named Policy-1, contains the following rule:

/homedir/sampledir/*

This rule locally caches all files in the /homedir/sampledir/ folder.

This is only an example. Your configuration may contain more rules, depending on the content to be cached locally.

 

Here is the configuration dialog for an individual rule.

 

After configuring rules and adding them to a policy, click Save to save the changes.

Policy Location (Which filer is it on?)

If you have multiple filers in the cloud file system, the Filesystem drop-down list has multiple entries. In this case, you must decide where to apply the policy: to the local filer’s file system, or to a remote filer’s file system as seen on this local filer. For example, assume that you have filers at sites A and B. You are working from site A, but the data of interest is on site B. For the policy, select the filer B file system from the drop-down list. Then, when data is written to that remote file system, it is also localized on filer A.

Smart Cache Setting Options

The following table describes the Smart Cache settings.

Smart Cache Setting: Description

Cache Settings

  • Enable Smart Cache: Select to enable smart cache.
  • Percent of Storage for Cache: Enter the percentage of storage that is reserved for cache. The default is 50%, which is the maximum allowed value. Work with Panzura Support before changing this value.
  • Enable Auto Pre-Populate: Enables the filer to pre-populate its cache for faster performance.
  • Maximum Percent of Storage for Cache: Enter the percentage of storage that is reserved for cache. The default is 66%, which is the maximum allowed value. Work with Panzura Support before changing this value.
  • Cache all on Cloud Read: Select whether to enable or disable caching for read operations. Default is enabled. When a client requests to read a block of data, the filer fetches the drive file, selects the block, and returns it to the client. Typically the entire drive file is kept in memory cache for a few seconds or minutes before being flushed. When the cache all on cloud read option is enabled, the entire drive file will persist in the cache until it is forced out, possibly for days or weeks. This feature can reduce the number of cloud reads required for files that are being frequently accessed by clients. The only potential downside is that other data in the cache will be flushed more frequently.

Cache Policies
Policy Parameters

Click Add a Policy, specify the following parameters for the new policy, and click Add.

  • Policy Name: Add a name to identify the policy.
  • File System: Select a filesystem on which to base the policy. (If you have multiple filers in the cloud file system, see Policy Location (Which filer is it on?).)
  • Default Action: Select a default action from the following:
    • Auto Cache: This is the standard behavior of the filer. The filer will evict blocks of a file as needed to accommodate new data.
    • Pinned: Applies a ‘last out’ policy to the data. The filer avoids evicting pinned data unless the cache is full and new priority data is being ingested. Pinned data is evicted only as a last resort, after other data, to maintain normal operations. If you choose Pinned as the default action, the only way to unpin data is to specify the Auto Cache action in a rule.

Note: Panzura recommends that you keep the default action as auto cache and then use rules to assign pinning.

Rescan

Click the Rescan icon for a filesystem in the table to apply the policy to it. A dialog box opens to show the following options. Select an option and click Rescan Now to start the scan.

  • Entire File System: Initiates a full scan of the specified filesystem and applies the rule.
  • Partial Rescan: Provides the ability to selectively apply the rule based upon specified criteria, which can be a specific path, and/or a date range.

Note: The rescan process consumes resources on the filer. If possible, run the rescan for the policy at off-peak times.
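
The Partial Rescan criteria (a specific path, a date range, or both) amount to a filter over the files in the file system. The sketch below shows one way such a filter could work. It assumes the path criterion is a prefix match and that the date range is compared against a file's modification date; neither detail is stated in this document, and the function and parameter names are hypothetical.

```python
from datetime import date

def in_partial_rescan(path, modified, path_prefix=None, start=None, end=None):
    """Return True if a file falls inside the selected rescan criteria."""
    if path_prefix is not None and not path.startswith(path_prefix):
        return False  # outside the selected path
    if start is not None and modified < start:
        return False  # older than the selected range
    if end is not None and modified > end:
        return False  # newer than the selected range
    return True       # criteria left unset do not restrict the scan

print(in_partial_rescan(
    "/cloudfs/cc1-ca/homedir/sampledir/a.xls",
    date(2024, 3, 1),
    path_prefix="/cloudfs/cc1-ca/homedir/",
    start=date(2024, 1, 1),
    end=date(2024, 6, 30),
))  # True
```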

Rule Parameters

Click Add a rule.

  • Rule Expression: The rule syntax is based on glob pattern matching: you write a glob expression and attach one of the available actions, such as auto cache or pinning, to it. Rules are case sensitive. (See Sample Rules.)
  • Rule Action: See Rule Actions.
  • Prepopulate: Prepopulating makes the data available without forcing a reduction in cache. Panzura recommends selecting enable.
  • Deduplication: Indicates whether to apply deduplication to the data. This option controls deduplication for files that match the rule expression. If you specify No, deduplication is disabled for those files. Setting this option overrides the global deduplication setting and allows you to optimize deduplication at a fine-grained level. For example, if you enabled deduplication globally on the CloudFS page (Distributed CloudFS Settings), you can override that setting for mp3 files by specifying a rule for mp3 files and setting Deduplication to No.
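
The way a rule-level Deduplication setting overrides the global one can be sketched as follows. This is an illustration under assumptions: rules are modeled as (expression, dedup) pairs, a value of None stands for a rule that does not set the option, the first matching rule with a setting wins, and Python's fnmatch module stands in for the filer's glob matcher.

```python
from fnmatch import fnmatchcase

def dedup_enabled(path, rules, global_dedup=True):
    """rules: ordered list of (glob_expression, dedup), where dedup is True, False, or None."""
    for expression, dedup in rules:
        if fnmatchcase(path, expression) and dedup is not None:
            return dedup  # the rule-level setting overrides the global one
    return global_dedup   # no matching rule sets the option

rules = [("*.mp3", False)]
print(dedup_enabled("music/song.mp3", rules))    # False: overridden by the rule
print(dedup_enabled("docs/report.xls", rules))   # True: global setting applies
```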