Securely Implementing Trusted, “Friend”-Like Classes In JavaScript

Background

JavaScript object properties are public by default.

Since ECMAScript (ES) 2022, there is native support for private properties (and methods) using the hash (#) prefix.

In my previous post, I demonstrated a way to implement natively-enforced protected-style properties (not based on the underscore (_) convention) that doesn’t require shared lexical scope or public accessors (i.e. it behaves very much like you would expect a native implementation to).

The historical way to simulate protected properties was to use a lexically-scoped WeakMap of instances to properties (or, more efficiently, to a protected-state object).

The “legacy” lexically-scoped WeakMap approach is more like trusted-class access. A method at any class level can access any type of instance stored in the WeakMap, even if it’s a less-or-differently-derived instance. C++ has a similar concept called a friend class.
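To make the friend-like behavior concrete, here is a minimal single-file sketch. The Account and AuditedAccount classes and the peek method are hypothetical names chosen for illustration; the point is only that any method that can see the map can read the state of any instance stored in it, including a less-derived one.

```javascript
// Hypothetical single-file example: every class that can see stateMap is trusted
const stateMap = new WeakMap(); // <instance, hidden state>

class Account {
  constructor (balance) {
    stateMap.set(this, { balance });
  }
}

class AuditedAccount extends Account {
  // A sub-class method reading the state of ANY instance in the map,
  // including a less-derived (plain Account) instance
  peek (other) {
    return stateMap.get(other)?.balance;
  }
}

const plain = new Account(100);
const audited = new AuditedAccount(5);
console.log(audited.peek(plain)); // 100
console.log(audited.peek({}));    // undefined (not in the map)
```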

In this article, I’ll show how we can implement trusted, friend-like classes securely in JavaScript while allowing classes to be distributed across different files (no shared-lexical-scope limitation).

“Insider” Properties

“Insider” properties will be stored in an object on a per-base-class-instance basis, and will be accessible to the designated base class, to all trusted sub-classes (related by inheritance), and to trusted partner classes (unrelated in terms of inheritance).

This model has three types of components:

  1. A trust component, which defines the scope of trust
  2. A base-class component, which creates the shared state and controls distribution
  3. One or more trusted or “friend” classes which request and cache access to the shared state (these can be sub-classes of the base or completely outside of the base-class hierarchy)

The Trust Component

The trust component is a stand-alone file that contains the “trust configuration”. It is also a JavaScript “barrel file” that exports the related base and trusted sub-classes.

This approach groups trusted classes in a way that can be deployment- or application-specific without modifying the class files themselves in any way. It also ensures trusted-version consistency and proper dependency-loading resolution.

Here is the trust-component pattern:

// ---------- Barrel + trust file (trust.js) ----------

import { Base } from '.../base.js';
import { Partner } from '.../partner.js';
import { Sub } from '.../sub.js';

// All other modules should import these from trust.js
export { Base, Partner, Sub };

// Do not reference computed exports yet!
let trusted; // We'll cache our trust data here later

/**
 * Is a partner or sub-class trusted?
 * @param {function} cls - Class object (constructor function, not name)
 * @returns {boolean}
 *
 * Can be implemented with [].includes, new Set([]).has, switch/case, etc.
 */
export const isTrusted = (cls) => {
  trusted ||= [Partner, Sub];
  return trusted.includes(cls);
}

The isTrusted function just needs to return true if the class (passed as a constructor function, not a name) is trusted, or false if it isn’t. It can be implemented using an array (and .includes) as shown here (efficient and low-overhead for short lists), a Set (and .has; more performant for long lists), if statements, switch/case statements, or any other approach that fits your needs.
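As an illustration of the Set-based alternative, here is a minimal sketch. The empty Partner, Sub, and Stranger classes are stand-ins for the real imports (Stranger is a hypothetical untrusted class). The Set is still built lazily on first call, which keeps the function from touching the imported classes while the modules are still loading.

```javascript
// Stand-ins for the classes that trust.js would import (for illustration only)
class Partner {}
class Sub {}
class Stranger {} // Hypothetical class that is NOT in the trust configuration

let trusted; // Built lazily, on first use
const isTrusted = (cls) => {
  trusted ||= new Set([Partner, Sub]);
  return trusted.has(cls);
};

console.log(isTrusted(Sub));      // true
console.log(isTrusted(Stranger)); // false
```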

Base-Class Component

In addition to your core base-class behavior, the base class is also responsible for creating the initial, per-instance #insider state object and controlling distribution of access to trusted classes (which will store a reference to the same shared object in their own #insider private fields).

Here is the base-component pattern:

// ---------- Designated base class (base.js) ----------

import { isTrusted } from '.../trust.js';

export class Base {
  #insider; // Base view of shared insider-properties object
  static #insiderBaton = null; // Per-class hand-off baton
  static #protoInsider = {
    insiderMethod () {
      // Optional - verify this insider-properties object is official:
      if (this !== this.thys.#insider) throw new Error('Unauthorized');
      // this: the shared insider-properties object
      // this.thys: the original base object
    }
  };

  constructor () {
    const insider = this.#insider = Object.create(Base.#protoInsider); // Create insider properties object
    insider.thys = this; // Enables unbound prototype methods to see original "this"
  }

  /**
   * Pass instance.#insider to trusted classes (class method, invoked as Base._getInsider)
   * @param {function} cls - The class (constructor function, not name)
   * @param {Base} instance - The instance for which #insider is requested
   * @param {function} receiver - A baton handoff receiver function
   */
  static _getInsider (cls, instance, receiver) {
    if (!isTrusted(cls)) throw new Error('Untrusted request');
    cls._passInsider(instance.#insider, receiver);
  }

  /**
   * Get #insider for another object (base-class version)
   * @param {Base} other - The object whose #insider is desired
   * @returns {object|undefined} - #insider, if available
   */
  #getInsiderFor (other) {
    if (other instanceof Base) return other.#insider;
  }
}

Prototype insider methods can be added directly to the static #protoInsider object. This will be used as the initial object prototype for the insider properties object. The constructor will add a thys property referring to the original object so that insider methods can be left unbound like standard prototype methods (within an insider method, “this” refers to the insider-properties object and “this.thys” refers to the original object).
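Here is a small, stand-alone sketch of the thys idea, separate from the trust machinery. The Counter class and its describe and increment methods are hypothetical; only the Object.create-plus-thys arrangement comes from the pattern.

```javascript
class Counter {
  #insider;
  static #protoInsider = {
    // Unbound method: "this" is the insider object, "this.thys" is the Counter
    describe () {
      return `${this.thys.name}: ${this.count}`;
    }
  };

  constructor (name) {
    this.name = name;
    const insider = this.#insider = Object.create(Counter.#protoInsider);
    insider.thys = this; // Link back to the original object
    insider.count = 0;   // An insider property
  }

  increment () {
    this.#insider.count++;
    return this.#insider.describe();
  }
}

const clicks = new Counter('clicks');
console.log(clicks.increment()); // "clicks: 1"
```

Because describe lives on a shared prototype, no per-instance function binding is needed.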

Base._getInsider and Sub._passInsider or Partner._passInsider, in conjunction with receiver functions (covered below), work together to pass base-instance #insider to trusted sub-classes and partner classes.

Trusted/“Friend”-Class Components

Trusted-class constructors are responsible for requesting insider property access and caching it in their private #insider fields for use within each trusted class level.

They do this by calling a known, frozen, class-method of a known class (the base class), passing it their class object, the instance for which #insider properties are desired (usually this for sub-classes, or the associated base instance for partner classes), and a “baton receiver function”.

The base class passes the #insider state object and the supplied receiver function to a known, frozen, “hand-off” class-method of the specified class. The hand-off function uses a class-specific baton (every trusted class must have its own), which the receiver function (being defined within that same class) is able to receive.

The #insider state of other instances can be received by supplying a receiver function that sets something other than this.#insider (typically a variable in the requesting method’s local scope).
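The whole hand-off can be condensed into a single-file sketch. This collapses the separate modules into one scope purely for demonstration, uses a one-line stand-in for isTrusted, and omits the freezing and tamper-prevention steps. The secret property and the readSecret and peek methods are hypothetical.

```javascript
// Stand-in for trust.js: only Partner is trusted in this sketch
const isTrusted = (cls) => cls === Partner;

class Base {
  #insider = { secret: 42 }; // Hypothetical insider state
  static _getInsider (cls, instance, receiver) {
    if (!isTrusted(cls)) throw new Error('Untrusted request');
    cls._passInsider(instance.#insider, receiver);
  }
}

class Partner {
  #insider;
  static #insiderBaton = null;

  constructor (base) {
    // Receiver caches the baton in this instance's private field
    Base._getInsider(Partner, base, () => { this.#insider = Partner.#insiderBaton; });
  }

  static _passInsider (insider, receiver) {
    Partner.#insiderBaton = insider;          // Baton is set only for the
    try { receiver(); }                       // duration of the receiver call
    finally { Partner.#insiderBaton = null; }
  }

  readSecret () { return this.#insider.secret; }

  // Receiver that sets a local variable instead of this.#insider
  static peek (base) {
    let insider;
    Base._getInsider(Partner, base, () => { insider = Partner.#insiderBaton; });
    return insider.secret;
  }
}

const base = new Base();
const partner = new Partner(base);
console.log(partner.readSecret()); // 42
console.log(Partner.peek(base));   // 42
```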

Trusted sub-classes and trusted partner classes (outside of the base-class hierarchy/prototype chain) are fairly similar, with just a couple of differences.

Trusted Sub-Class Pattern

// ---------- (Related) sub-class (sub.js) ----------

import { Base } from '.../trust.js'; // from trust.js, not base.js

const getProto = Object.getPrototypeOf, setProto = Object.setPrototypeOf;

export class Sub extends Base {
  #insider; // Sub-class view of shared insider-properties object
  static #insiderBaton; // Per-class hand-off baton
  static #protoInsider = setProto({
    insiderMethod () {
      super.insiderMethod();
    }
  }, null);

  constructor () {
    super(); // "this" is not available until the base constructor returns
    // Request insider-properties object access
    Base._getInsider(Sub, this, () => this.#insider = Sub.#insiderBaton);
    const insider = this.#insider, protoInsider = Sub.#protoInsider;
    // Fix #protoInsider and #insider prototypes
    if (!getProto(protoInsider)) setProto(protoInsider, getProto(insider));
    setProto(insider, protoInsider);
  }

  /**
   * Pass a base #insider to trusted sub-class instances (called by Base._getInsider)
   * @param {*} insider - The base #insider
   * @param {function} receiver - The baton receiver function
   */
  static _passInsider (insider, receiver) {
    Sub.#insiderBaton = insider;
    try { receiver(); } // Bona fide Sub-class receivers can access the baton
    finally { Sub.#insiderBaton = null; }
  }
}

Trusted sub-classes of Base may extend the insider-properties prototype, but some sub-classes might not be trusted. Because of this, insider prototypes are managed privately, and a target-constructor prototype is not automatically selected as it is for the “protected” pattern. Instead, the insider-properties object-prototype is initially set to the Base insider-prototype and subsequently updated within trusted sub-class constructors.
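The prototype fix-up can be seen in isolation in the following sketch. It uses plain objects instead of private static fields, and the baseProto, subProto, and makeInsider names are hypothetical. It relies on the fact that super inside an object-literal method is resolved through the object's prototype at call time, so the link can be established after the object is created.

```javascript
const getProto = Object.getPrototypeOf, setProto = Object.setPrototypeOf;

// Plays the role of the Base insider prototype
const baseProto = {
  greet () { return 'base'; }
};

// Plays the role of Sub's insider prototype: created with a null
// prototype, then linked to whatever the insider object started with
const subProto = setProto({
  greet () { return `sub > ${super.greet()}`; }
}, null);

function makeInsider () {
  const insider = Object.create(baseProto); // What the base constructor does
  // What a trusted sub-class constructor then does:
  if (!getProto(subProto)) setProto(subProto, getProto(insider));
  setProto(insider, subProto);
  return insider;
}

console.log(makeInsider().greet()); // "sub > base"
```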

Trusted Partner-Class Pattern

// ---------- (Unrelated) partner class (partner.js) ----------

import { Base } from '.../trust.js'; // from trust.js, not base.js

export class Partner {
  #insider; // Partner view of shared insider-properties object
  static #insiderBaton; // Per-class hand-off baton

  /**
   * @param {Base} base - Base instance
   */
  constructor (base) {
    // Request insider-properties object access for base instance
    Base._getInsider(Partner, base, () => this.#insider = Partner.#insiderBaton);
    // Base._getInsider(Sub, this, () => this.#insider = Sub.#insiderBaton); // Sub-class version
  }

  /**
   * Pass a base #insider to trusted partner instances (called by Base._getInsider)
   * @param {*} insider - The base #insider
   * @param {function} receiver - The baton receiver function
   */
  static _passInsider (insider, receiver) {
    Partner.#insiderBaton = insider;
    try { receiver(); } // Bona fide Partner-class receivers can access the baton
    finally { Partner.#insiderBaton = null; }
  }

  // A pseudo-insider method (requires caller to know #insider)
  gatedMethod (insider) {
    if (insider !== this.#insider) throw new Error('Unauthorized');
  }
}

The pattern-as-shown assumes that a Partner instance is associated with a specific base (or sub-class) instance. Alternatively, one could just pass an instance to any method that requires it, and have the method request the associated #insider using #getInsiderFor (see “Accessing Another Object’s #insider”, below).

The partner-class pattern does not include insider-prototype chaining, as there is no way to guarantee a single, predictable, consistent prototype chain in the general case. Partner classes should use the pseudo-insider-method approach (a method on the partner-class prototype that verifies that the caller has passed #insider as a shared secret) to provide insider functionality instead (see the gatedMethod example in the pattern code).

Accessing Another Object’s #insider

There are two ways to access another object’s #insider.

If the other object is an instance of the class that is requesting access, the method can access its #insider directly (private fields are private by class, not by instance).
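As a quick illustration of class-level privacy, here is a minimal sketch with a hypothetical Point class. The `#x in obj` form is the ES2022 private-name check, which reports whether an object carries the class's private field without throwing.

```javascript
class Point {
  #x;
  constructor (x) { this.#x = x; }

  // Reads the private field of a *different* instance of the same class
  isLeftOf (other) { return this.#x < other.#x; }

  // True only for objects that actually have Point's #x
  static isPoint (obj) { return #x in obj; }
}

console.log(new Point(1).isLeftOf(new Point(2))); // true
console.log(Point.isPoint({ x: 1 }));             // false
```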

The second way uses Base._getInsider with a custom receiver function. Since it’s invoking a base-class method, it can be used between any instances (as long as the requesting method is in one of the trusted classes).

The private #getInsiderFor method (in each non-base class) follows this pattern:

  /**
   * Get #insider for another object (partner/sub-class version)
   * @param {Base|ThisClass} other - The object whose #insider is desired
   * @returns {object|undefined} - #insider, if available
   */
  #getInsiderFor (other) {
    if (other instanceof ThisClass) return other.#insider;
    if (other instanceof Base) {
      let insider;
      Base._getInsider(ThisClass, other, () => insider = ThisClass.#insiderBaton);
      return insider;
    }
  }

Security

Most forms of type checking in JavaScript, including those based on an object’s constructor or new.target, can be misdirected through code manipulation. Private element (#) access is managed directly at the JavaScript-engine level, however, and therefore does not have that problem. This model leverages that mechanism to verify that only methods of actually-trusted classes can gain access.

Base._getInsider will only ever pass #insider via a pre-determined method on a class it is configured to trust. A method in an untrusted class has two options:

  1. Follow the pattern, creating its own hand-off function and supplying its own class to Base._getInsider
  2. “Lie”, passing a trusted class to Base._getInsider instead

In the first case, the class will not be on the trusted list, so Base._getInsider will throw an error without even attempting to distribute access. This result will be typical for trust misconfiguration (a class that should be trusted wasn’t added to the trust configuration, or the wrong trust configuration is being loaded).

In the second case, Base._getInsider will distribute #insider access to the specified (trusted class) baton, but the requesting method, being of a different class, will have no access to the trusted-class baton. This might happen as the result of malicious code, or if “hard-wired” class names are being used instead of the boilerplate approach in the pattern as documented.
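Both outcomes can be reproduced in a single-file sketch. The Impostor class and its honest and lie methods are hypothetical, isTrusted is a one-line stand-in for the trust configuration, and freezing of the class methods is omitted.

```javascript
const isTrusted = (cls) => cls === Partner; // Stand-in trust configuration

class Base {
  #insider = { secret: 'shared' };
  static _getInsider (cls, instance, receiver) {
    if (!isTrusted(cls)) throw new Error('Untrusted request');
    cls._passInsider(instance.#insider, receiver);
  }
}

class Partner {
  static #insiderBaton = null;
  static _passInsider (insider, receiver) {
    Partner.#insiderBaton = insider;
    try { receiver(); } finally { Partner.#insiderBaton = null; }
  }
}

class Impostor {
  static #insiderBaton = null;
  static _passInsider (insider, receiver) {
    Impostor.#insiderBaton = insider;
    try { receiver(); } finally { Impostor.#insiderBaton = null; }
  }
  // Option 1: follow the pattern with an untrusted class
  static honest (base) {
    let got;
    Base._getInsider(Impostor, base, () => { got = Impostor.#insiderBaton; });
    return got;
  }
  // Option 2: name a trusted class; the baton lands in Partner, not here
  static lie (base) {
    let got;
    Base._getInsider(Partner, base, () => { got = Impostor.#insiderBaton; });
    return got;
  }
}

const base = new Base();
let firstError;
try { Impostor.honest(base); } catch (e) { firstError = e.message; }
console.log(firstError);         // "Untrusted request"
console.log(Impostor.lie(base)); // null (Impostor's own baton was never set)
```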

The code in the GitHub repository (see Resources, below) includes additional code to aid in prevention of class tampering in some execution contexts. That code has been omitted here in order to focus on general concepts and approaches.

Resources

The code is also available on GitHub at https://github.com/bkatzung/insider-js.

Related

Implementing Secure, Cross-File JavaScript Protected Properties And Methods

Implementing Secure, Cross-File JavaScript Protected Properties And Methods

Background

It’s often desirable to be able to control the visibility of an object’s properties. Sometimes it’s convenient for an object’s properties to be publicly accessible, sometimes base-classes and derived-classes need to share access, and sometimes you don’t want to allow any access outside of the defining class.

Many languages, including the TypeScript derivative of JavaScript, include access control keywords such as public, protected, and private for this. The options available natively within JavaScript are more limited (and TypeScript’s protections cannot protect against non-TypeScript-generated JavaScript).

JavaScript supports public object properties (the default), an unenforced convention of using an underscore (_) prefix before protected/private properties, and, since ECMAScript 2022, “private elements” (fields, methods, properties) via hash (#) name-prefixes.

Private elements are accessible by class. Any code within the defining class can access the private elements of any instance of that class. Private-element names must be unique across all of the private elements within a class, but are available for reuse in other classes. Code does not have access to the private elements of other classes within the same class hierarchy.
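A short sketch of these rules, using hypothetical Parent and Child classes: both declare #value, but each name refers to a separate private element visible only inside its own class body.

```javascript
class Parent {
  #value = 'parent';
  parentValue () { return this.#value; } // Sees Parent's #value only
}

class Child extends Parent {
  #value = 'child'; // A distinct private element that happens to share a name
  childValue () { return this.#value; } // Sees Child's #value only
}

const child = new Child();
console.log(child.parentValue()); // "parent"
console.log(child.childValue());  // "child"
```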

Intended Scope

The goal of this implementation is to make data accessible to all of the methods within an instance’s class hierarchy, and inaccessible (except via class-provided interfaces) to all other code.

class A {}
const a1 = new A(), a2 = new A();
class B extends A {}
const b1 = new B(), b2 = new B();
class C extends B {}
const c1 = new C(), c2 = new C();
class D {}
const d1 = new D();
function f () {}

  • a1 and a2 will have access to each other’s protected properties
  • Class A methods of b1 and b2 will have access to a1, a2, b1, and b2 (all instanceof A) protected properties
  • Class B methods of b1 and b2 will have access to b1 and b2 (instanceof B) but not to a1 or a2 protected properties
  • Class A methods of c1 and c2 will have access to a1, a2, b1, b2, c1, and c2 (all instanceof A) protected properties
  • Class B methods of c1 and c2 will have access to b1, b2, c1, and c2 (all instanceof B), but not a1 or a2 protected properties
  • Class C methods of c1 and c2 will have access to c1 and c2 (instanceof C) but not to a1, a2, b1, or b2 protected properties
  • d1 and f (code outside the class hierarchy) will not have access to a1, a2, b1, b2, c1, or c2 protected data

Notice that you cannot gain additional access to an existing instance of an existing type by creating a new sub-class with additional methods (the additional methods only have access to instances of the new sub-class or of its own sub-classes).

Historical Approach: Lexical Scoping

One historical approach for storing protected (and before ES2022, private) properties is through the use of scoped storage, like this (note that I am using “guarded” instead of “protected” to avoid TypeScript (and possibly future JavaScript) keyword confusion):

const guardedMap = new WeakMap(); // <instance, protectedProperties>

export class A { // Must be in the same file as guardedMap
  constructor () {
    const guarded = { /* initial protected properties here */ };
    guardedMap.set(this, guarded);
    // guarded.property
  }

  baseMethod () {
    const guarded = guardedMap.get(this);
  }
}

export class B extends A { // Must be in the same file as guardedMap
  constructor () {
    super();
    const guarded = guardedMap.get(this);
    // May add additional protected properties
  }

  subMethod () {
    const guarded = guardedMap.get(this);
  }
}

However, requiring all of the related classes for a hierarchy to exist within a single file is often impractical for a number of reasons (file size, different authors, different development timeframes, etc).

You can include accessor methods in the base class to allow sub-classes in other files to gain access, but then there’s nothing to prevent code outside of the class hierarchy from using the accessors to gain access too.
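A minimal sketch of that leak follows. The _guardedAccessor method name is hypothetical; any public or underscore-prefixed accessor behaves the same way.

```javascript
const guardedMap = new WeakMap(); // <instance, protectedProperties>

class A {
  constructor () {
    guardedMap.set(this, { secret: 1 });
  }
  // Accessor intended for sub-classes in other files...
  _guardedAccessor () { return guardedMap.get(this); }
}

// ...but unrelated code can call it just as easily
const a = new A();
const outsiderView = a._guardedAccessor();
outsiderView.secret = 999; // Outside code has modified "protected" state
console.log(a._guardedAccessor().secret); // 999
```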

Fortunately, with just a bit more effort, we can use a more tightly-controlled approach.

Goals For A Better Implementation

  • Related classes within a class hierarchy must have shared access to protected properties
  • Related classes must not need to reside within the same source file (i.e. support multiple lexical scopes)
  • Access from outside the class hierarchy should be prevented at the language level
  • Protected properties should be available for use as soon as possible during object construction
  • Avoid TypeScript (and maybe future JavaScript?) “protected” keyword confusion (I’ll continue to use “guarded” instead)
  • Some form of protected methods (methods that can only be invoked from within the class hierarchy)
  • Note: This implementation does not include generating nested protected scopes (a single protected scope will be shared across the class hierarchy)

“Threaded-Access” Strategy

Let’s use a different solution (one that doesn’t risk public access) by “threading” access between classes in the hierarchy via a common method defined in each class level, with each method invoking the next using “super”. Conceptually, the approach looks something like this:

// ** CONCEPT ONLY - CODE WILL NOT WORK **

export class A {
  #guarded; // Class-A-visible view of shared protected-properties object

  constructor () {
    const guarded = this.#guarded = { /* initial protected properties here */ };
    this._setGuarded(guarded); // Try to "push" guarded across the class hierarchy
  }

  _setGuarded () { /**/ } // Base-class stub
}

export class B extends A {
  #guarded; // Class-B-visible view of shared protected-properties object

  constructor () {
    super();
    // Ideally, this.#guarded should be available here
  }

  // Set local #guarded from value passed by base-class constructor
  _setGuarded (guarded) {
    // #guarded is not yet "attached" at the time _setGuarded is called
    this.#guarded = guarded; // FAILS!
  }
}

The code above won’t work as-is, because private elements aren’t associated with the object until after the super call has completed. In this specific example, the B class this.#guarded does not yet exist at the time the B class _setGuarded is called from the A constructor, because the A constructor has not yet returned to the B constructor.
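The timing issue is easy to observe directly. In this sketch (the _probe method and result property are hypothetical), the base constructor calls an overridden method before the sub-class's private field has been attached, and the attempted write throws a TypeError.

```javascript
class A {
  constructor () {
    this.result = this._probe(); // Runs before B's fields are attached
  }
  _probe () { return 'base'; }
}

class B extends A {
  #guarded = null;
  _probe () {
    try {
      this.#guarded = {}; // TypeError while super() is still running
      return 'attached';
    } catch (e) {
      return e instanceof TypeError ? 'not attached yet' : 'unexpected';
    }
  }
}

const b = new B();
console.log(b.result);   // "not attached yet"
console.log(b._probe()); // "attached" (construction has finished)
```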

We can get around that problem by using a subscription-based, “pull model” that operates strictly within the class hierarchy. Once working, protected state also provides a way to offer pseudo-protected methods that can only be invoked from within the class hierarchy. The details are covered in the following section.

Cross-File, “Threaded” JavaScript Protected Properties
(Final Implementation)

// ---------- Base-class file ----------

export class A {
  #guarded; // Class-A-visible view of shared protected-properties object
  #guardedSubs = new Set(); // Protected-properties subscription setter functions

  static protoProtected = {
    protectedMethod () {
      // Optional - verify this protected-properties object is official:
      if (this !== this.thys.#guarded) throw new Error('Unauthorized');
      // this: the protected-properties object
      // this.thys: the original object
    }
  };

  // this.#guarded.protectedMethod(...)

  constructor () {
    const guarded = this.#guarded = Object.create(this.constructor.protoProtected);
    guarded.thys = this; // Allows unbound prototype methods to find "this"
    this._subGuarded(this.#guardedSubs); // Invite sub-classes to subscribe to access
  }

  // Distribute this.#guarded; called by sub-class constructors
  _getGuarded () {
    const guarded = this.#guarded, subs = this.#guardedSubs;
    try {
      for (const sub of subs) {
        sub(guarded); // Try to distribute guarded; throws if not yet attached
        subs.delete(sub); // Remove successfully-completed subscriptions
      }
    } catch (_) { /**/ }
  }

  _subGuarded () { /**/ } // Base-class stub

  gatedMethod (guarded) {
    if (guarded !== this.#guarded) throw new Error('Unauthorized');
    // ...
  }

  // this.gatedMethod(this.#guarded, ...)
}

// ---------- Sub-class file ----------

import { A } from '...';

export class B extends A {
  #guarded; // Class-B-visible view of shared protected-properties object

  static protoProtected = Object.setPrototypeOf({
    protectedMethod () {
      if (this !== this.thys.#guarded) throw new Error('Unauthorized');
      super.protectedMethod();
      // ...
    }
  }, A.protoProtected);

  constructor () {
    super();
    this._getGuarded();
    // <-- this.#guarded is synchronized and ready here
  }

  _subGuarded (subs) { // Subscribe to protected-properties access
    super._subGuarded(subs);
    // subscription setter function sets local #guarded once
    // after attachment
    subs.add((g) => this.#guarded ||= g);
  }
}

Protected Properties

Each class (base and sub-class) gets a private #guarded, which, through synchronization, will be made to point to the same shared (per-instance), protected-properties object.

During construction, the base class invites sub-classes to subscribe to receive access to the protected properties (base #guarded object). Only classes in the hierarchy receive the invitation (it’s never externally accessible).

Classes wanting protected-property access respond to the invitation (they subscribe) by adding a setter function (which accepts a protected-properties object and sets their private #guarded) to the subscription-set (subs) passed to _subGuarded.

Important: The super-method (super._subGuarded(subs)) must be called before adding the setter function to the subscription-set so that setter functions get added in least-derived-class-to-most-derived-class (i.e. top-to-bottom) order.

Each sub-class constructor calls this._getGuarded() after it calls super() in order to set its private this.#guarded to the shared protected-properties object. This works by attempting to execute each setter function in the subscription-set that was collected by the base-class constructor. A setter will complete (and be removed from the subscription-set) only if the associated class has returned from its constructor’s super() call.

In any class in which the super() call has not yet returned, attempting to set its this.#guarded in its setter function will throw an exception (with the side effect of leaving the setter function in the subscription-set to be attempted again in a subsequent call).

The net effect is that the private this.#guarded gets set, class-by-class, right after each super() call completes.

The setter-function subscriptions are idempotent. It’s possible to recreate the subscription-set by calling _subGuarded post-construction and run all the setter functions again (attempting to set a different protected-properties object), but as the setters have already set each this.#guarded during construction, running them again has no effect.
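To see the class-by-class synchronization and the idempotence in action, here is a three-level sketch of the same pattern, stripped of the protected-method prototype. The log property and the aView, bView, and cView methods are hypothetical additions that expose each level's #guarded purely so the result can be inspected.

```javascript
class A {
  #guarded;
  #guardedSubs = new Set();
  constructor () {
    this.#guarded = { log: [] };
    this._subGuarded(this.#guardedSubs);
  }
  _getGuarded () {
    const guarded = this.#guarded, subs = this.#guardedSubs;
    try {
      for (const sub of subs) {
        sub(guarded);     // Throws if that class level is not yet attached
        subs.delete(sub);
      }
    } catch (_) { /* Remaining setters are retried on a later call */ }
  }
  _subGuarded () {}
  aView () { return this.#guarded; }
}

class B extends A {
  #guarded;
  constructor () {
    super();
    this._getGuarded();
    this.#guarded.log.push('B ready');
  }
  _subGuarded (subs) {
    super._subGuarded(subs);
    subs.add((g) => this.#guarded ||= g);
  }
  bView () { return this.#guarded; }
}

class C extends B {
  #guarded;
  constructor () {
    super();
    this._getGuarded();
    this.#guarded.log.push('C ready');
  }
  _subGuarded (subs) {
    super._subGuarded(subs);
    subs.add((g) => this.#guarded ||= g);
  }
  cView () { return this.#guarded; }
}

const c = new C();
console.log(c.aView().log); // [ 'B ready', 'C ready' ]

// Re-running the setters with a different object changes nothing
const replay = new Set();
c._subGuarded(replay);
for (const setter of replay) setter({ fake: true });
console.log(c.bView() === c.aView()); // true
```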

Protected Methods

The shared protected-properties object approach lends itself to two protected-methods approaches: protected methods associated with the protected-properties object itself, and pseudo-protected methods (publicly-visible methods on the main object prototype chain that use the protected-properties object as an access token). I’ll cover the pseudo-protected-methods approach in the next section.

We can create protected methods by adding them “directly” (or via object prototype) to the shared protected-properties object. These can then be called as e.g. this.#guarded.protectedMethod(...). Note that the “this” within protectedMethod will, by default, be the protected-properties object, not the original object.

One possible way to grant access to the original this is to add each protected method as a bound function, but this approach requires adding a custom binding per-object-and-method, which is not very efficient.

By adding a “thys” protected-property referring to the original object as a standard part of the pattern, traditional unbound methods can be used instead (they just need to reference this.thys instead of this to refer to the main object).

By creating the protected-properties object using a chained-prototype approach, it is possible to have sub-classing of methods and super.method calls, just as for the main object class hierarchy (prototype chain).

The pattern, as shown, builds the protected-properties object prototype chain by making the prototype for each class publicly visible. It could also be built “privately” (without public exposure of the prototypes) with some additional steps performed at construction (not covered in the scope of this post).

If a protected method wants to confirm that this (the protected-properties object) and this.thys (the original object) are a properly matched pair, it can do so by testing this === this.thys.#guarded.

Pseudo-Protected Methods

Here, a “pseudo-protected method” is a publicly-visible method (so not truly protected in the traditional sense) on the main-object prototype-chain that provides gated access to its functionality via a shared secret. In this context, the shared secret can be this.#guarded, since it’s known to methods within the base-class and each sub-class in the inheritance hierarchy.

These methods should throw an exception or return some innocuous value if not called from within the class hierarchy (i.e. not passed this.#guarded).

Note that an instance cannot cross-instance call a pseudo-protected method on a less-derived instance than the calling method’s class. Given instances:

const a = new A(), b = new B(); // where B extends A

a can call pseudo-protected methods on b (and vice-versa for A-class methods of b) because a and b both have an A-level #guarded. B-class methods of b, however, cannot call pseudo-protected methods on a because there is no B-level #guarded for a.

Resources

This code is also available on GitHub at https://github.com/bkatzung/protected-js.

Related

Securely Implementing Trusted, “Friend”-Like Classes In JavaScript

JavaScript Pattern For Deferred/Just-In-Time Subclass Prototype Initialization

Impetus And Use Case

The impetus for this code pattern is to be able to support class hierarchies spanning multiple “async defer”-loaded JavaScript files.

Developing A Solution

A typical SubClass.prototype = new SuperClass or SubClass.prototype = Object.create(SuperClass.prototype) won’t work because a super-class may not have finished loading when a subclass is defined.

To avoid order-of-execution issues, just-in-time initialization of the prototype is performed upon the first constructor invocation. The prototype of the default new instance is already bound by the time the constructor function executes, so the constructor function must return a new “new” instance after switching prototypes.

The constructor calls a helper class-method, $c, to perform the just-in-time initialization. This method replaces itself during initialization to prevent reinitialization in subsequent calls.

Both versions of the helper method call a second helper method, $i, to (potentially) perform instance initialization. This method is registered as both an instance method (for polymorphic invocation) and a class method (as a shortcut for prototype.$i, to facilitate super-initialization in subclasses).

To prevent any initialization of instances for subclass prototypes and duplicate initialization of replacement new objects, the constructor accepts its class object as a sentinel value to indicate that no initialization should be applied.

When the sentinel value is supplied to the constructor, the single parameter false is passed from the constructor to $c and from $c to $i. Otherwise, the constructor’s arguments object is passed as the only parameter instead.

Sample Pseudo-Trace

Here’s a simplified view of what the flow of execution might look like creating the first instance of a subclass using a previously initialized super-class for its prototype.

instance = new Sub(...parameters) // Initial super is Object
  Sub.$c_1x(Arguments [...parameters])
    Sub.prototype = new Super(Super)
      Super.$c(false)
        Super.$i(false)
    new Sub(Sub) // New super is Super
      Sub.$c(false)
        Sub.$i(false)
          Super.$i(false)
    Sub.$i(Arguments [...parameters])
      Super.$i(Arguments [...parameters])

Code Pattern

function Class () {
    var c = Class;
    return c.$c.call(this, arguments[0] !== c && arguments);
}
Class.$c = function (args) { // JIT 1x class init helper
    // var c = Class, p = c.prototype; // "Base" classes (Object subclasses)
    var s = SuperClass, c = Class, p = c.prototype = new s(s); // Subclasses
    p.constructor = c; // Subclasses
    c.$c = function (args) { return this.$i(args); }; // Post-init helper
    p.$i = c.$i = function (args) { // Instance init helper
        s.$i.call(this, args); // Subclasses
        if (!args) return this; // Skip init on false
        // Add this.properties here
        return this;
    };
    // Add p.methods here
    // return this.$i(args); // Base classes (original prototype)
    /*
     * We need to return a new "new" to pick up the new subclass prototype.
     * Note that new c(c) invokes $c(false) which invokes $i(false)
     * before returning here for (possible) initialization.
     */
    return new c(c).$i(args); // Subclasses
};
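As a concrete instance of the pattern, here is a sketch with a hypothetical Animal base class and Dog subclass. Dog is deliberately defined first to mimic out-of-order loading; nothing touches Animal until the first new Dog(...) call. The name and breed properties and the speak and kind methods are illustrative only.

```javascript
// Subclass defined FIRST, as might happen with out-of-order script loading
function Dog () {
    var c = Dog;
    return c.$c.call(this, arguments[0] !== c && arguments);
}
Dog.$c = function (args) { // JIT 1x class init helper
    var s = Animal, c = Dog, p = c.prototype = new s(s);
    p.constructor = c;
    c.$c = function (args) { return this.$i(args); }; // Post-init helper
    p.$i = c.$i = function (args) { // Instance init helper
        s.$i.call(this, args);      // Super-class initialization
        if (!args) return this;     // Skip init on false
        this.breed = args[1];
        return this;
    };
    p.speak = function () { return this.name + ' the ' + this.breed + ' barks'; };
    return new c(c).$i(args); // New "new" picks up the new prototype
};

// Base class defined later
function Animal () {
    var c = Animal;
    return c.$c.call(this, arguments[0] !== c && arguments);
}
Animal.$c = function (args) {
    var c = Animal, p = c.prototype; // Base class keeps its original prototype
    c.$c = function (args) { return this.$i(args); };
    p.$i = c.$i = function (args) {
        if (!args) return this;
        this.name = args[0];
        return this;
    };
    p.kind = function () { return 'animal'; };
    return this.$i(args);
};

var rex = new Dog('Rex', 'collie');
console.log(rex.speak());           // "Rex the collie barks"
console.log(rex instanceof Animal); // true
```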

Ruby Sub-Classes/Inheritance, Include, And Extend

Overview

Ruby Objects, Modules, and Classes

  • In Ruby, an object is a collection of (zero or more) instance variables. It also has a class (see below) and possibly a lazily-created singleton class to hold object-specific methods.
  • A module is an object containing a collection of (zero or more) constants, class variables, instance methods, and included modules. You can include a module in another module and you can extend most objects with a module. Since Ruby 2, you can also prepend a module to a module.
    # Parts of a module
    CONSTANT = "I'm a constant"
    @@class_var = "I'm a class variable"
    @class_inst_var = "I'm a class instance variable" # in a class/module definition
    def self.method; "I'm a class method"; end
    class << self
      def another_method; "I'm a class method too"; end
    end
    def method
      @inst_var = "I'm an instance variable" # inside an instance method
      "I'm an instance method"
    end
  • The Class class is a sub-class of Module, so every class is also a module.
    • Each class has a parent class called a super-class. The child class is called a sub-class. The class inherits the behaviors of the super-class. New classes are sub-classes of the Object class unless you specify otherwise.
    • Classes can typically be instantiated via the new method.
    • Classes are not valid parameters for include or extend.
  • A “def method” adds a method to the “currently open” class or module. A “def object.method” adds a method to the singleton class for the object.
  • When you include a module (let’s call it M1) in another module (let’s call it M2), M1’s constants and instance methods become visible in M2 (as constants and instance methods), and M1 will appear in M2’s included_modules list. M1’s class methods are not added to M2 (but see Including Class Methods below).
  • When you extend an object with a module, the module’s instance methods are added to the object via an automatically-generated anonymous super-class of the singleton class (one for each extending module). In the case where the extended object is a module, the added methods are class methods, not instance methods. The object is unaffected by the module’s constants or class methods.
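As a minimal sketch of the points in the list above (the names Greeter, Plain, and Person are hypothetical, chosen only for illustration), the following shows include making a module's constants and instance methods visible in a class, extend affecting a single object through its singleton class, and a class being rejected as a parameter to include:

```ruby
# Hypothetical names (Greeter, Plain, Person) used for illustration only.
module Greeter
  GREETING = "hello"
  def greet; "#{GREETING} from #{self.class}"; end
end

class Plain; end
class Person; include Greeter; end

# include: Greeter's constants and instance methods become visible in Person.
puts Person::GREETING                           # hello
puts Person.new.greet                           # hello from Person
puts Person.included_modules.include?(Greeter)  # true

# extend: only this one object gains the method, via its singleton class.
obj = Plain.new.extend(Greeter)
puts obj.greet                                  # hello from Plain
puts obj.singleton_class.include?(Greeter)      # true
puts Plain.new.respond_to?(:greet)              # false

# Classes are not valid parameters for include (or extend).
begin
  Class.new { include Plain }
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```

Other instances of Plain are unaffected by the extend, which is why the final respond_to? check is false.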

Confirming The Effects Of include And extend In Modules

The following program can be used to see the effect of using include and extend in modules (and classes):

module Inner
    INNER = "Inner constant"
    def self.inner_cm; "Inner class method"; end
    def inner_im; "Inner instance method"; end
end

module Outer
    include Inner
    OUTER = "Outer constant"
    def self.outer_cm; "Outer class method"; end
    def outer_im; "Outer instance method"; end
end

module Extension
    EXT = "Extension constant"
    def self.ext_cm; "Extension class method"; end
    def ext_im; "Extension instance method"; end
end

class MyClass; include Outer; extend Extension; end

puts "Constants: " +
    (MyClass.constants(true) - Object.constants(true)).inspect
puts "Class methods: " + (MyClass.methods - Object.methods).inspect
puts "Instance methods: " +
  (MyClass.instance_methods - Object.instance_methods).inspect

The output is as follows:

Constants: [:OUTER, :INNER]
Class methods: [:ext_im]
Instance methods: [:outer_im, :inner_im]

Method Resolution Order

The following program can be used to show the class/module hierarchy and order of method resolution for sub-classing (inheritance), include, and extend:

module Mod1; def m; puts "Mod 1"; super; end; end
module Mod2; def m; puts "Mod 2"; super; end; end
module Mod3; def m; puts "Mod 3"; super; end; end
module Mod4; def m; puts "Mod 4"; super; end; end
module Mod5; def m; puts "Mod 5"; super; end; end
module Mod6; def m; puts "Mod 6"; super; end; end
class Base; def m; puts "Base"; end; end
class Sub < Base
    include Mod1, Mod2; include Mod3
    def m; puts "Sub"; super; end
end
o = Sub.new.extend(Mod4, Mod5).extend Mod6
puts "Sub ancestors: " + o.class.ancestors.inspect
o.m

Regrettably, the include and extend methods process their parameters from last to first, so method resolution order is not simply the reverse of the order in which the modules appear: modules passed in a single call are searched in the order listed, while modules from a later call are searched before those from an earlier call. The output is as follows:

Sub ancestors: [Sub, Mod3, Mod1, Mod2, Base, Object, Kernel, BasicObject]
Mod 6
Mod 4
Mod 5
Sub
Mod 3
Mod 1
Mod 2
Base

Pictorially, it looks like this (with the number in parentheses indicating the search order):

[Figure: Ruby extend/include/Sub-class Method Resolution Order]
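The search order can also be inspected directly from the object's singleton class. The following is a minimal sketch using hypothetical names (A, B, C, P, Root, Leaf); it additionally shows prepend, mentioned in the overview, which places a module ahead of the class itself:

```ruby
# Hypothetical names used for illustration only.
module A; def m; "A>" + super; end; end
module B; def m; "B>" + super; end; end
module C; def m; "C>" + super; end; end
module P; def m; "P>" + super; end; end
class Root; def m; "Root"; end; end
class Leaf < Root
  include A, B   # one call: A is searched before B
  prepend P      # prepended modules are searched before the class itself
  def m; "Leaf>" + super; end
end
o = Leaf.new.extend(C)

# The singleton class's ancestors give the full search order for this object.
# The anonymous singleton class has no name, so select(&:name) leaves it out.
chain = o.singleton_class.ancestors.select(&:name).take_while { |mod| mod != Object }
puts chain.inspect   # [C, P, Leaf, A, B, Root]
puts o.m             # C>P>Leaf>A>B>Root
```

Each m prepends its own label and calls super, so the returned string records the order in which the implementations were reached.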

Including Class Methods

It is also possible to add class methods as part of an include or to add instance methods as part of an extend using the included or extended callbacks, respectively:

module Inc_Me
  def inst_m; end
  module ClassMethods; def class_m1; end; end
  def self.included (base)
    base.class_exec do
      extend ClassMethods     # method 1 - extend with named sub-module
      Module.new do           # method 2 - extend with anonymous module
        def class_m2; end
      end.tap { |mod| extend mod }
      def self.class_m3; end  # method 3 - add directly to the class
    end
  end
end

module Ext_Me
  def class_m; end            # instance method here, class there
  module InstanceMethods; def inst_m1; end; end
  def self.extended (base)
    base.class_exec do
      include InstanceMethods # method 1
      Module.new do           # method 2
        def inst_m2; end
      end.tap { |mod| include mod }
      def inst_m3; end        # method 3
    end
  end
end

module M1; include Inc_Me; end
puts "M1 class methods: " + (M1.methods - Object.methods).inspect
puts "M1 instance methods: " +
  (M1.instance_methods - Object.instance_methods).inspect
puts "M1 included modules: " + M1.included_modules.inspect, ''

module M2; extend Ext_Me; end
puts "M2 class methods: " + (M2.methods - Object.methods).inspect
puts "M2 instance methods: " +
  (M2.instance_methods - Object.instance_methods).inspect
puts "M2 included modules: " + M2.included_modules.inspect

which produces:

M1 class methods: [:class_m3, :class_m2, :class_m1]
M1 instance methods: [:inst_m]
M1 included modules: [Inc_Me]

M2 class methods: [:class_m]
M2 instance methods: [:inst_m3, :inst_m2, :inst_m1]
M2 included modules: [#<Module:0x00000000cbd108>, Ext_Me::InstanceMethods]

It is better to use the include-with-extend method (as in module Inc_Me) than the extend-with-include method (as in module Ext_Me), as the primary module name gets included in the included_modules list.

It is also better to extend with a sub-module (methods 1 or 2) rather than adding the class methods directly (method 3), since each extending module is added to a separate, invisible super-class of the singleton class instead of to the including module itself. The benefit here is that the behaviors can be chained using super if desired, as shown by this code:

module Inc1
  module ClassMethods; def m1; puts "Inc1 m1"; super rescue nil; end; end
  def self.included (base)
    base.class_exec do
      extend ClassMethods
      Module.new do
        def m2; puts "Inc1 m2"; super rescue nil; end
      end.tap { |mod| extend mod }
      def self.m3; puts "Inc1 m3"; super rescue nil; end
    end
  end
end

module Inc2
  module ClassMethods; def m1; puts "Inc2 m1"; super rescue nil; end; end
  def self.included (base)
    base.class_exec do
      extend ClassMethods
      Module.new do
        def m2; puts "Inc2 m2"; super rescue nil; end
      end.tap { |mod| extend mod }
      def self.m3; puts "Inc2 m3"; super rescue nil; end
    end
  end
end

module M; include Inc2, Inc1; end
M.m1; M.m2; M.m3

which produces:

Inc2 m1
Inc1 m1
Inc2 m2
Inc1 m2
Inc2 m3

The included Callback And Nested Includes

If your module includes other modules, the included callbacks for the other modules (if present) will be called when they are included in your module, but not when your module is included elsewhere. This code shows the problem:

module M1
  CONST1 = 'M1 constant'
  module ClassMethods; def cm1; 'M1 class method'; end; end
  def im1; 'M1 instance method'; end
  def self.included (base)
    puts "#{self} included in #{base}"
    base.class_exec { extend ClassMethods }
  end
end

module M2
  include M1
  def self.included (base); puts "#{self} included in #{base}"; end
end

module M3; include M2; end

puts "M2 class methods: " + (M2.methods - Object.methods).inspect
puts M3::CONST1
puts "M3 class methods: " + (M3.methods - Object.methods).inspect
puts "M3 instance methods: " +
  (M3.instance_methods - Object.instance_methods).inspect

which produces:

M1 included in M2
M2 included in M3
M2 class methods: [:included, :cm1]
M1 constant
M3 class methods: []
M3 instance methods: [:im1]

The including module’s included callback should therefore call the included callback for any included modules if none of the base object’s ancestors have previously included the other modules:

def M2.included (base)
  puts "#{self} included in #{base}"
  M1.included base if M1.respond_to?(:included) &&
   (!base.respond_to?(:superclass) || !base.superclass.include?(M1))
end

which, after the change, produces:

M1 included in M2
M2 included in M3
M1 included in M3
M2 class methods: [:included, :cm1]
M1 constant
M3 class methods: [:cm1]
M3 instance methods: [:im1]
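The superclass check in the forwarding callback matters once classes are involved. The sketch below is an illustration under assumptions, not the extended_include gem's implementation: the Parent and Child classes and the M1.hooked recorder are hypothetical additions used to show which bases M1's callback actually ran for.

```ruby
module M1
  module ClassMethods; def cm1; 'M1 class method'; end; end
  def self.included(base)
    (@hooked ||= []) << base   # hypothetical: record each callback target
    base.extend ClassMethods
  end
  def self.hooked; @hooked || []; end   # hypothetical helper for this sketch
end

module M2
  include M1
  def self.included(base)
    # Forward to M1's callback unless a superclass of base already includes M1.
    M1.included base if M1.respond_to?(:included) &&
      (!base.respond_to?(:superclass) || !base.superclass.include?(M1))
  end
end

class Parent; include M2; end
class Child < Parent; include M2; end   # Parent already includes M1

puts M1.hooked.inspect   # [M2, Parent]
puts Child.cm1           # M1 class method
```

Child still responds to cm1 because its singleton class inherits from Parent's, so skipping the forwarded callback for Child loses nothing.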

Download It

A Ruby gem (called extended_include) based on this posting is available at rubygems.org.

Ruby Gem Sarah Version 2.0.1 Released

Ruby Gem Sarah version 2.0.1 has just been released.

What Is It?

Sarah is a combination sequential array, sparse array, and (“random access”) hash.

Ruby’s own array literal and method calling syntaxes allow you to specify a list of sequential values followed by either an implicit or an explicit hash of name/value pairs stored at the end of the array. Sarah takes this concept a few steps further.

Values with sequential indexes beginning at 0 are typically stored in the sequential array for efficiency. You can also assign values with non-sequential indexes, and these values are stored in the sparse array (which is actually implemented as a hash). The sequential and sparse arrays work together like a traditional Ruby array, except that there can really be empty holes with no values (as opposed to having nil values as place-holders where no other value has been set in the case of a traditional Ruby array). You can perform most of the typical array operations, including pushing, popping, shifting, unshifting, and deleting. These result in the re-indexing of sparse values in addition to sequential values after the point of insertion or deletion, just as if they had all been stored in a traditional Ruby array.

Values stored with non-integer keys are stored in a separate “random access” (i.e. unordered) hash. Re-indexing of the sequential and sparse arrays does not affect these key/value pairs.
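To illustrate the difference between nil place-holders and real holes, here is a minimal sketch using only core Ruby. It does not use Sarah or reflect its internals beyond the statement above that the sparse part is implemented as a hash; the helper names dense_with_gap and sparse_with_gap are hypothetical.

```ruby
# Hypothetical helpers for illustration; plain Ruby, not the Sarah API.
def dense_with_gap
  a = []
  a[5] = 'second'   # a traditional array pads indexes 0..4 with nil
  a
end

def sparse_with_gap
  h = {}
  h[5] = 'second'   # a hash keyed by index stores only what was assigned
  h
end

puts dense_with_gap.size           # 6
puts dense_with_gap[2].inspect     # nil
puts sparse_with_gap.size          # 1
puts sparse_with_gap.key?(2)       # false
```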

Instead of requiring you to go through a hash at the end of the array to reach sparse and random-access values, Sarah presents these values at the same level as the sequential ones. Compare:

# Traditional Ruby array with implicit hash
a = ['first', 5 => 'second', :greeting => 'hello']
# a[0] = 'first'
# a[1] is a hash
# a[1][5] = 'second'
# a[1][:greeting] = 'hello'

# Using a Sarah
s = Sarah['first', 5 => 'second', :greeting => 'hello']
# s[0] = 'first'
# s[5] = 'second'
# s[:greeting] = 'hello'

Why Should I Use It?

Sarah provides a pure-Ruby sparse array implementation, and can easily be the basis for a pure-Ruby sparse matrix implementation. It also provides efficient linear storage and manipulation in case you don’t know in advance if your data will be sequential or sparse in nature (i.e. it can vary significantly based on user input).

By default, negative indexes are interpreted relative to the end of the array. However, if it’s appropriate to your problem domain, Sarah also has a mode that supports negative indexes as actual indexes. In this mode, insertions and deletions do not result in value re-indexing.

Ruby Gem XKeys Version 2.0.0 Released

Ruby Gem XKeys version 2.0.0 has just been released.

What Is It?

XKeys is a module that can be included in Ruby classes or used to extend Ruby objects to provide convenient handling of nested arrays or hashes, including Perl-like auto-vivification, PHP-like auto-indexing, and per-access default values.

Perl-Like Auto-Vivification For Ruby

A fairly common Ruby programming question, especially for current and former Perl programmers, is how to automatically generate intermediate nodes in nested array and hash structures.

Say, for example, that you want to keep some sort of running tally grouped by year, month, and day. In Perl, this is easily accomplished as follows:

my %tally; # top-level hash of tallies
# and later...
++$tally{$year}{$month}{$day}; # increment tally by year/month/day

Perl will automatically create nested arrays or hashes as you attempt to write to them. They just “spring to life” when you need them; the process is called auto-vivification.

In straight Ruby, implementing the example is more cumbersome…

tally = {} # top-level hash of tallies
# and later...
tally[year] ||= {} # make sure year hash exists
tally[year][month] ||= {} # make sure month hash exists
tally[year][month][day] ||= 0 # make sure day value exists
tally[year][month][day] += 1 # increment tally by year/month/day

Alternatively, you can provide a block of code to the top-level hash to create new hashes whenever a non-existent node is referenced, but they are created when reading (getting) the nested structure instead of when writing (setting) the nested structure, so you get new nodes even when you’re “just looking”.
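A minimal sketch of that block-based alternative, using only core Ruby (the helper name vivifying_hash is hypothetical), shows the read-time side effect:

```ruby
# Hypothetical helper: a hash whose missing keys create (and store) nested
# hashes that share the same default block.
def vivifying_hash
  Hash.new { |hash, key| hash[key] = Hash.new(&hash.default_proc) }
end

tally = vivifying_hash
tally[2024][5][17] = 1          # writing creates the intermediate nodes
puts tally[2024][5][17]         # 1

puts tally[1999][1].inspect     # {} ("just looking")
puts tally.key?(1999)           # true: the read created the 1999 node
```

Note that the missing leaf is itself a new hash rather than 0, so a running tally would still need its first value set explicitly.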

Using the XKeys gem, the code becomes easier again:

require 'xkeys'
tally = {}.extend XKeys::Hash
# and later...
tally[year, month, day, :else => 0] += 1

The “:else” value is used when the value doesn’t exist yet (this avoids generating an error trying to add 1 to nil on the first tally of each day). Missing nodes are automatically added, but only on write, not on read.

PHP-Like Auto-Indexing For Ruby

PHP allows you to auto-index items being added to the end of an array by leaving the array subscript empty. For example:

$languages = array();
$languages[] = 'Perl'; # assigned to $languages[0]
$languages[] = 'PHP'; # assigned to $languages[1]
$languages[] = 'Ruby'; # assigned to $languages[2]

XKeys allows you to do something similar using the symbol :[] with arrays or other types of containers supporting the #push method. This is called “push mode”. In Ruby using XKeys, it looks like this:

require 'xkeys'
languages = [].extend XKeys::Auto
languages[:[]] = 'Perl' # languages.push 'Perl' ==> languages[0]
languages[:[]] = 'PHP' # languages.push 'PHP' ==> languages[1]
languages[:[]] = 'Ruby' # languages.push 'Ruby' ==> languages[2]