Wednesday, November 13, 2019

Cross-account sharing of a PrivateLink endpoint using Private Hosted Zones and CloudFormation


It is possible to concentrate all your PrivateLink endpoints in one account, then share them with other accounts and access them through a Transit Gateway.

This reduces the consumption of private IPs and makes everything cleaner. You also do not have to pay an hourly fee for these endpoints in each of your accounts, although you still have to consider the transit fees involved in bringing the data into another VPC.

This is done using Route53 and Private Hosted Zones (PHZs). James Levine's post "Integrating AWS Transit Gateway with AWS PrivateLink and Amazon Route 53 Resolver" explains very clearly how you can achieve this. I'll spare you the details, but basically, this lets you override the DNS addresses of the endpoints within your accounts so that they point to your private address instead of the public one.

Go read James' article first, then come back here for implementation details.

A sample use case

The use case that made me do this initially is AWS Systems Manager. I wanted to be able to use its Session Manager feature to open interactive sessions on EC2 instances, in multiple accounts. Since I wanted to prevent routing SSM data through the internet, combined with the fact that it required many VPC endpoints, I decided to concentrate them in one account.

As documented, four VPC interface endpoints (i.e. PrivateLink endpoints) are needed for this: ssm, ssmmessages, ec2 and ec2messages. There is also a fifth endpoint for S3, but that one is a gateway endpoint and it needs to be defined in each of your accounts.

When an EC2 instance tries to communicate with the ssm endpoint, its agent looks up that endpoint's DNS address and, by default, gets the public IP address for your region. For example:

$ nslookup

Non-authoritative answer:

But what do you do if you have defined a PrivateLink endpoint for ssm in another account, and you wish to use it through a peering connection or a Transit Gateway? James explains how to configure a DNS hosted zone to fool everything in that VPC into using a private address. A lookup in this EC2 instance will then give this result:

$ nslookup

Non-authoritative answer:

where the returned address is the private IP address assigned to the VPC endpoint.

This works because that DNS record is, in fact, an alias record on the regional address of your endpoint: the service's usual name is aliased to the endpoint's regional DNS name.

If you went further and defined your VPC in multiple AZs, you should have multiple addresses:

$ nslookup

Non-authoritative answer:

N.B. I'm still not sure what the impact of a failure in one AZ is in this scenario as I'm well-aware that using DNS as a failover mechanism isn't great and can involve timeouts.  I hope that any unavailable IP address will be removed dynamically from the regional address... I don't have the answer to this one.
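A quick way to confirm from an instance which kind of address the resolver hands back is to resolve the name and test whether every answer is private. The following is only a sketch: the function names are mine, and the hostname in the commented example (ssm.ca-central-1.amazonaws.com) is an assumption based on the region used in the templates further down.

```python
import ipaddress
import socket


def resolve_ipv4(hostname):
    """Return the IPv4 addresses the local resolver gives for hostname."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return sorted(addresses)


def all_private(addresses):
    """True when the list is non-empty and every address is in a private range."""
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_private for addr in addresses
    )


# Example (needs network access, so it is left commented out).
# The hostname is an assumption; replace the region with yours.
# addrs = resolve_ipv4("ssm.ca-central-1.amazonaws.com")
# print(addrs, "private" if all_private(addrs) else "public")
```

If the PHZ is associated with the VPC, every returned address should be private. This does not say anything about what happens when one AZ fails.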

Configuring a PrivateLink endpoint with a PHZ in CloudFormation

Here are the three resources you need to put in your CF template to deploy an endpoint and a private hosted zone. Once you have these in place, it is a matter of repeating the same code for ssmmessages, ec2 and ec2messages.

Security Group
The first thing you need is to define a security group to make your endpoint accessible on the port it needs (usually TCP/443):

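  ssmendpointSG:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: "SG for SSM endpoint"
      GroupName: "SSMSG"
      SecurityGroupIngress:
        - CidrIp:   # value left blank in the original; use the CIDR range of your clients
          IpProtocol: tcp
          FromPort: 443
          ToPort: 443
      Tags:
        - Key: Name
          Value: "My SSM Endpoint SG"
      VpcId:
        Fn::ImportValue: "VPC-id-outputvariable-from-another-template"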

Note here that I've imported the VPC ID using an output variable that comes from another CF template. You can hardcode it or input it as a parameter if you prefer.

PrivateLink Endpoint
Then, you define the PrivateLink endpoint itself:

  ssmendpointVPC:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcEndpointType: Interface
      ServiceName: !Sub com.amazonaws.${AWS::Region}.ssm
      SecurityGroupIds:
        - !Ref ssmendpointSG
      SubnetIds:
        - Fn::ImportValue:
            !Sub "AZ1-subnet-id-outputvariable-from-another-template"
        - Fn::ImportValue:
            !Sub "AZ2-subnet-id-outputvariable-from-another-template"
      VpcId:
        Fn::ImportValue:
          !Sub "VPC-id-outputvariable-from-another-template"

Private hosted zone
As for the private zone, these two entries need to be configured (you'll need to replace the region with yours):

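  phzssmcacentral1amazonawscom:
    Type: AWS::Route53::HostedZone
    Properties:
      Name: "ssm.ca-central-1.amazonaws.com"
      VPCs:
      - VPCId:
          Fn::ImportValue: "VPC-id-outputvariable-from-another-template"
        VPCRegion: "ca-central-1"

  ssmrecordset:   # resource name chosen for illustration
    Type: AWS::Route53::RecordSet
    Properties:
      AliasTarget:
        HostedZoneId: !Select [ '0', !Split [ ':', !Select [ '0', !GetAtt ssmendpointVPC.DnsEntries ]]]
        DNSName: !Select [ '1', !Split [ ':', !Select [ '0', !GetAtt ssmendpointVPC.DnsEntries ]]]
      HostedZoneId: !Ref phzssmcacentral1amazonawscom
      Name: "ssm.ca-central-1.amazonaws.com"
      Type: A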

Note the multiple selectors under AliasTarget. The combination of these selectors extracts specific fields from AWS::EC2::VPCEndpoint that are made available as attributes once CloudFormation deploys the endpoint, namely:

  • The hosted zone ID for the endpoint
  • The regional DNS address of the endpoint (as opposed to the one pointing to the AZ itself)
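To make the selectors less cryptic, here is roughly what they do, written as plain Python. Each item of DnsEntries is a string of the form "HostedZoneId:DNSName", and as far as I know the regional name comes first in the list. The values below are made up for illustration and the function name is mine.

```python
def split_dns_entry(entry):
    """Split one DnsEntries item of the form 'HostedZoneId:DNSName'."""
    hosted_zone_id, dns_name = entry.split(":", 1)
    return hosted_zone_id, dns_name


# Made-up sample of what the DnsEntries attribute could look like.
dns_entries = [
    "ZEXAMPLE12345:vpce-0example-abcd.ssm.ca-central-1.vpce.amazonaws.com",
    "ZEXAMPLE12345:vpce-0example-abcd-ca-central-1a.ssm.ca-central-1.vpce.amazonaws.com",
]

# Same as: !Select [ '0', ... ] on the list, then !Split on ':'
zone_id, regional_name = split_dns_entry(dns_entries[0])
print(zone_id)
print(regional_name)
```

The outer !Select picks the first entry, !Split cuts it on the colon, and the inner !Select takes either field 0 (the hosted zone ID) or field 1 (the DNS name).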

Sharing a PHZ across accounts

Once the PHZ is deployed, you need to share it with your other accounts. Unfortunately, you cannot do this with CloudFormation. The procedure is explained in the KB article "How do I associate a Route 53 private hosted zone with a VPC on a different AWS account?", which shows how to do this using the CLI.
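For reference, the procedure boils down to two CLI calls: an authorization created from the account that owns the PHZ, then an association made from the account that owns the VPC. The sketch below only builds the two command lines so that they can be reviewed before being run; the helper names and the sample IDs are hypothetical, and it assumes the route53 subcommands create-vpc-association-authorization and associate-vpc-with-hosted-zone.

```python
import shlex


def vpc_arg(region, vpc_id):
    """Format the value expected by the --vpc option."""
    return f"VPCRegion={region},VPCId={vpc_id}"


def authorization_command(hosted_zone_id, region, vpc_id):
    """Command to run in the account that owns the private hosted zone."""
    return ["aws", "route53", "create-vpc-association-authorization",
            "--hosted-zone-id", hosted_zone_id,
            "--vpc", vpc_arg(region, vpc_id)]


def association_command(hosted_zone_id, region, vpc_id):
    """Command to run in the account that owns the VPC."""
    return ["aws", "route53", "associate-vpc-with-hosted-zone",
            "--hosted-zone-id", hosted_zone_id,
            "--vpc", vpc_arg(region, vpc_id)]


# Hypothetical IDs, for illustration only.
for cmd in (authorization_command("ZEXAMPLE12345", "ca-central-1", "vpc-0example"),
            association_command("ZEXAMPLE12345", "ca-central-1", "vpc-0example")):
    print(shlex.join(cmd))
```

Each command has to be run with credentials for the right account, and the pair has to be repeated for every VPC that needs to resolve the endpoint.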

Good luck.

Thursday, October 24, 2019

Deploying a cross-account Transit Gateway using CloudFormation


I've decided to automate the deployment of a Transit Gateway using CloudFormation.

I'll show you here how I did it, but be advised that it is currently not possible to do complex configurations on a TGW using CloudFormation. You will need to do some tasks manually, at least one of which can only be done with the AWS CLI.

First, some caveats

Now there are a few caveats you need to be aware of before using CloudFormation to deploy a Transit Gateway:

  • At this time, there are no attributes whatsoever that can be retrieved with GetAtt, which means you can't extract the ARN, the default route table ID, or anything else and use it later in your template.
  • Every (and I mean every) property change requires a replacement, which is a big deal.
    • This means that doing something as simple as trying to change a tag on your Transit Gateway using CloudFormation will cause downtime, as it requires that you first remove any attachments and dependencies before being able to update the stack.
    • It also means that the ID and ARN of the TGW itself will change once it is replaced, which requires lots of planning: any dependents that refer to these identifiers will need to be reconfigured.

tgw-main.yml : Deploying the Transit Gateway

Deploying a TGW is fairly straightforward:

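Resources:
  mytgw:
    Type: 'AWS::EC2::TransitGateway'
    Properties:
      AutoAcceptSharedAttachments: enable
      Tags:
      - Key: Name
        Value: "My Transit Gateway"

Outputs:
  mytgwid:   # output name chosen for illustration
    Description: TGW ID
    Value: !Ref mytgw
    Export:
      Name: "mytgw-id"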

I've set AutoAcceptSharedAttachments to enable to prevent having to accept VPC attachments manually, as they will be done later.

I've also added an output variable. It is set so that I can then reference the TGW ID from other stacks, namely the VPC-related stacks that will attach themselves to the TGW. I suggest you export it with the name !Sub "${AWS::StackName}-mytgw-id" if you prefer prefixing it with the stack name.

Caution: Whatever you do, be sure to understand all the properties of AWS::EC2::TransitGateway and their implications. As I said earlier, you cannot change any of them once it's deployed without replacing the TGW, and removing all the dependencies below (and possibly more).

tgw-ram.yml: Sharing the TGW across different accounts (optional)

If you need to attach to the TGW from a VPC in another account, you first need to use Resource Access Manager (RAM) to share it between your accounts.

This cannot be done in the previous stack (tgw-main.yml); sharing the TGW requires getting its ARN and, as explained in the caveats above, there is no way to do that from CloudFormation. To my knowledge, it's not available in the portal either. Therefore, you first need to extract the ARN using the AWS CLI:

$ aws ec2 describe-transit-gateways

This will show you the ARN, such as:

arn:aws:ec2:xx-xxxx-x:yyyyyyyy:transit-gateway/tgw-zzzzz

  • xx-xxxx-x: The AWS region where the TGW is located
  • yyyyyyyy: The account number that hosts the TGW
  • tgw-zzzzz: The TGW ID.
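If you would rather not read the ARN off the screen, the JSON printed by describe-transit-gateways can be parsed. This is a minimal sketch, assuming the default JSON output format and the TransitGatewayId and TransitGatewayArn fields; the function name is mine and the sample uses the same placeholders as above.

```python
import json


def transit_gateway_arns(describe_output):
    """Map each TGW ID to its ARN, given the JSON text printed by the CLI."""
    doc = json.loads(describe_output)
    return {
        tgw["TransitGatewayId"]: tgw["TransitGatewayArn"]
        for tgw in doc.get("TransitGateways", [])
    }


# Trimmed-down sample with placeholder values.
sample = """
{
  "TransitGateways": [
    {
      "TransitGatewayId": "tgw-zzzzz",
      "TransitGatewayArn": "arn:aws:ec2:xx-xxxx-x:yyyyyyyy:transit-gateway/tgw-zzzzz"
    }
  ]
}
"""
print(transit_gateway_arns(sample))
```

In practice you would feed it the real output, for example by piping the CLI command into a small script that reads sys.stdin.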
Then, you can build your RAM template like this:

  mytgwshare:   # resource name chosen for illustration
    Type: "AWS::RAM::ResourceShare"
    Properties:
      Name: "My TGW RAM Share"
      ResourceArns:
        - arn:aws:ec2:my-aws-region:my-aws-account:transit-gateway/tgw-my-tgw-id
      Principals:
        - "first_account_number"
        - "second_account_number"
      Tags:
        - Key: "Name"
          Value: "My TGW RAM Share"

N.B. I actually use a parameter for ResourceArns, so I don't have to hardcode the ARN in there. I've left it out to keep things simple.

Once this template is run, you need to go manually inside each account and accept the Resource Access Manager invite. There is no way, to my knowledge, of doing this within CloudFormation.

tgw-vpc-attach.yml: Attaching a VPC to the TGW

Assuming you already have a CloudFormation Template to deploy your VPCs, it is then a matter of adding this code to have them attach to the TGW:

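Resources:
  vpctgwattach:
    Type: 'AWS::EC2::TransitGatewayAttachment'
    Properties:
      TransitGatewayId:
        Fn::ImportValue:
          !Sub "mytgw-id"
      VpcId: !Ref myvpc
      SubnetIds:
        - !Ref mysubnetAZ1
        - !Ref mysubnetAZ2
      Tags:
      - Key: Name
        Value: "VPC TGW attachment"

Outputs:
  vpctgwattachid:   # output name chosen for illustration
    Description: VPC TGW Attachment ID
    Value: !Ref vpctgwattach
    Export:
      Name: "vpctgwattach-id"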

See here that I refer to the output variable defined previously in tgw-main.yml in order to get the ID of the TGW (without the stack name, but this is up to you).

This is for a VPC located in the same account as the TGW. Note that referencing CloudFormation output variables doesn't work across accounts; in that case, the TGW ID can be hardcoded. There are workarounds, but from what I've seen, they involve Lambda functions and I prefer avoiding this for the moment.

The TGW needs to be attached to a subnet in each of the AZs that your VPC spans. It doesn't matter which subnet you pick in these AZs, but you need one per AZ. The attachment creates a "secret" endpoint that consumes an IP address in each subnet, and all packets that go to the TGW will be routed through it.
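When a VPC has many subnets, the "one subnet per AZ" rule can be checked with a few lines of Python. This is only an illustration: the function name and the subnet IDs are hypothetical, and the input is assumed to be a list of (subnet ID, AZ) pairs that you have gathered yourself.

```python
def one_subnet_per_az(subnets):
    """Keep the first subnet seen for each AZ.

    subnets is an iterable of (subnet_id, availability_zone) pairs.
    """
    chosen = {}
    for subnet_id, az in subnets:
        chosen.setdefault(az, subnet_id)
    return chosen


# Hypothetical subnets spread over two AZs.
subnets = [
    ("subnet-aaa", "ca-central-1a"),
    ("subnet-bbb", "ca-central-1a"),
    ("subnet-ccc", "ca-central-1b"),
]
print(one_subnet_per_az(subnets))
```

The values of the resulting dictionary are the subnet IDs to list in the attachment.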

While it could be possible to attach to that VPC directly from tgw-main.yml, I've decided not to do this, as I prefer not having to modify the main TGW template when adding new VPCs. It must also be done from within the account that owns the VPC, so I prefer keeping the attachment business out of the main template.

tgw-defaultroutetable.yml: Adding entries to the default route table

There is no way to extract the ID of the default route table from CloudFormation, so you first need to retrieve it using the CLI or the portal. The value has the form tgw-rtb-xxxxxx, where xxxxxx is the route table's own identifier (it is not the same as the Transit Gateway ID).
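With the CLI, the default route table ID shows up in the Options block of each gateway returned by aws ec2 describe-transit-gateways. Here is a small sketch that pulls it out of the JSON output; it assumes the AssociationDefaultRouteTableId field, the function name is mine, and the sample values are placeholders.

```python
import json


def default_route_table_ids(describe_output):
    """Map each TGW ID to its default association route table ID (or None)."""
    doc = json.loads(describe_output)
    result = {}
    for tgw in doc.get("TransitGateways", []):
        options = tgw.get("Options", {})
        result[tgw["TransitGatewayId"]] = options.get("AssociationDefaultRouteTableId")
    return result


# Trimmed-down sample with placeholder values.
sample_output = """
{
  "TransitGateways": [
    {
      "TransitGatewayId": "tgw-zzzzz",
      "Options": {
        "DefaultRouteTableAssociation": "enable",
        "AssociationDefaultRouteTableId": "tgw-rtb-xxxxxxxxxxxxx"
      }
    }
  ]
}
"""
print(default_route_table_ids(sample_output))
```

The value returned is what goes into TransitGatewayRouteTableId, ideally through a template parameter.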

Then, adding a new route is a matter of declaring an AWS::EC2::TransitGatewayRoute resource that refers to the route table ID. I suggest you use a parameter for the route table ID, but that is up to you.

    Type: AWS::EC2::TransitGatewayRoute
          !Sub "mytgw-id"
      TransitGatewayRouteTableId: "tgw-rtb-xxxxxxxxxxxxx"

Notice here that I've used the export variable mytgw-id to identify my transit gateway.

Wrapping it all up

Deploying a TGW using CloudFormation and sharing it across accounts is a multi-step process:

  • Deploy the TGW using tgw-main.yml
  • Get the ARN manually using the AWS CLI (or some other way), then share it with other accounts using tgw-ram.yml
  • Go into each account and accept the share invitation.
  • Create a template to attach the VPCs named tgw-vpc-attach.yml or better, add the code of that template in your current VPC template(s).
  • Get the default route table ID using the portal (or some other way) and add route entries to the TGW using yet another template named tgw-defaultroutetable.yml

That's about it.

Wednesday, July 31, 2019

Update and thoughts on Ansible for cloud automation

Except for a few posts here and there, there hasn't been much really useful content in this blog in almost eight years! I think an update is in order.

I started this blog initially to target mostly HP-UX, as I felt comfortable enough to post on various subjects on this operating system, and few people, if any, blogged on HP-UX outside of the official channels, which made this niche blog relevant.

Then I moved on in 2010. Since then, HP-UX as a platform has moved on too, with fewer and fewer systems running. And in the years that followed, I'll be the first to admit that it has not been easy to find a subject I felt good enough to blog about.

This is partly because I could not get a foothold on any particular technology. I briefly worked as a systems architect, then came back to the technical side in 2014 by keeping Tru64 systems up and running until they got decommissioned (this was in an environment with extremely strict compliance rules -- to be honest, it wasn't very exciting). I then assisted in deploying some Windows servers (!!) in 2015-2016, along with some Red Hat Linux systems, and finally, in 2017, I got drafted to help upgrade some Solaris 11.3 servers on a few SuperClusters. Okay, drafted is a strong word, it's a terrific and exciting platform, but sorry Solaris, it seems to me that you're slowly moving on like HP-UX, too.

For a year now, I've been working on automating deployments in Azure in a new team. This is a 180 degree turn for a systems administrator, and I like it.

We're using Ansible to do this, using it to call (roughly in order of preference):

  • native Ansible modules (when they exist, and when they don't crash)
  • REST API calls using azure_rm_resource whenever possible
  • ARM templates
  • PowerShell (last resort on a Linux host)

Is Ansible great at this job? It's been one year now, and I'm still not sure.

For starters, it takes a long time to make the code fully bullet-proof and idempotent. Furthermore, while Ansible (especially the modules) makes it easy to expect a desired state for specific Azure resources, it is harder to make a playbook that will take care of not only deploying resources, but reporting differences over time (i.e. drift management) and deleting these resources in the future when they will no longer be needed.

Terraform has been suggested many times to resolve this, but I haven't looked into it yet. Well, actually I did, but after an hour I still couldn't figure out how to print "hello world", so I kind of called it quits. There is so much work to be done that side projects are kind of limited right now.

AWS seems to have got it right with CloudFormation and stacks, a feature which, I think, is missing from ARM templates for now, as ARM templates seem to be designed to be a one-time thing. I've just learned about stacks today and I'm getting excited.

To be continued!

Monday, March 5, 2018

Installing Solaris 11.4 beta on a Proliant G4

I've been trying to install Solaris 11.4 beta on an extremely old x86 server, in part because I do not have access to a scratch VMware environment and also to see if I could pull it off.

I had access to a bunch of unused HP Proliant DL360 G4s. They are listed as reported to work on the Hardware Compatibility List, so I said to myself "Why not". So I scavenged memory and CPUs and tried to install the OS.

I was able to boot the install media using a USB key, but the graphics card didn't seem to be compatible, as I got the message "Compatible fb not found". Specifying -B console=force-text didn't work; it switched to graphical mode anyway.

It took multiple tries and reboots to find a combination that worked. I found out that it is possible to install on a serial console. There are GRUB menu entries that let you boot the OS using ttya or ttyb, but they are hidden. I'm not sure how I got into this menu, but I think it was by pressing ESC at the GRUB prompt that gives you 5 seconds before booting the OS.

I attached a laptop with a serial cable to the server and ran screen in an xterm. I was able to access the text installer successfully and install the OS.

My system now boots. I'm waiting for my network patch request to come through before continuing.

I'm especially interested in trying the new Solaris Analytics interface. I'll keep you posted.

Thursday, May 11, 2017

Revisiting the restricted shell

I've been administering Unix boxes since the mid-90s and I've always been told that using restricted shells (rsh, rksh, rbash) was a bad idea because they are easily hackable. Indeed, there are countless known methods to get out of a restricted shell: from finding an application that allows a shell escape, to trying to compile your own, to doing clever hacks with the history file.

I've recently been in a corner case where I was dealing with an embedded product which requires a specific set of commands and also uses some bracket commands that are difficult to wrap with our usual SSH command authenticator. So I decided to revisit using a restricted shell to jail this user and I think I managed to make the jail shatterproof enough.

Here is how I did it:

Create Bob's home directory, but assign it to root:
# mkdir /home/bob
# chown root:root /home/bob
# chmod 755 /home/bob

Force a .bashrc and .profile that changes Bob's PATH to a limited set of commands:
# echo "export PATH=/home/bob/allowed_commands" > /home/bob/.bashrc
# ln -s .bashrc /home/bob/.profile

The reason for having both a .profile and a .bashrc is to ensure that this profile will be loaded both for interactive and non-interactive sessions.

If the user needs to write stuff somewhere, create a directory for Bob, e.g.
# mkdir /home/bob/writable
# chown bob /home/bob/writable
# chmod 755 /home/bob/writable

Create the allowed_commands directory and put symlinks in it pointing to allowed binaries:
# mkdir /home/bob/allowed_commands
# ln -s /bin/mycmd /home/bob/allowed_commands/mycmd

Now you must be sure of the following:

1. Bob must NOT have any writable access to /home/bob/.profile or /home/bob/.bashrc, else he can change the PATH value
2. Bob must NOT have any writable access to /home/bob, to prevent any modification of .profile and .bashrc
3. Investigate ANY command that ends up in the allowed_commands jail to be sure that there is NO known way of executing another command from it, showing files or escaping the shell. If there are any, then forfeit giving this command or write a wrapper around it (see below).
4. See the jail escape methods linked above, log in as Bob and see if you can use them to escape the jail.
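Points 1 and 2 can be checked mechanically. The sketch below is deliberately conservative and is not a complete audit: it flags a path if Bob owns it (an owner can always chmod his own files) or if it is writable by group or by others. It ignores ACLs and actual group membership, and the function names are mine.

```python
import os
import stat


def risky_for(path, uid):
    """True if the user with this uid might be able to modify path.

    Conservative check: owned by uid, or group-writable, or world-writable.
    ACLs and real group membership are not examined.
    """
    st = os.stat(path)
    if st.st_uid == uid:
        return True
    return bool(st.st_mode & (stat.S_IWGRP | stat.S_IWOTH))


def audit(paths, uid):
    """Return the subset of paths that look modifiable by uid."""
    return [p for p in paths if risky_for(p, uid)]


# Example, to be run as root on the real system (left commented out):
# import pwd
# bob = pwd.getpwnam("bob").pw_uid
# print(audit(["/home/bob", "/home/bob/.bashrc", "/home/bob/.profile"], bob))
```

An empty list is what you want to see for /home/bob, .bashrc and .profile; the writable directory is expected to show up if you include it.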

Example of a wrapper script with scp

Let's say I want to allow Bob to scp files into his account using scp's undocumented -t (i.e. -to) option. I would normally do this:
# ln -s /bin/scp /home/bob/allowed_commands/scp

This is wrong, as scp can be coerced with -S to execute arbitrary commands.

A solution is to put the following in the allowed_commands jail instead:
lrwxrwxrwx. 1 root root   14 May  5 10:02 scp ->
-rwxr-xr-x. 1 root root  382 May  5 13:54

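With the wrapper script containing this:
#!/bin/bash
# Only allow "scp -t <target>", where <target> is not another option
if [[ "$1" = "-t" && "$2" != "-"* ]]; then
        /bin/scp -t "$2"
        returncode=$?
else
        echo "scp_wrapper: Refused SCP command: '$*'"
        returncode=1
fi
exit ${returncode}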

Using this wrapper, scp will only allow -t and no other option.

Good luck.

Thursday, April 6, 2017

Applications crash on SLES12 due to lock elision

This issue has been discussed in other places, but mostly related to specific applications and I think it needs its own post here for those who would stumble on this following a Google search.

glibc 2.18, released in 2013, came with a new feature named TSX Lock Elision.

Briefly, this feature changes the way mutexes are handled on specific processors that support hardware lock elision. Intel Xeon CPUs, in particular, have supported TSX since around 2013. Lock elision offers significant performance gains for some software such as databases.

You can see if your Linux server's CPU supports lock elision by checking /proc/cpuinfo. If it mentions "hle" (hardware lock elision), it does.
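If you need to run this check on many servers, it is easy to script. Here is a minimal sketch that looks for the "hle" flag in the flags line only, so that a stray "hle" elsewhere in the file is not matched; the function names are mine.

```python
def cpu_flags(cpuinfo_text):
    """Collect the CPU flags listed on the 'flags' lines of /proc/cpuinfo."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def supports_hle(cpuinfo_text):
    """True if the CPU advertises hardware lock elision."""
    return "hle" in cpu_flags(cpuinfo_text)


# Example on a live Linux system (left commented out):
# with open("/proc/cpuinfo") as f:
#     print(supports_hle(f.read()))
```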

RHEL 7 does not support this as of now. It comes with glibc 2.17, so lock elision is not enabled on these systems. As for SLES12, it comes with glibc 2.19, which means that SLES12 systems will use lock elision if the CPU supports it.

However, if an application unlocks a mutex twice, this can cause problems if lock elision is enabled. This is explained in detail in an LWN article. Let me quote an important paragraph in this article:

pthread_mutex_unlock() detects whether the current lock is executed transactionally by checking if the lock is free. If it is free it commits the transaction, otherwise the lock is unlocked normally. This implies that if a broken program unlocks a free lock, it may attempt to commit outside a transaction, an error which causes a fault in RTM. In POSIX, unlocking a free lock is undefined (so any behavior, including starting World War 3 is acceptable). It is possible to detect this situation by adding an additional check in the unlock path. The current glibc implementation does not do this, but if this programming mistake is common, the implementation may add this check in the future.

The "programming mistake" here is double-unlocking mutexes. I've made a sample C program that does exactly this, and although it works fine with glibc 2.17, it will crash on glibc 2.19 with a segmentation fault in __lll_unlock_elision(), if, and only if, the server's cpuinfo reports "hle".

I've stumbled upon a few applications, which I will not name here, that crash on SLES12. Upon analyzing their cores, I found that they have this same exact problem with __lll_unlock_elision(). So, one can assume that they might double-unlock some mutexes.

The bottom line is that if you have an app that does this, your best bet is to contact the vendor, and ask them to remove double mutex unlocks in their code, if they have any.

If that is not possible, there are two workarounds:

1. The first is to patch /etc/ to override libpthread 2.19 with a version that is compiled with lock elision disabled. This is documented in Novell's KB here.

2. The second (and preferred) solution is to adjust LD_LIBRARY_PATH to override it on a per-application basis. You could therefore change its startup script to add this:


Hope this helps.

Friday, September 2, 2016

Sending text logfiles from Windows to a syslog server, reliably

I'm in the following situation:
  1. I have a Windows application, let's name it MyApp
  2. MyApp creates important log files on my server without using the Event Log. These log files are simply textfiles (i.e. logfile.txt)
  3. For compliance purposes, I have to send these log files to a remote syslog server.
  4. The compliance auditor wants me to ensure that these log files are always sent no matter what.
It doesn't matter what the application is (as long as it creates a text file somewhere) or whether the receiving end is an ArcSight appliance, a Splunk box, or syslog-ng: this post describes a generic way to achieve this, with the added bonus of reliability.

The two products that I used to implement this are Neologger and NSSM:
  • Neologger reads a file (as a matter of fact, it tails it) and sends it to a syslog server.
  • NSSM is a tool that lets you wrap any application (in our case, Neologger) in a Windows service.
What is "tailing" a file?
Unix administrators are familiar with the tail command: it follows a text file, grabbing new entries at the end as they come in. Neologger basically replicates what "tail -f file.log | logger" would do on Unix.

Using Neologger

Neologger is, in essence, a simple and reliable tool. It will tail a text file endlessly, and it automatically detects if that file is deleted, shrunk or rotated, which ensures a reliable operation. To use it, simply try:

# neolog.exe -r logfile.txt -tail -t syslog_server -d

This will tail the file logfile.txt and send it to syslog_server. Many other command-line options are available. Note the -d option: this is a debug option that lets you see what it does. You would normally not want it there.

The first thing you need to do is therefore to craft a command-line as above, but specific for your application. Here is a more complete example:

# "C:\Program Files\neolog\neolog.exe" -r "C:\ProgramData\My App\logfile.txt" -tail -t -p 1234 -d

This will tail logfile.txt and send it to the syslog server on port 1234. Once it works for you, remove the -d option.

Wrapping Neologger with NSSM

Now, the next question is, how do I ensure that neolog.exe runs reliably? The answer is to configure Neologger as a service under Windows. It's easier to manage as a service and the operating system will ensure that it restarts appropriately if it ever crashes. That's where NSSM comes into play. NSSM (Non-Sucking Service Manager) is a tool that lets you wrap almost any application as a service.

To create a service to wrap Neologger, run NSSM like this:

# nssm install MyApp-Syslog

This will create a new service named MyApp-Syslog. Then, fill the Path, Startup directory, and Arguments as appropriate (don't forget to remove -d as it is not required here). Here is an example:

You don't need to change anything in the other tabs, but you can take a look in case you need to fine-tune something.

Now you can try starting MyApp-Syslog via the service panel, and see if it works.

What happens if the log file isn't there in the first place? While Neologger will "wait" if the file disappears once it starts tailing it, it will gracefully exit if the file is not there initially. NSSM will then try to restart neolog.exe using its throttling settings. This ensures that the service will loop neolog.exe, slowly, until the file appears again. During that time, the service is labeled as "Paused" in the service panel.

Going a step further with dependencies

The last step, which can be important for compliance reasons, is not only to help Neologger run reliably (which is done by configuring it as a service), but ensure that it always runs when your application runs, too. This is done with dependencies.

If your application doesn't run as a service, you're out of luck. But let's say MyApp runs under a Windows Service named MyApp-Service. It then becomes trivial to make MyApp-Service depend on MyApp-Syslog. 

To change dependencies, you have to edit MyApp-Service directly. First, query MyApp-Service to see if it has other dependencies:

# sc qc MyApp-Service

[SC] QueryServiceConfig SUCCESS

        TYPE               : 10  WIN32_OWN_PROCESS
        START_TYPE         : 2   AUTO_START
        ERROR_CONTROL      : 1   NORMAL
        BINARY_PATH_NAME   : "C:\Program Files\MyApp\MyApp.exe"
        LOAD_ORDER_GROUP   :
        TAG                : 0
        DISPLAY_NAME       : MyApp Service
        DEPENDENCIES       : tcpip

You can see here that MyApp-Service depends on tcpip. It is important to keep this in mind. Next, change the dependencies on MyApp-Service by configuring it to depend on both tcpip and MyApp-Syslog. Note here that you have to explicitly state that tcpip is still a dependency, and separate it with a slash to add MyApp-Syslog.

# sc config MyApp-Service depend= tcpip/MyApp-Syslog
[SC] ChangeServiceConfig SUCCESS

# sc qc MyApp-Service

[SC] QueryServiceConfig SUCCESS

        TYPE               : 10  WIN32_OWN_PROCESS
        START_TYPE         : 2   AUTO_START
        ERROR_CONTROL      : 1   NORMAL
        BINARY_PATH_NAME   : "C:\Program Files\MyApp\MyApp.exe"
        LOAD_ORDER_GROUP   :
        TAG                : 0
        DISPLAY_NAME       : MyApp Service
        DEPENDENCIES       : tcpip
                           : MyApp-Syslog

Once this is done, start MyApp-Service. You'll notice that it starts MyApp-Syslog automatically. The same logic applies if you stop MyApp-Syslog before MyApp-Service: both will stop at the same time.
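Since sc config replaces the whole dependency list, it is easy to drop an existing dependency by accident. If you have several services to update, a small helper can build the depend= value from the list you read with sc qc. This is just a sketch with hypothetical function names; it only prints the command, it does not run it.

```python
def depend_value(existing, new_service):
    """Join the existing dependencies and the new one with '/'.

    The new service is not added twice (service names are compared
    case-insensitively).
    """
    names = list(existing)
    if new_service.lower() not in (name.lower() for name in names):
        names.append(new_service)
    return "/".join(names)


def sc_config_command(service, existing, new_service):
    """Build the sc command line that sets the full dependency list."""
    return f"sc config {service} depend= {depend_value(existing, new_service)}"


print(sc_config_command("MyApp-Service", ["tcpip"], "MyApp-Syslog"))
```

With the values used in this post, this prints the same command as the one shown above.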

Putting it all together

To conclude, let's restate what we just did. First, we used Neologger to tail a text file on Windows, generated by an application named MyApp and sent it, live, to a syslog server. Then, we used NSSM to configure Neologger as a Windows service to help us manage its startup and shutdown. Finally, we created a dependency between the service that runs MyApp and the new service we've just created, to reassure our compliance auditor that Neologger always runs when MyApp runs, too.

Good luck.